Issue with removing space from char from loaded data

Hello,
I am helping a colleague with some data analysis and have come across an issue I cannot find a way of fixing.
After loading the data (using textscan with %s; for some reason the format of the data doesn't allow %f, %d, etc.) and extracting it from the cells, it ends up as a character vector such as the one shown:
1 . 3 3 3
How does one remove the white space from this character vector so that it can be converted into usable numeric data? I have tried things such as ~isspace, yet this doesn't work; the result of isspace on the above number is
0 1 0 0 0 0 0 0 0 0 0 0 0
Which isn't useful!
If anyone knows a way around this I would be gratefully appreciative! Ideas on why %d, %f, etc. don't work in textscan would also be welcome, as that might solve the problem too...
Many thanks, David

4 Comments

@David: please edit your question and upload a sample file by clicking the paperclip button.
Hi,
I've attached the first few lines of one of the files. Note that in practice it's much longer than this (~500 lines), yet that shouldn't be an issue when scaling up a solution!
Why are you using %s for loading numeric data? Why not simply load numeric data using a numeric format and get a numeric variable?
For some reason it doesn't work with %d or %f, for example (it gives blank/empty arrays). I'm still trying to figure out why, so if you have ideas as to why this could be the case I'd be appreciative!

 Accepted Answer

I don't have any problems importing your sample file (attached) as numeric data. Here are the two ways I tried:
Method one: dlmread:
>> M = dlmread('example_data.txt','\t',1,0)
M =
1.333 16594.953
2.667 16562.166
4 15972.454
5.333 15968.083
6.667 15482.982
8 14435.071
>>
Method two: textscan:
opt = {'HeaderLines',1, 'CollectOutput',true};
[fid,msg] = fopen('example_data.txt','rt');
assert(fid>=3,msg)
C = textscan(fid,'%f%f',opt{:});
fclose(fid);
which gives this data:
>> C{1}
ans =
1.333 16594.953
2.667 16562.166
4 15972.454
5.333 15968.083
6.667 15482.982
8 14435.071

5 Comments

Hi,
For some reason this yields an "empty" 1x1 cell C (i.e. it just shows []).
I have uploaded the full file in the original post to show that I am getting the error using this.
When using the example file your method does work fine, which is odd.
I have done so now in the original post; I made the (in my eyes fair) assumption that the bulk of the data would behave the same as the first section of it, as the format doesn't change.
@David: the problem is the file encoding: your original data file is encoded as UCS2 Little Endian (similar to UTF-16), whereas the sample file is saved as a simple ANSI encoding. If you commonly write some non-English language or your PC Locale setting is not English then UCS2 may be the default file encoding.
There is no point in saving such a simple numeric data file as UCS2, so I would recommend that you save the file instead as ANSI: one simple way would be to open it using Notepad++, then change the encoding, and finally save it with the correct encoding. Alternatively you could fopen it as UCS2, but I have no experience with this.
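If re-saving the file is not an option, the text can also be decoded inside MATLAB. The following is only a minimal sketch under stated assumptions: the file is little-endian UTF-16/UCS2 with an optional byte order mark, and the file name and sample contents are hypothetical stand-ins created here so that the snippet runs on its own. It reads the file as 16-bit code units instead of single bytes and then hands the decoded text to textscan. It deliberately does not rely on the encoding argument of fopen.

```matlab
% Create a small UTF-16LE test file (hypothetical name and contents)
% so that this sketch is self-contained.
fname = [tempname '.txt'];
txt = sprintf('X\tY\r\n1.333\t16594.953\r\n2.667\t16562.166\r\n4\t15972.454\r\n');
fid = fopen(fname,'w','ieee-le');              % little-endian byte order
fwrite(fid, [65279, double(txt)], 'uint16');   % 65279 = 0xFEFF byte order mark
fclose(fid);

% Read the file back as 16-bit code units rather than as single bytes.
[fid,msg] = fopen(fname,'r','ieee-le');
assert(fid>=3,msg)
raw = fread(fid, Inf, '*uint16').';            % row vector of code units
fclose(fid);
if ~isempty(raw) && raw(1)==65279              % drop the byte order mark if present
    raw(1) = [];
end
str = char(raw);                               % decoded text, no interleaved nulls

% Parse the decoded text with the same format as the textscan method.
C = textscan(str,'%f%f','HeaderLines',1,'CollectOutput',true);
M = C{1};
delete(fname);                                 % remove the temporary file
```

Here M should hold the numeric data as a two-column matrix. For a real file, only the reading and parsing part is needed, with fname pointing at the actual data file.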
The UCS2 file encoding also explains the "space" characters that you see when importing the data as character: the second byte of each character is being interpreted as a separate character inside MATLAB, but it is most likely not a printable ASCII character and so becomes a null character or a control character. In any case, this is a good example of the X-Y problem: rather than fixing mysterious spaces whose origin you don't understand, you really should just fix the file encoding.
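To illustrate the point, here is a small sketch that simulates the situation. It assumes that the extra characters are null characters (character code 0), which is what a little-endian two-byte encoding produces for plain ASCII digits; the variable names are made up for this example. It shows why isspace does not flag them, and how they could be stripped if the string workaround were really needed.

```matlab
% Simulate '1.333' stored with two bytes per character but read one
% byte at a time: each ASCII character is followed by a zero byte.
bytes = reshape([double('1.333'); zeros(1,5)], 1, []);
s = char(bytes);                 % 10 characters, every second one is char(0)

tf = isspace(s);                 % char(0) is not white space, so no element is true
isNull = (s == 0);               % mask of the null characters

% Removing the null characters leaves text that converts normally.
val = str2double(s(~isNull));
```

This only treats the symptom; fixing the encoding when the file is saved or read remains the cleaner solution.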
Hi Stephen,
Indeed you are right and that did fix it! Many thanks.
I will now have some friendly words with my colleague on how he should save his data in the future...
Thanks again!
@David: I hope that it helped. You can also accept my answer, if these comments helped you.

More Answers (1)

If you're only importing numeric data from those files, why not just use dlmread, csvread, or textread? I'd personally prefer readtable.
It's better to import the data cleanly than to have to deal with the problems of an improper import.
check these links:

4 Comments

Hi,
The issue is the formatting of the raw data; it looks like this (I've now uploaded a sample file of the data to the original post):
X Y
1.333 16594.953
2.667 16562.166
4 15972.454
As you can see, the size of the columns keeps changing and there is a header line to ignore (I didn't take the data, I'm just helping to salvage it...); hence using textscan, as it can handle the changing column sizes and skip the header line. As mentioned, for some reason a numeric data type like %d or %f didn't work on the load, and things like dlmread find issues with this, at least to my knowledge.
What do you mean by the size of columns keeps changing? I see perfectly tab spaced data. About ignoring the headerline, if you read any of the links I gave you, you would have seen that you could simply ignore the first line (if you want to). For example,
filename = 'somename.txt';
delimiter = '\t'; % tab-delimited data
rowstoIgnore = 1; % skip the header line
columnstoIgnore = 0;
data = dlmread(filename, delimiter, rowstoIgnore, columnstoIgnore);
On the other hand, if you use readtable, it understands that X and Y are variable names and stores them as well. For example,
data = readtable(filename);
Now you can access the data like this:
data.X %or data.Y
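As a runnable illustration of the readtable route, the sketch below first writes a small tab-delimited file in a plain single-byte encoding and then reads it back. The file name and contents are hypothetical stand-ins for the sample data, and the delimiter is passed explicitly on the assumption that the columns are separated by tabs.

```matlab
% Write a small tab-delimited sample file (hypothetical name and contents).
fname = [tempname '.txt'];
fid = fopen(fname,'wt');
fprintf(fid,'X\tY\n1.333\t16594.953\n2.667\t16562.166\n4\t15972.454\n');
fclose(fid);

% Read it as a table; the header line supplies the variable names X and Y.
T = readtable(fname,'Delimiter','\t');
x = T.X;                       % first column as a numeric vector
y = T.Y;                       % second column as a numeric vector
delete(fname);                 % remove the temporary file
```

This only works as shown when the file is stored in an encoding that readtable reads correctly by default.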
Hi,
Using dlmread I get this error:
Mismatch between file and format string. Trouble reading 'Numeric' field from file (row number 1, field number 1) ==> \n
And readtable yields:
Error using readtable (line 143) Cannot interpret data in the file 'PlotValues0002.txt'. Found 2 variable names but 1 data columns. You may need to specify a different format string, delimiter, or number of header lines.
Hence my referring to the columns changing size (i.e. the number of decimal places).
"...and things like dlmread find issues with this, at least to my knowledge"
This does not mean you should jump straight to creating some complex work-around using strings. You could have asked about how to import the numeric data first. Only if that proved really difficult should you start to investigate other methods.

Asked: 3 Nov 2017
Commented: 3 Nov 2017
