Import Tab Delimited Data with Missing Values

5 views (last 30 days)
Hello,
I am looking to import data (txt file) that is separated by tab, but has missing values. The file location is here to see what I mean <ftp://ftp.ncdc.noaa.gov/pub/data/paleo/contributions_by_author/zachos2001/zachos2001.txt>
Ideally I'd like to import this just to make simple x-y graphs, replacing the missing values as NaN; unfortunately, they just left the missing values blank and so Matlab throws in data from one column into another. Someone recommended excel but I'd prefer to avoid this since I am working on a Linux and am using the OpenOffice version of "excel" which I don't like using. Help appreciated.
Thanks

Answers (4)

sco1
sco1 on 1 Nov 2011
EDIT: Never mind what I had originally posted, I didn't look closely enough at what MATLAB gave me, I'll see if I can figure out a workaround.
EDIT2: I know you don't like using excel, but you can save the data as a text file, read it into excel, and then read the excel file into Matlab (I used fixed width 'delimiting' and deleted the 'Genus' column). Not the most elegant solution, but it seems to work.

Chris
Chris on 1 Nov 2011
Alright thanks, it's only a problem because I can't use excel on this computer, but I can probably get Matlab access on another mac (which has excel), unless someone knows how to do it more directly.

sco1
sco1 on 1 Nov 2011
Another less elegant solution would be to loop fgetl(), which will return each line of the file
For example: document = fopen(file); testline = fgetl(document);
If you put the fgetl into a loop, it will read through the entire document line by line and output the entire string of characters. For example, line 1 would be '607 0 3.28 0.88' and line 3 would be '607 0.002 3.16 0.96 3.252 0.72'. However, since there is no constant delimiter in the file you linked, you would need to have some sort of logic for figuring out where there are holes in the data, and I don't think it's a trivial task unless I'm using a methodology that's more complex than is necessary.
It seems like using excel to at least format the data would save you a lot of coding time. Then you can use the time you saved go whack the scientists at NASA in the head and tell them to use comma delimited files like normal people.

Walter Roberson
Walter Roberson on 1 Nov 2011
When I fetch that file, including by using ftp directly, I do not get any tab characters. Instead, I see columns of fixed width. In order the widths are
3, 19, 16, 11, 11, 11, 11
Unfortunately, textscan() and fscanf() and sscanf() do not support field widths, so I suggest you proceed in a manner similar to this, where LinesToSkip is the number of header lines:
IN = char(textscan(fid, '%s', 'HeaderLines', LinesToSkip, 'Delimiter', '\n')) ;
close(fid);
N{1} = str2double(cellstr(IN(:,1:3)));
N{2} = str2double(cellstr(IN(:,4:22)));
N{3} = IN(:,23:38);
N{4} = str2double(cellstr(IN(:,39:50)));
N{5} = str2double(cellstr(IN(:,51:61)));
N{6} = str2double(cellstr(IN(:,62:72)));
N{7} = str2double(cellstr(IN(:,73:83)));
clear IN
Going via char() and then using cellstr() all over the place is a mechanism to work around the fact that the occasional line is shorted because one or more trailing fields is not present. The char() forces all the lines to be the same size, allowing you to split the text by column numbers. str2double() can then be used, but it will not work on character arrays, only on cell string arrays, so the cellstr() call.

Categories

Find more on Cell Arrays in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!