How to read grid data from a text file?

pruth on 23 Sep 2017
Commented: dpb on 12 Jul 2019
Hi, I have a text file (attached) which contains ozone data. I am not able to read the data because it is not in a regular format. Only the latitude is given (59.5S to 59.5N in 1.00 degree steps), and for every latitude all the ozone data is listed, one value per longitude. There are 288 longitudes (179.375W to 179.375E in 1.25 degree steps), so there are 288 data points per latitude. The problem is that all the data is in string format and we need to split it after every 3 characters. Some blanks also appear in the middle of the data, so we have to deal with those too, otherwise the data will not split into the correct 3-digit values.
Later I will use inpolygon to extract the data for a specific region, but first I need to read this text file and get the data out.
Hope you understand.
  2 Comments
Cedric on 23 Sep 2017
Does this format have a name? Is it the original format in which the data is distributed?
pruth on 23 Sep 2017
Edited: pruth on 23 Sep 2017
Yes, the attached file is the original one. The data I used earlier was arranged very differently and was a bit simpler. I am not good at programming, so I am finding this hard. I hope you can help.


Accepted Answer

Cedric on 23 Sep 2017
Edited: Cedric on 23 Sep 2017
The format seems to be GridTOMS as mentioned here. There is an IDL reader and there may be MATLAB ones.
If you need a stable reader, I advise you to look for a MATLAB implementation "endorsed" by NASA. If you need a quick hack to perform early tests, you can try the following (where I assume that spaces stand for leading zeros):
content = fileread( 'L3_tropo_ozone_column_jan14.txt' ) ;
% - Remove first space on all data rows.
content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
% - Split by "lat = ..." separator.
blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
% - Extract header from block 1.
pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
header = blocks{1}(1:pos) ;
blocks{1} = blocks{1}(pos+1:end) ;
% - Merge blocks, remove \r\n, replace spaces by 0s.
blocks = [blocks{:}] ;
blocks = regexprep( blocks, '[\r\n]', '' ) ;
blocks(blocks == ' ') = '0' ;
% - Convert to 120x288 numeric array.
data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;
Note that it is easy to wrap this in a function and call it while iterating through files from a folder (using the output of DIR). It is also easy to extract meta information from the header if relevant.
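As a rough sketch of that idea (not part of the original answer), the folder loop could look like the following. The names readFolder, reader and readGridTomsQuick are hypothetical: readGridTomsQuick stands for a function file that wraps the lines above and returns data for one file name. The function below would be saved as readFolder.m.

```matlab
function allData = readFolder( folder, pattern, reader )
% READFOLDER  Apply READER to every file in FOLDER whose name matches PATTERN.
%   READER is a function handle that takes a file name and returns the
%   content of that file (e.g. a 120x288 array). Hypothetical helper.
    listing = dir( fullfile( folder, pattern ) ) ;   % e.g. pattern = '*.txt'
    listing = listing(~[listing.isdir]) ;            % keep files only
    allData = cell( numel( listing ), 1 ) ;          % one cell per file
    for k = 1 : numel( listing )
        allData{k} = reader( fullfile( folder, listing(k).name ) ) ;
    end
end
```

Assuming such a wrapper exists, a call could look like allData = readFolder( 'data', 'L3_tropo_ozone_column_*.txt', @readGridTomsQuick ), where the folder name 'data' is also just an assumption.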
  9 Comments
pirapts Raptis on 12 Jul 2019
Edited: pirapts Raptis on 12 Jul 2019
Hello everybody,
I am processing some similar files (asc again, from the same dataset, but for another variable: link).
The problem is that there are 4-digit numbers in the files,
so I changed the call to sscanf( blocks, '%4d' ),
which gives the correct dimensions for the output (720x1440),
but some numbers are misread.
In the ASCII file a record looks like
559 584 656 84811281610184216791461128412291089 667 574
but MATLAB reads it as
559 584 656 8481 1281 6101 ...
instead of
559 584 656 848 1128 1610...
I have tried processing the file line by line and the same fault appears.
I have also noticed that the blocks char array has length 4108186.
I still don't understand how I get the correct dimensions (720*1440*4 = 4147200 characters for 4-digit fields), or why the reading goes wrong where 4-digit numbers appear.
Any idea on how to handle this would be really useful.
(MATLAB 2014b)
dpb on 12 Jul 2019
"so i changed to sscanf( blocks, '%4d' )"
The problem is C -- the formatting was not designed with fixed-width files in mind and it simply can't handle them by default, because '%4d' does NOT mean what one would logically expect, namely "read four-character-wide fields starting at the beginning of the record". Instead it means "read no more than 4 characters", but C silently "eats" the white space first and so, as you noticed, by the time it gets to the fourth entry in your input record it begins with the 8 instead of the blank and reads "no more than" four characters. But that's not the right answer. Fortran FORMAT gets it right, but unfortunately MathWorks chose the easy way out when they rewrote MATLAB in C and used the C runtime I/O library instead of building a FORMAT facility. Recent releases have (finally!! after 30 years) introduced a new fixed-width text import object, but that won't help you unless you can upgrade.
You simply have to count characters (including blanks) and process the resulting substrings -- with the sample record you give (NB: you're missing the leading blank at the beginning of the record):
>> str2num(reshape(rec,4,[]).')
ans =
559
584
656
848
1128
1610
1842
1679
1461
1284
1229
1089
667
574
>>
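To make the difference concrete, here is a small sketch that runs both approaches on the sample record from the comment above, with the leading blank restored. The variable names are only illustrative, and str2double on a cellstr is used in place of str2num, which is a substitution on my part rather than what was posted.

```matlab
% Sample record: 14 fields, each exactly 4 characters wide.
rec = [ ' 559 584 656 848', ...
        '11281610184216791461128412291089', ...
        ' 667 574' ] ;

% C-style conversion: skips blanks, then reads AT MOST 4 characters,
% so the field boundaries drift as soon as a blank has been skipped.
viaSscanf = sscanf( rec, '%4d' ).' ;
% 559 584 656 8481 1281 6101 8421 6791 4611 2841 2291 89 667 574

% Fixed-width conversion: cut the record every 4 characters first,
% then convert each 4-character substring on its own.
fields   = reshape( rec, 4, [] ).' ;              % 14x4 char array
viaWidth = str2double( cellstr( fields ) ).' ;
% 559 584 656 848 1128 1610 1842 1679 1461 1284 1229 1089 667 574
```

For this record both results have 14 elements, which shows how the output can have the expected dimensions even though most of the 4-digit values are wrong.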


More Answers (2)

dpb on 23 Sep 2017
Edited: dpb on 23 Sep 2017
  1. Read the file as a block of cellstr, convert to a character array
  2. Convert the 12x75 char array to 1x900: line=reshape(blk.',1,[]);
  3. Select the first 288*3 --> 864 characters: c=line(1:864);
  4. Replace any blanks with '0': c=strrep(c,' ','0');
  5. Convert the 3-digit fields: dat=sscanf(c,'%3d');
  6. Go to the next block
Thanks to Cedric for pointing out my weak eyes... :)
file=textread('tropo.txt','%s','delimiter', '\n','whitespace', '','headerlines',3); % file as cellstr array
L=length(file); % number lines/records in file
data=zeros(L/12,288); % preallocate for resulting data
j=0; % counter for data blocks
for i=1:12:L % loop over blocks of 12 records
  blk=char(file(i:i+11)); % retrieve a block, convert to character array
  blk(:,1)=''; % remove leading blanks
  line=reshape(blk.',1,[]); % recast as one record
  line=line(1:864); % truncate to the 288 3-character fields
  line=strrep(line,' ','0'); % replace blanks with leading 0
  j=j+1; % increment counter
  data(j,:)=sscanf(line,'%3d'); % convert to numeric
end
results in a double array containing the data...
From the first block I tested at command line--
>> whos data
Name Size Bytes Class Attributes
data 288x1 2304 double
>>
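The reshape/strrep/sscanf steps can be checked on a tiny synthetic block. This is only an illustration with made-up values: 2 records of three 3-character fields each, instead of the 12 records per latitude in the real file.

```matlab
% Synthetic block: each record starts with one blank, followed by
% three right-justified 3-character fields (values are made up).
blk = [ ' ', '  7', ' 42', '315' ; ...
        ' ', ' 88', '  5', '120' ] ;

blk(:,1) = [] ;                      % drop the leading blank of each record
rec = reshape( blk.', 1, [] ) ;      % rows end to end: '  7 42315 88  5120'
rec = strrep( rec, ' ', '0' ) ;      % blanks become leading zeros
vals = sscanf( rec, '%3d' ) ;        % column vector [7; 42; 315; 88; 5; 120]
```

The transpose before reshape matters because reshape reads a matrix column by column, so blk.' is what puts the characters of each record next to each other.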
  3 Comments
dpb on 23 Sep 2017
Old eyes failed me...I had mistakenly thought char() had gotten rid of the leading space but didn't...thanks.
Cedric on 23 Sep 2017
My maybe younger eyes failed me too. I had to get tricked a couple times before I realized!



Guillaume on 23 Sep 2017
Whoever created that format should be very ashamed. It's a pain to parse.
This is a start. I still need to figure out why I've got 292 columns instead of 288, but I've got to go.
filecontent = fileread('L3_tropo_ozone_column_jan14.txt'); %read it all
filecontent(ismember(filecontent, [10, 13])) = []; %remove line returns
longdesc = regexp(filecontent, 'Longitudes:\s*(\d+)\D+(\d+(\.\d+)?)([EW])\D+(\d+(\.\d+)?)([EW])', 'tokens', 'once'); %longitude description
longnumbers = str2double(longdesc([1 2 4]));
longnumbers(2:3) = longnumbers(2:3) .* (-1).^ strcmp(longdesc([3 5]), 'W'); %change sign for W
longitudes = linspace(longnumbers(2), longnumbers(3), longnumbers(1));
pointlats = regexp(filecontent, '\s+([0-9 ]+)lat\s*=\s*(-?\d+(\.\d+)?)', 'tokens'); %extract point strings and latitude
pointlats = vertcat(pointlats{:});
latitudes = str2double(pointlats(:, 2));
points = regexprep(pointlats(:, 1), '\s', '0'); %replace spaces with 0
points = regexp(points, '\d{3}', 'match'); %split in group of three
points = str2double(vertcat(points{:}));
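A possible explanation for the 292 columns, offered as a guess that is not confirmed in this thread: once the line returns are removed, blanks that do not belong to any 3-character field (mainly the one at the start of each of the 12 records of a latitude block) stay in the string. About 12 surplus characters make 4 surplus groups of three, and 288 + 4 = 292. The sketch below reproduces the effect on made-up data with 3 records of two fields each, where 3 surplus blanks give 1 surplus group.

```matlab
% Made-up records: one leading blank, then two 3-character fields.
rows = { [ ' ', ' 12', '345' ], ...
         [ ' ', '  6', ' 78' ], ...
         [ ' ', '901', ' 23' ] } ;

% Joining the records without stripping the leading blanks shifts the
% field boundaries and adds one spurious group.
joined = strrep( [ rows{:} ], ' ', '0' ) ;
naive  = str2double( regexp( joined, '\d{3}', 'match' ) ) ;
% 1 234 500 60 780 901 23   (7 values, all misaligned)

% Dropping the first character of every record before joining restores
% the expected 6 values.
stripped = cellfun( @(r) r(2:end), rows, 'UniformOutput', false ) ;
clean    = str2double( regexp( strrep( [ stripped{:} ], ' ', '0' ), '\d{3}', 'match' ) ) ;
% 12 345 6 78 901 23
```

If this is indeed the cause, the extra groups also mean that the values are misaligned, not just that there are too many of them, so the leading blank of each record would have to be removed before the line returns are.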
  5 Comments
Cedric on 23 Sep 2017
The format is consistent (see my comment under your answer). What is annoying is that it is designed partly because of "machine" constraints, and partly for looking "cute" to a human eye when opened in a text editor.
dpb on 23 Sep 2017
Wonder why put the leading blank in there, though...that really is the only really bad part; the rest is pretty easy to deal with but that makes for special-casing. Oh, the no leading zero in the format is also pretty ugly; almost forgot that! :)

