Splitting time series sediment data for each station (site_no.)

I have one very large time series file in .tsv format covering about 2000 site numbers. It has about 20,000,000 rows and 9 columns. I want to split the time series and save a separate file for each site number. I tried the textscan and fgetl functions in MATLAB but could not get an efficient solution. Any help would be appreciated. A sample of the time series:

Answers (2)

You can follow these steps:
1. Load the file using textscan. Use the 'HeaderLines' option to skip the header lines and load only the data.
2. Extract the column of station IDs.
3. Use ismember to find the rows that belong to each station ID.
So overall, you need to read about textscan and ismember.
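
The grouping in steps 2 and 3 can be sketched on toy data. This is only an illustration: the values below are made up, and the third output of unique is used here in place of repeated ismember calls (both give the same row selection). Site numbers are kept as text on the assumption that they may carry leading zeros.

```matlab
% Hypothetical toy data standing in for two columns returned by textscan
siteNo = {'01234567'; '01234567'; '07654321'; '01234567'; '07654321'};
flow   = [10; 12; 5; 11; 6];

% unique returns the distinct IDs and, for every row, the index of its ID
[sites, ~, grp] = unique(siteNo);

% Collect the rows of each station into its own cell
perSite = cell(numel(sites), 1);
for k = 1:numel(sites)
    perSite{k} = flow(grp == k);
end
```

Each cell of perSite then holds the values of one station, in the same order as sites.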

5 Comments

I tried using textscan with the 'HeaderLines' option to skip the first 17 header lines. But the header line 'agency_cd site_no datetime DAILY_FLOW DAILY_FLOW_QUAL DAILY_SSC DAILY_SSC_QUAL DAILY_SSL DAILY_SSL_QUAL' also appears between stations throughout the file. How can I skip this header line?
Is there any way to attach a file larger than 5 MB here? My file is 940 MB. Thanks for the help.
% Open the file and skip the 17 header lines at the top
fid = fopen('daily_data.tsv') ;
% Parse the first six columns; the rest of each line is captured as text
fmt = '%s%f%s%f%s%f%[^\n]' ;
S = textscan(fid,fmt,'HeaderLines',17) ;
S(end) = [] ;   % drop the unparsed remainder of each line
fclose(fid) ;
%% Site number
SN = S{2} ;
%% Time series
time = S{3} ;
%% Separate the row indices for each site number
SN1 = unique(SN) ;
NSN = length(SN1) ; % total number of stations
iwant = cell(NSN,1) ;
for i = 1:NSN
    idx = ismember(SN,SN1(i)) ;   % rows belonging to station i
    iwant{i} = time(idx) ;
end


Thanks for the help, but it reads only the first four stations. After the fourth station there is another one-line header. How can I skip the header that comes after the fourth station?
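
textscan stops at the first line that does not match the format, which is presumably why reading ends at the repeated header. One possible workaround, sketched here under assumptions, is to read the file line by line with fgetl, keep only lines whose second tab-separated field is all digits (treated as site_no), and write each kept line to a file named after its site. The demo file, the output folder name, and the sample values are hypothetical and exist only so the sketch runs on its own; for the real data, point inFile at the actual file and drop the demo part.

```matlab
% --- Demo input (hypothetical values, only so the sketch is runnable) ---
inFile = fullfile(tempdir, 'daily_data_demo.tsv');
outDir = fullfile(tempdir, 'by_site_demo');
hdr  = sprintf('agency_cd\tsite_no\tdatetime\tDAILY_FLOW');
demo = {'# comment line', hdr, ...
        sprintf('USGS\t01234567\t1990-01-01\t10'), ...
        sprintf('USGS\t01234567\t1990-01-02\t12'), ...
        hdr, ...
        sprintf('USGS\t07654321\t1990-01-01\t5')};
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', demo{:});
fclose(fid);

% --- Split by site, skipping every non-data line ---
if ~exist(outDir, 'dir'), mkdir(outDir); end
tab = sprintf('\t');
pat = ['^[^' tab ']*' tab '(\d+)' tab];   % assumes site_no is column 2, digits only

fin = fopen(inFile, 'r');
fout = -1;          % handle of the output file currently open
currentSite = '';   % site whose rows are being written
seen = {};          % sites already written during this run

txt = fgetl(fin);
while ischar(txt)
    tok = regexp(txt, pat, 'tokens', 'once');
    if ~isempty(tok)                      % data row; headers and comments fail the match
        site = tok{1};
        if ~strcmp(site, currentSite)
            if fout ~= -1, fclose(fout); end
            if any(strcmp(site, seen))
                openMode = 'a';           % site seen earlier in this run: append
            else
                openMode = 'w';           % first time: start a fresh file
                seen{end+1} = site;
            end
            fout = fopen(fullfile(outDir, [site '.tsv']), openMode);
            currentSite = site;
        end
        fprintf(fout, '%s\n', txt);
    end
    txt = fgetl(fin);
end
if fout ~= -1, fclose(fout); end
fclose(fin);
```

Because only one line is held in memory at a time, this should cope with a file of several hundred MB, although it may be slow. The site number is kept as text, so any leading zeros survive in the file names.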
