Combining text files with dates and missing data

2 views (last 30 days)
I have numerous files in this format
YEAR MONTH DAY HOUR MIN VAL
1999 1 1 0 0 123
where the file contains data for a single year, month runs 1-12, day from 1-30 (or 28,29, or 31), hour goes from 0-23 and min almost always 0 and val represents a different type of value depending on the file (i.e. one file will have speed, another the mass, another the temperature of the object, etc.).
I want to combine the text files so that they end up like this
YEAR MONTH DAY HOUR MIN VAL1 VAL2 VAL3
1999 1 1 0 0 123 321 123
up to 7 values.
My problem is when data is missing/not recorded in one of the text files (e.g. Jan 1st 1999 Hours 0-23 has data but the next row is Jan 3rd 1999 Hour 0). When the data is missing, I'm not sure what to do as the arrays become uneven and I can't concatenate them unless I somehow fill in the gaps.
Is there a way to fill in any gaps with a blank or some non-data value like 99999 so that I can combine them? Or is there another way to combine the files? The latter is my goal.
  6 Comments
Walter Roberson
Walter Roberson on 12 Aug 2015
To check: for each year there are 7 files? Or is there only 1 file that has 7 types of values in it? If it is 1 file that has 7 types of values in it, how do you tell which value-type a particular line is?
Cas Cas
Cas Cas on 12 Aug 2015
Each year has a bunch of files, but 7 values in particular I'm interested in, so for the purpose here, basically for each year there are 7 files, each with a date and time and value. I'm trying to put them into one file.

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 12 Aug 2015
Assuming the data is in DataVal_1, DataVal_2 through DataVal_7, and assuming that either only a single year is referenced or else that the first entry of DataVal_1 refers to the lowest numbered year to be considered:
filler_value = 9999;
first_hour = datenum(DataVal_1(:,1),1,1);
timekeyvec = DataVal_1(:,1:4); %Y,M,D,H
timekeyvec(end,6) = 0; %extend by 0 for min and sec
timekeynum = datenum(timekeyvec); %serial date numbers
houridx_1 = round((timekeynum - first_hour) * 24) + 1;
Do the same thing for the other DataVal_* arrays producing houridx_1 through houridx_7 . Then
lasthouridx = max( [houridx_1; houridx_2; houridx_3; houridx_4; houridx_5; houridx_6; houridx_7] );
mergedvals = filler_value * ones(lasthouridx, 7);
mergedvals(houridx_1,1) = DataVal_1(:,6);
mergedvals(houridx_2,2) = DataVal_2(:,6);
mergedvals(houridx_3,3) = DataVal_3(:,6);
etc.
date_of_entries = datevec(((1:lasthouridx) - 1) / 24 + first_hour);
merge_table = [date_of_entries(:,1:4), zeros(lasthouridx,1), mergedvals];
Now write out merge_table
(Testing further, I find it would be possible to build up the mergedvals a file at a time instead of having to read all 7 files first)
  3 Comments
Cas Cas
Cas Cas on 12 Aug 2015
(((1:lasthouridx) - 1) / 24 + first_hour) has dimensions 2088 x 2160. Running that through datevec puts them in a column vector of (2088*2160) x 6.
Also, it appears that date_of_entries isn't repeating set of values. I checked and each hour entry repeats for a very long time (e.g. it repeats [1965 1 1 0 0 0] for a very long time before changing to [1965 1 2 0 0 0]). It should repeat in a (0:23)' pattern.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!