Unstack aggegate function and preset fill value do not match
Show older comments
Hello,
I'm trying to add large amounts of files spoanning different periods, stations and variables. I read in the files (some are column based, others row-based) and then add then by creating tables, stacking them, adding the stacked tables and unstack them again and creating a structure again. I do it this way to preserve a unique column (here time but this may be another variable). My test code is
% create structures with different dates
StructAdd.fdate=datenum('1984-01-01');
StructAdd.PO4=0.75;
StructIn.fdate=[datenum('1985-01-01'); datenum('1986-01-01')];
StructIn.NH4=[0.08; 0.055];
%-----------------
TableIn= struct2table(StructIn);
TableAdd= struct2table(StructAdd);
colvec=1:width(TableIn); colvec(1)=[]; % remove time column
StackIn = stack(TableIn,colvec,'IndexVariableName','detcode','NewDataVariableName','value');
colvec=1:width(TableAdd); colvec(1)=[]; % remove time column
StackAdd = stack(TableAdd,colvec,'IndexVariableName','detcode','NewDataVariableName','value');
StackTotal=vertcat(StackIn,StackAdd);
TableOut=unstack(StackTotal,'value',pivotfield,'AggregationFunction',@mean);
StructOut=table2struct(TableOut,'ToScalar',true);
%-----------------
disp('==============')
disp('PO4 data after adding first PO4 file with 1984-01-01: 0.75 data')
for i=1:length(StructOut.PO4)
disp([datestr(StructOut.fdate(i)) ' PO4: ' num2str(StructOut.PO4(i))])
end
disp('==============')
% Now add new data for dates already added for another variable
% --> if data is already present for this date the mean should be taken
StructAdd.fdate=datenum('1985-01-01');
StructAdd.PO4=0.2;
%-----------------
TableIn= struct2table(StructOut);
TableAdd= struct2table(StructAdd);
colvec=1:width(TableIn); colvec(1)=[]; % remove time column
StackIn = stack(TableIn,colvec,'IndexVariableName','detcode','NewDataVariableName','value');
colvec=1:width(TableAdd); colvec(1)=[]; % remove time column
StackAdd = stack(TableAdd,colvec,'IndexVariableName','detcode','NewDataVariableName','value');
StackTotal=vertcat(StackIn,StackAdd);
TableOut=unstack(StackTotal,'value',pivotfield,'AggregationFunction',@mean);
StructOut=table2struct(TableOut,'ToScalar',true);
%-----------------
disp('==============')
disp('PO4 data after adding second file with 1985-01-01: 0.2 data')
for i=1:length(StructOut.PO4)
disp([datestr(StructOut.fdate(i)) ' PO4: ' num2str(StructOut.PO4(i))])
end
disp('==============')
If I use @mean as the aggregate function in unstack I get
==============
PO4 data after adding first PO4 file with 1984-01-01: 0.75 data
01-Jan-1985 PO4: NaN
01-Jan-1986 PO4: NaN
01-Jan-1984 PO4: 0.75
==============
==============
PO4 data after adding second file with 1985-01-01: 0.2 data
01-Jan-1985 PO4: NaN
01-Jan-1986 PO4: NaN
01-Jan-1984 PO4: 0.75
==============
If I use @nanmean as the aggregate function in unstack I get
==============
PO4 data after adding first PO4 file with 1984-01-01: 0.75 data
01-Jan-1985 PO4: 0
01-Jan-1986 PO4: 0
01-Jan-1984 PO4: 0.75
==============
==============
PO4 data after adding second file with 1985-01-01: 0.2 data
01-Jan-1985 PO4: 0.1
01-Jan-1986 PO4: 0
01-Jan-1984 PO4: 0.75
==============
But neither of these is the answer I am looking for, which should give
==============
PO4 data after adding first PO4 file with 1984-01-01: 0.75 data
01-Jan-1985 PO4: NaN
01-Jan-1986 PO4: NaN
01-Jan-1984 PO4: 0.75
==============
==============
PO4 data after adding second file with 1985-01-01: 0.2 data
01-Jan-1985 PO4: 0.2
01-Jan-1986 PO4: NaN
01-Jan-1984 PO4: 0.75
==============
The Matlab guidance for unstack states that the aggregate function chosen sets the fillvalue: I don't see any options to set this separately. How can I have NaN fillvalues and yet use nanmean as the aggregate function??
Accepted Answer
More Answers (1)
1. We highly recommend that you use datetime instead of datenum. Please let me know if you can't use datetime for some reason as we'd like to understand why.
e.g.
datetime(["1985-01-01"; "1986-01-01"], "Format", "uuuu-MM-dd")
2. When your table data has time as a variable, it is often more more convenient to use a timetable instead.
e.g.
StructIn.fdate = datetime(["1985-01-01"; "1986-01-01"], "Format", "uuuu-MM-dd");
StructIn.NH4 = [0.08; 0.055];
TableIn = struct2table(StructIn);
TableIn = table2timetable(TableIn)
3. This workflow of calling stack, unstack, and table2struct is unnecessary. Use outerjoin instead.
StructAdd.fdate = datetime("1984-01-01", "Format", "uuuu-MM-dd");
StructAdd.PO4 = 0.75;
StructIn.fdate = datetime(["1985-01-01"; "1986-01-01"], "Format", "uuuu-MM-dd");
StructIn.NH4 = [0.08; 0.055];
TableIn = struct2table(StructIn);
TableIn = table2timetable(TableIn)
TableAdd1 = struct2table(StructAdd);
TableAdd1 = table2timetable(TableAdd1)
TableOut = outerjoin(TableIn, TableAdd1)
StructAdd.fdate = datetime("1985-01-01", "Format", "uuuu-MM-dd");
StructAdd.PO4=0.2;
TableAdd2 = struct2table(StructAdd);
TableAdd2 = table2timetable(TableAdd2)
TableOut = outerjoin(TableOut, TableAdd2)
Alternatively, you could use synchronize, since you're combining your data by time.
TableOut = synchronize(TableIn, TableAdd1)
TableOut = synchronize(TableOut, TableAdd2)
1 Comment
Sonja van Leeuwen
on 17 Nov 2021
Categories
Find more on Data Type Conversion in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!