Problem with splitapply (bin averaging)

I'm having trouble running splitapply. It throws the following error:
Error using splitapply (line 111)
For N groups, every integer between 1 and N must occur at least once in the vector of group numbers.
Error in Binning_Code_cont (line 198)
binMean_VWC_09 = splitapply(@(vec_VWC_09)mean(vec_VWC_09,'omitnan'),vec_NEP_VWC_09,hbin_VWC_09);
The weird thing is that I use the same code for another variable and it gives me the desired outcome (see attached plot). The plot shows what I'm trying to do here: in the plot below, the y-axis values are averaged over every 0.5 interval on the x-axis.
% Removing nighttime values (PAR = 0) and VWC values less than or equal to zero
T_VWC_09 = table(VWC_09,NEP_09,PAR_09,'VariableNames', {'VWC','NEP','PAR'});
T_VWC_09_final = rmmissing(T_VWC_09);
T_VWC_09_final(T_VWC_09_final.PAR==0,:)=[];
T_VWC_09_final(T_VWC_09_final.VWC <= 0,:)=[];
vec_VWC_09 = T_VWC_09_final.VWC; % pulling out VWC (x-axis)
vec_NEP_VWC_09 = T_VWC_09_final.NEP; % pulling out NEP (y-axis)
% Calculating measurements of binned VWC (bin size 0.05) against values of NEP
binWidth2 = 0.05;
edges_VWC_09 = min(0) : binWidth2 : max(vec_VWC_09);
[~, ~, hbin_VWC_09] = histcounts(vec_VWC_09,[edges_VWC_09,inf]);
binMean_VWC_09 = splitapply(@(vec_VWC_09)mean(vec_VWC_09,'omitnan'),vec_NEP_VWC_09,hbin_VWC_09);
I have also attached my two variables in csv format for your reference. Thank you in advance!

 Accepted Answer

The grouping variable in splitapply must be a vector of positive integers in which every integer from 1 to N (the number of groups) occurs at least once, so no group number may be skipped. Here's an example that has no group 3 and therefore throws the same error:
g = [1 1 2 2 4 4];
data = 1:6;
splitapply(@mean,data,g)
It appears that some of the bins computed by histcounts contain none of your data, so their bin numbers never occur in hbin_VWC_09.
A workaround is to use findgroups to create your grouping variable.
[g, gid] = findgroups(hbin_VWC_09)
out = splitapply(@(vec_VWC_09)mean(vec_VWC_09,'omitnan'), vec_NEP_VWC_09, g)
Note, however, that out(i) no longer corresponds to bin i if some bins are empty. You can address that using gid, which gives the bin number for each group.
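To illustrate how gid can put the group means back into bin order, here is a minimal sketch. The sample data, the variable names (x, y, edges, binMean), and the choice to mark empty bins with NaN are assumptions made for illustration and are not taken from the question's data.

```matlab
% Hypothetical sample data: bins 2 and 3 receive no points
x = [0.02 0.04 0.16 0.18];   % values to bin (stand-in for VWC)
y = [1 3 5 7];               % values to average (stand-in for NEP)
edges = 0:0.05:0.20;         % four bins of width 0.05
nBins = numel(edges) - 1;

[~, ~, bin] = histcounts(x, edges);              % bin = [1 1 4 4]
[g, gid] = findgroups(bin);                      % g = [1 1 2 2], gid = [1 4]
out = splitapply(@(v) mean(v, 'omitnan'), y, g); % one mean per non-empty bin

% Place each group mean at its original bin number; empty bins stay NaN
binMean = nan(1, nBins);
binMean(gid) = out;          % binMean = [2 NaN NaN 6]
```

With this layout, binMean(i) lines up with bin i again, so it can be plotted directly against the bin edges or centers.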
Another alternative is to use arrayfun instead of splitapply.
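A rough sketch of the arrayfun approach, under the assumption that a NaN result is acceptable for empty bins. The data and variable names are made up for illustration.

```matlab
% Hypothetical sample data: bin 2 receives no points
x = [0.01 0.03 0.12 0.13 0.14];   % values to bin (stand-in for VWC)
y = [2 4 3 6 9];                  % values to average (stand-in for NEP)
edges = 0:0.05:0.15;              % three bins of width 0.05
nBins = numel(edges) - 1;

[~, ~, bin] = histcounts(x, edges);   % bin = [1 1 3 3 3]

% Visit every bin number, including empty ones; the mean of an empty
% selection is NaN, so no bin is skipped and no error is thrown
binMean = arrayfun(@(k) mean(y(bin == k), 'omitnan'), 1:nBins);
% binMean = [3 NaN 6]
```

Because the loop runs over 1:nBins rather than over the bins that happen to contain data, binMean(i) always corresponds to bin i.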

6 Comments

Thank you for your answer @Adam Danz. Is this what my code is supposed to look like? The values at the end don't make sense.
binWidth2 = 0.05;
edges_VWC_09 = min(0) : binWidth2 : max(vec_VWC_09);
[~, ~, hbin_VWC_09] = histcounts(vec_VWC_09,[edges_VWC_09,inf]);
[g, gid] = findgroups(hbin_VWC_09);
out = splitapply(@(vec_VWC_09)mean(vec_VWC_09,'omitnan'), vec_NEP_VWC_09, g);
I don't know what doesn't look right, but my guess is that it is addressed by the last sentence in my answer: g likely does not correspond to the bin IDs (hbin_VWC_09). Here's an example:
binID = [1 1 2 4 7 7 10 10 20];
[g, gid] = findgroups(binID)
g = 1×9
1 1 2 3 4 4 5 5 6
gid = 1×6
1 2 4 7 10 20
The groups (g) are 1:6 because there are 6 unique bin values. Group n corresponds to gid(n).
If I knew how you are using the output of splitapply, I might be able to suggest a solution.
But also consider arrayfun or just a simple loop!
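uniqueBins = unique(hbin_VWC_09);   % bin numbers that actually contain data
binMeans = zeros(size(uniqueBins));
for i = 1:numel(uniqueBins)
    idx = hbin_VWC_09 == uniqueBins(i);                  % points that fall in this bin
    binMeans(i) = mean(vec_NEP_VWC_09(idx), 'omitnan');  % average NEP (the y values) in this bin
end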
*written on the fly, not tested.
Thank you so much again! I will look into your suggestions.
For those interested, I was able to solve this issue using the following code:
[~, edges_09] = histcounts(vec_VWC_09);
y_09 = discretize(vec_VWC_09, edges_09);
m_09 = grpstats(vec_NEP_VWC_09, y_09);
scatter(edges_09(1:end-1), m_09,'filled');
set(gca, 'xtick', edges_09(1:end-1));
xlabel('VWC');
ylabel('NEP');
Nice use of grpstats. Thanks for sharing!


More Answers (1)

If I'm correct, the first output of histcounts (the bin counts, which your code ignores with ~) would contain a 0 if you captured it, indicating that one of the bins has no data.
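One way to check this is to keep the counts and look for zeros. The snippet below is a small sketch with made-up data and variable names, not the data from the question.

```matlab
% Hypothetical sample data: nothing falls between 0.25 and 0.75
x = [0.1 0.2 0.9];
edges = 0:0.25:1;                     % four bins

[N, ~, bin] = histcounts(x, edges);   % N = [2 0 0 1], bin = [1 1 4]
emptyBins = find(N == 0);             % emptyBins = [2 3]
```

Any bin number listed in emptyBins never appears in bin, which is exactly the condition splitapply complains about.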
As a simpler example that demonstrates the problem I think you're experiencing:
try
    splitapply(@sum, 1:3, [2 2 3]) % nothing in group 1
catch ME
    fprintf("This code threw an error. Error message:\n%s", ME.message)
end
This code threw an error. Error message: For N groups, every integer between 1 and N must occur at least once in the vector of group numbers.
You could try adding a "dummy" value for the empty group(s).
splitapply(@(x) sum(x, 'omitnan'), [1:3 NaN], [2 2 3 1])
ans = 1×3
0 3 3
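The same dummy-value idea can be sketched for bin averaging: append one NaN per empty bin so that every bin number occurs at least once. The data and names below are hypothetical. Note that with mean and 'omitnan' an empty bin comes out as NaN, rather than the 0 that sum gives.

```matlab
% Hypothetical sample data: bins 2 and 3 receive no points
x = [0.1 0.2 0.9];        % values to bin
y = [10 20 30];           % values to average
edges = 0:0.25:1;         % four bins

[N, ~, bin] = histcounts(x, edges);   % N = [2 0 0 1], bin = [1 1 4]
emptyBins = find(N == 0);             % bins with no data: [2 3]

% Pad the data with one NaN per empty bin and assign it to that bin
yPad = [y, nan(1, numel(emptyBins))];
gPad = [bin, emptyBins];

binMean = splitapply(@(v) mean(v, 'omitnan'), yPad, gPad);
% binMean = [15 NaN NaN 30]
```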
