Get means for nested categorical variables

6 views (last 30 days)
Assuming you have the following table:
site = {'1','1','1','1','2','2','2','2','3','3','3','3'}';
site = categorical(site);
days = {'0','0','45','45','0','0','36','36','0','0','67','67'}';
days = categorical(days);
var = [23,45,67,21,42,87,84,65,12,77,42,90]';
T = table(site,days,var);
how do I make a new table T1 with each unique value of site and days, and the means of var over the days category? e.g. T1 = table(site,days,meanvar)

Accepted Answer

Akira Agata
Akira Agata on 14 Mar 2018
Please try the following:
[group,site,days] = findgroups(T.site,T.days);
meanvar = splitapply(@mean,T.var,group);
T1 = table(site,days,meanvar);

More Answers (2)

Peter Perkins
Peter Perkins on 15 Mar 2018
Another possibility:
>> T2 = varfun(@mean,T,'GroupingVariables',{'site' 'days'})
T2 =
6×4 table
site days GroupCount mean_var
____ ____ __________ ________
1 0 2 34
1 45 2 44
2 0 2 64.5
2 36 2 74.5
3 0 2 44.5
3 67 2 66
By the way, "1", "2", etc. are perhaps perfectly good category names for the site variable (as long as you remember that the values are text, and not the numeric values 1, 2, etc), but it seems like days is a numeric count. If that's true, there's no particular reason why you can't just leave it as numeric. The above code, and indeed the version usig findgroups/splitapply, continues to work.

Razvan Carbunescu
Razvan Carbunescu on 16 Mar 2018
Can also use groupsummary to compute directly one go from T:
>> GT = groupsummary(T,{'site','days'},'mean','var')
GT =
6×4 table
site days GroupCount mean_var
____ ____ __________ ________
1 0 2 34
1 45 2 44
2 0 2 64.5
2 36 2 74.5
3 0 2 44.5
3 67 2 66

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!