# How to define categorical within factors in fitrm?

Jan on 17 Nov 2014
Commented: Stephane on 16 Mar 2016
I've observed unexpected behaviour of categorical factors in fitrm. Two ways to produce the same table of within factors produce different fitrm output. Why is this?
I have a completely within-subjects design with 8 observations on each of 20 subjects. I first specified the two within factors by calling array2table on the design matrix, which I had already converted to categorical indices for the factor levels:
within_fact = categorical(fullfact([nr_cond, nr_sessions]));
within_tbl1 = array2table(within_fact,'VariableNames',{'Condition','Session'});
The second way was to convert the two factors to categorical only after putting them into a table:
within_fact = fullfact([nr_cond,nr_sessions]);
within_tbl2 = array2table(within_fact,'VariableNames',{'Condition','Session'});
within_tbl2.Condition = categorical(within_tbl2.Condition);
within_tbl2.Session = categorical(within_tbl2.Session);
According to the isequal function, the two factor tables are identical:
isequal(within_tbl1, within_tbl2)
ans = 1
However, they produce different outcomes when used as the within-subjects design in a repeated-measures ANOVA:
% M = 20x8 double matrix
data = array2table(M, 'VariableNames', {'S1C1', 'S1C2', 'S2C1', 'S2C2',...
'S3C1', 'S3C2', 'S4C1', 'S4C2'});
rm1 = fitrm(data,'S1C1-S4C2 ~ 1','WithinDesign',within_tbl1);
ranovatbl1 = ranova(rm1, 'WithinModel', 'Condition*Session');
rm2 = fitrm(data,'S1C1-S4C2 ~ 1','WithinDesign',within_tbl2);
ranovatbl2 = ranova(rm2, 'WithinModel', 'Condition*Session');
Unexpectedly, rm1 produces an ANOVA table with df = 3 for both Condition (which has only 2 levels) and Session (which has 4). rm2 has the correct degrees of freedom (1 for Condition, 3 for Session). I'm confused as to why they produce different outcomes.
(MATLAB R2014b on Windows 7)
Stephane on 16 Mar 2016
Thanks for this question/answer - it helped a lot. Indeed, the categorical function spans all the matrix elements (it does not depend on rows or columns).
I overcame this problem by creating the within table directly from cell arrays of strings, as in
within_table = table({'c1','c2','c1','c2'}', {'L1','L1','L2','L2'}')
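As written, the table above gets the default variable names Var1 and Var2. A minimal sketch of the same construction with explicit names, assuming the first column is meant as Condition and the second as Session (that mapping is a guess, not something stated in the comment):

```matlab
% Same 2x2 within design as above, built from cell arrays of strings.
% The names 'Condition' and 'Session' are assumed for illustration.
within_table = table({'c1','c2','c1','c2'}', {'L1','L1','L2','L2'}', ...
    'VariableNames', {'Condition','Session'});

% Each column only contains the labels typed in for it, so no extra
% levels can be carried over from the other factor.
disp(within_table)
```

The names then match the terms used in a 'WithinModel' string such as 'Condition*Session'.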

Tom Lane on 19 Nov 2014
When you create within_fact, you are defining a matrix with categories from 1 up to max(nr_cond, nr_sessions). So both columns of within_tbl1 are defined to have that many categories. It may be, say, that the first column has the notion of category 5 but doesn't actually have any data on that category.
When you create the variables in within_tbl2, each categorical variable is defined separately so it only has as many categories as actually appear in that column. This is what you almost certainly want.
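The difference can be inspected directly with the categories function. A minimal sketch, assuming nr_cond = 2 and nr_sessions = 4 as in the question (the names design, cats_matrix_col and cats_separate_col are only illustrative):

```matlab
% Assumed design sizes from the question: 2 conditions, 4 sessions.
nr_cond = 2;
nr_sessions = 4;
design = fullfact([nr_cond, nr_sessions]);   % 8x2 numeric design matrix

% Converting the whole matrix at once gives every element, and therefore
% every column, one shared set of categories.
within_fact = categorical(design);
cats_matrix_col = categories(within_fact(:,1));

% Converting the Condition column on its own keeps only the values
% that actually occur in that column.
cats_separate_col = categories(categorical(design(:,1)));

disp(numel(cats_matrix_col))     % categories carried by column 1 of the matrix
disp(numel(cats_separate_col))   % categories actually used by Condition
```

With these sizes the first count is 4 and the second is 2, which matches the df = 3 versus df = 1 reported for Condition in the question.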
Perhaps the way fitrm deals with this condition could be improved. I'll look into it.
Jan on 19 Nov 2014
I see, yes, that makes sense! I had interpreted the categorical function as merely putting a label 'categorical' on each value in within_fact and consequently did not expect this behaviour. But if the definition of these categorical labels depends on the entire dataset that you give the function, the outcome will be different for a matrix or a combination of two columns (as in my example). Thanks for the explanation!
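If the matrix-wide conversion is kept, one possible repair is to drop the unused categories from each table variable with removecats. This is a sketch under the same assumed sizes (nr_cond = 2, nr_sessions = 4). It has not been checked against fitrm here, although it should leave each factor with the same category set as the column-by-column conversion:

```matlab
% Assumed design sizes from the question: 2 conditions, 4 sessions.
nr_cond = 2;
nr_sessions = 4;

% Matrix-wide conversion: both columns share the categories '1'..'4'.
within_fact = categorical(fullfact([nr_cond, nr_sessions]));
within_tbl1 = array2table(within_fact, 'VariableNames', {'Condition','Session'});

% removecats with a single input removes categories that have no elements,
% so each factor keeps only the levels it actually uses.
within_tbl1.Condition = removecats(within_tbl1.Condition);
within_tbl1.Session = removecats(within_tbl1.Session);

disp(categories(within_tbl1.Condition))
disp(categories(within_tbl1.Session))
```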