MATLAB Answers

How to include all variables in each decision tree of an ensemble?

Haris on 13 Feb 2021
Answered: Aditya Patil on 16 Feb 2021
Hi everyone. I am fitting the following 10-tree ensemble.
X = rand(1000,50);
Y = rand(1000,1);
N = size(X,2);
Ntrees=10;
t = templateTree('NumVariablesToSample','all');
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);
Below I extract the number of variables that are included in each of the 10 trees.
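z = false(N,Ntrees);                                  % z(j,i) is true if tree i splits on predictor j
for i = 1:Ntrees
    idx = unique(Mdl.Trained{i}.CutPredictorIndex);   % predictor index used at each node
    idx(idx==0) = [];                                 % 0 marks leaf nodes, which have no split
    z(idx,i) = true;
end
sum(z)                                                % number of distinct predictors per tree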
>> ans =
8 10 8 10 9 9 10 8 9 9
Despite setting 'NumVariablesToSample' to 'all', each tree includes only 8-10 of the 50 features. Does anyone have a suggestion on how to force all variables to be included in all trees? Thanks.

Answers (1)

Aditya Patil on 16 Feb 2021
'NumVariablesToSample' sets the number of variables (predictors) that are considered as candidates at each split. The decision tree algorithm picks a random set of predictors of that size, then selects one of them to split on according to the split criterion. With 'all', every predictor is a candidate at every split, but each split still uses only one predictor.
It might not be necessary, or sometimes even possible, to use a specific variable in a tree. For example, if a prior split leaves a node with samples of only one class, no further decision boundary can be selected in that node, so no additional variable can be used there.
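Because each split uses exactly one predictor, a tree cannot use more distinct predictors than it has splits. As far as I know, templateTree defaults 'MaxNumSplits' to 10 for boosted ensembles, which would be consistent with the 8-10 counts in the question, but please check the documentation for your release. The sketch below raises that limit (the value 200 is an arbitrary assumption) and compares the number of splits with the number of distinct predictors in each tree.

```matlab
rng(0);                                   % fix the seed so the run is repeatable
X = rand(1000,50);
Y = rand(1000,1);
Ntrees = 10;

% Assumption: allow up to 200 splits per tree (arbitrary illustrative value).
t = templateTree('NumVariablesToSample','all','MaxNumSplits',200);
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);

nSplits = zeros(1,Ntrees);                % branch nodes per tree
nUsed   = zeros(1,Ntrees);                % distinct predictors per tree
for i = 1:Ntrees
    cutIdx = Mdl.Trained{i}.CutPredictorIndex;
    nSplits(i) = sum(cutIdx > 0);         % entries of 0 are leaf nodes
    nUsed(i)   = numel(unique(cutIdx(cutIdx > 0)));
end
disp([nSplits; nUsed])                    % row 1: splits, row 2: predictors used
```

Even with a larger limit there is no guarantee that every predictor appears in every tree. The split count is only an upper bound, and the algorithm stops splitting a node once the stopping rules (for example, minimum leaf size) are met.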
If you need to use all variables, you can look at some of the other classification algorithms available in MATLAB, such as SVM.
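As a rough illustration of the SVM suggestion: the data in the question is a regression problem, so this sketch assumes fitrsvm (the regression counterpart) with a linear kernel, where the fitted model stores one coefficient per predictor in its Beta property. The 'Standardize' setting is my own choice here, not something from the question.

```matlab
rng(0);                                   % fix the seed so the run is repeatable
X = rand(1000,50);
Y = rand(1000,1);

% Assumption: linear kernel, so the model has an explicit weight per predictor.
SVMMdl = fitrsvm(X,Y,'KernelFunction','linear','Standardize',true);

nCoef    = numel(SVMMdl.Beta);            % one coefficient per column of X
nNonzero = nnz(SVMMdl.Beta);              % coefficients that are not exactly zero
fprintf('%d coefficients, %d nonzero\n', nCoef, nNonzero);
```

Every predictor enters the linear model through its coefficient, although a coefficient close to zero means that predictor contributes little to the prediction.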
