Matching feature ranking algorithm outputs in Classification Learner

Hi,
In 2023b, within the Feature Selection tab of the Classification Learner app, you can generate feature ranks with five different algorithms: MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis.
MRMR and Chi2 can be replicated with:
[idx,scores] = fscmrmr(randSamp(:,3:end),'Stage');
[idx,scores] = fscchi2(randSamp(:,3:end),'Stage');
Where randSamp is a table with some variables ignored at the start and 'Stage' is the label of interest.
However, I cannot figure out how to replicate the same with ANOVA and KW. I have tried something like this:
[idx,scores] = anova1(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
[idx,scores] = kruskalwallis(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
And while it does compute *something*, I have no idea what it is doing or how to get it to match what the Classification Learner app is doing. Can anyone shed some light on this?
Christopher

Accepted Answer

Drew on 6 Nov 2023
The short answer is that, for some feature ranking techniques, there is some normalization of the features before the ranking. This is by design, since some feature ranking techniques are particularly sensitive to normalization. To see how Classification Learner is ranking the features, use the "Generate Function" button in Classification Learner to generate code to replicate the feature selection.
For example, take these steps to see some example generated code:
(1) t=readtable("fisheriris.csv");
(2) Start Classification Learner, load the fisher iris data, take defaults at session start
(3) Rank features with Kruskal-Wallis, choose keeping the top three features
(4) Train the default tree model
(5) In the Export area of the toolstrip, choose "Generate Function".
Below is a section of code from the function generated by Classification Learner. Notice the calls to "standardizeMissing" and "normalize" in the first two lines of (non-comment) code. These functions are also used in the later cross-validation part of the code. So, for each training fold (or for all of the training data for the final model), the "standardizeMissing" function and the default "zscore" method of the "normalize" function are being used before ranking the features. Note: The normalization used before feature ranking is independent of any normalization (or no normalization) used before model training.
% Feature Ranking and Selection
% Replace Inf/-Inf values with NaN to prepare data for normalization
predictors = standardizeMissing(predictors, {Inf, -Inf});

% Normalize data for feature ranking
predictorMatrix = normalize(predictors, "DataVariables", ~isCategoricalPredictor);
newPredictorMatrix = zeros(size(predictorMatrix));
for i = 1:size(predictorMatrix, 2)
    if isCategoricalPredictor(i)
        newPredictorMatrix(:,i) = grp2idx(predictorMatrix{:,i});
    else
        newPredictorMatrix(:,i) = predictorMatrix{:,i};
    end
end
predictorMatrix = newPredictorMatrix;
responseVector = grp2idx(response);

% Rank features using Kruskal-Wallis algorithm
for i = 1:size(predictorMatrix, 2)
    pValues(i) = kruskalwallis(...
        predictorMatrix(:,i), ...
        responseVector, ...
        'off');
end
% Sort so that smaller p-values (larger -log(p)) rank first
[~,featureIndex] = sort(-log(pValues), 'descend');
numFeaturesToKeep = 3;
includedPredictorNames = predictors.Properties.VariableNames(featureIndex(1:numFeaturesToKeep));
predictors = predictors(:,includedPredictorNames);
isCategoricalPredictor = isCategoricalPredictor(featureIndex(1:numFeaturesToKeep));
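To connect this back to the original question, below is a minimal sketch (not code generated by the app) of how the same per-feature, p-value-based ranking could be reproduced for the ANOVA option on a table like randSamp from the question. The column range 4:end, the assumption that all ranked predictors are numeric, and the normalization details are assumptions; for the exact code, select ANOVA in Classification Learner and use "Generate Function" again.
% Sketch only: assumes randSamp is a table, 'Stage' is the response,
% and columns 4:end are numeric predictors.
predictorTbl = standardizeMissing(randSamp(:, 4:end), {Inf, -Inf});
predictorTbl = normalize(predictorTbl);          % default zscore, as in the generated code
X = table2array(predictorTbl);
y = grp2idx(categorical(randSamp.Stage));

pValues = zeros(1, size(X, 2));
for i = 1:size(X, 2)
    pValues(i) = anova1(X(:,i), y, 'off');       % one-way ANOVA per feature, no figure
end
[~, featureIndex] = sort(-log(pValues), 'descend');   % smallest p-values rank first
rankedNames = predictorTbl.Properties.VariableNames(featureIndex);
Kruskal-Wallis follows the same pattern with kruskalwallis(X(:,i), y, 'off') in place of anova1.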
If this answer helps you, please remember to accept the answer.
  1 Comment
Christopher McCausland on 8 Nov 2023
Hi Drew,
I thought I had accepted this answer already, so apologies. It was a good idea to generate the code and then review it; thank you for adding the additional description too. It made the design thought process a lot easier to follow.
Christopher


More Answers (0)

Release

R2023b
