Predictors in classification learner app

Anh on 24 May 2023
Commented: Anh on 24 May 2023
Hi all, I am wondering about the predictor box (circled in red below) in the Classification Learner app. Does it affect the classification results, or is it just for illustration? Sometimes when I change the cells in the predictor box, the classification results change a little. Please help me understand this, thank you :3

Accepted Answer

Askic V on 24 May 2023
Edited: Askic V on 24 May 2023
This is explained in the MATLAB documentation here:
Before you train the model, only the Data option is available in the scatter plot. After you train the model, you can switch between Data and Model predictions. If you switch to Model predictions, you can see which observations the model predicted correctly and which it did not; incorrect predictions are marked with an "x".
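The Model predictions view can be approximated in code. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox and the carbig example used in this answer. It also assumes that a Fine Tree corresponds roughly to fitctree with MaxNumSplits = 100 and that the predictions come from 5-fold cross-validation (the app's usual default); the Displacement/Weight axes are an arbitrary choice for illustration, so the picture will not match the app exactly.

```matlab
% Sketch: approximate the app's "Model predictions" scatter view.
% Assumptions: Fine Tree ~ fitctree(...,'MaxNumSplits',100); 5-fold CV.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

rng(0)                                                  % reproducible folds
cvTree = fitctree(cars,'Origin','MaxNumSplits',100,'KFold',5);
predOrigin = kfoldPredict(cvTree);                      % validation prediction per row
isWrong = predOrigin ~= cars.Origin;                    % incorrectly predicted rows

figure
gscatter(cars.Displacement, cars.Weight, predOrigin)    % color by predicted class
hold on
% Overlay an "x" on every incorrectly predicted observation
plot(cars.Displacement(isWrong), cars.Weight(isWrong), 'kx', ...
    'MarkerSize', 10, 'DisplayName', 'Incorrect prediction')
hold off
xlabel("Displacement"); ylabel("Weight")
```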
By choosing the predictor variables for the X and Y axes, you can investigate which features separate the classes well and which do not.
For example, features with low predictive power show significant overlap between the class labels, or no clear separation. Based on this, you can choose to omit them as predictor variables.
In summary, this is just a visualization tool that helps you better understand the data and the predictive power of each feature.
I'll try to explain this as best I can with an example data set that is already available in MATLAB.
Execute this code:
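load carbig                                    % built-in car data set
Origin = categorical(cellstr(Origin));         % char array -> categorical
% Merge all non-US countries into a single class, "NotUSA"
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
% Collect the six predictors and the response in one table
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);                        % drop rows with missing values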
Then start the Classification Learner app and train a model using the default settings, as shown in the figure:
The goal is to train a model that predicts the origin of a car (USA or NotUSA) from 6 predictor variables (Acceleration, Displacement, Horsepower, Model_Year, MPG, Weight). Intuitively, some of these variables have little to do with the country of origin, but let's confirm that.
If you choose variables such as Acceleration and Model_Year for the scatter plot axes, you'll see significant overlap between the classes, which indicates that these variables are not suitable as predictors, i.e. they have very low predictive power (model year, of course, cannot determine the origin in any way).
The model (Fine Tree) has an accuracy of about 90.1% with all 6 features. Based on the scatter plots, it seems that we can omit predictors 1 and 4 (Acceleration and Model_Year).
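The same comparison can be reproduced outside the app with gscatter. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides carbig and gscatter). It rebuilds the cars table and draws two plots colored by Origin: Acceleration vs Model_Year, and, for contrast, Displacement vs Weight, a pair chosen only for illustration because both appear among the most important features in the importance analysis of this answer.

```matlab
% Sketch: compare two predictor pairs, colored by class.
% Assumes Statistics and Machine Learning Toolbox (carbig, gscatter).
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

figure
tiledlayout(1,2)

% Pair the answer describes as weak: expect heavy class overlap
nexttile
gscatter(cars.Acceleration, cars.Model_Year, cars.Origin)
xlabel("Acceleration"); ylabel("Model\_Year")
title("Acceleration vs Model\_Year")

% Contrast pair (chosen for illustration)
nexttile
gscatter(cars.Displacement, cars.Weight, cars.Origin)
xlabel("Displacement"); ylabel("Weight")
title("Displacement vs Weight")
```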
But let's confirm that. If you add another Fine Tree model, use the "Feature Selection" option to remove features 1 and 4, and then train that model with 4/6 features, its accuracy improves a bit.
So that's it.
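The same check can be scripted. This is a minimal sketch under assumptions that are not taken from the app: a Fine Tree is approximated by fitctree with MaxNumSplits = 100, and both models are evaluated with the same 5-fold cvpartition so the comparison is fair. Because the folds differ from the app's, the printed accuracies will not necessarily match the 90.1% above.

```matlab
% Sketch: cross-validated accuracy with all 6 predictors vs. 4 predictors.
% Assumptions: Fine Tree ~ fitctree(...,'MaxNumSplits',100); shared 5-fold CV.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

rng(0)                                          % reproducible partition
cvp = cvpartition(cars.Origin,'KFold',5);       % same folds for both models

% All 6 predictors
cvAll = fitctree(cars,'Origin','MaxNumSplits',100,'CVPartition',cvp);
accAll = 1 - kfoldLoss(cvAll);                  % kfoldLoss = misclassification rate

% Drop predictors 1 and 4 (Acceleration and Model_Year)
cars4 = removevars(cars,{'Acceleration','Model_Year'});
cv4 = fitctree(cars4,'Origin','MaxNumSplits',100,'CVPartition',cvp);
acc4 = 1 - kfoldLoss(cv4);

fprintf('6 predictors: %.1f%%\n4 predictors: %.1f%%\n', 100*accAll, 100*acc4)
```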
If you have a lot of features (a wide data set), doing this by visually examining the data and predictors becomes tedious. In that case, there is a built-in function you can use.
Export your original model (6/6 features) to the workspace and execute the following code:
impValues = predictorImportance(trainedModel.ClassificationTree); % importance estimate per predictor
pareto(impValues)   % bars sorted in descending order plus a cumulative line
The result will be as shown:
You can see that feature no. 2 (Displacement) accounts for about 70% of the predictive power, and features 2, 3 and 6 (Displacement, Horsepower and Weight) together account for about 97%.
I hope this answers your question.
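To read those percentages as numbers with predictor names attached, the importance values can be sorted and accumulated directly. This is a minimal sketch that trains its own tree instead of using the exported trainedModel, assuming a Fine Tree is approximated by fitctree with MaxNumSplits = 100; the resulting shares may therefore differ somewhat from the app's Pareto chart.

```matlab
% Sketch: per-predictor and cumulative share of predictorImportance.
% Assumption: Fine Tree ~ fitctree(...,'MaxNumSplits',100), trained on all rows.
load carbig
Origin = categorical(cellstr(Origin));
Origin = mergecats(Origin,["France","Japan","Germany", ...
    "Sweden","Italy","England"],"NotUSA");
cars = table(Acceleration,Displacement,Horsepower, ...
    Model_Year,MPG,Weight,Origin);
cars = rmmissing(cars);

tree = fitctree(cars,'Origin','MaxNumSplits',100);
imp = predictorImportance(tree);                 % one value per predictor

[sortedImp, order] = sort(imp,'descend');        % largest contribution first
share = 100*sortedImp/sum(imp);                  % percent of total importance
cumShare = cumsum(share);                        % running total, ends at 100
names = tree.PredictorNames(order);

for k = 1:numel(names)
    fprintf('%-13s %5.1f%%  (cumulative %5.1f%%)\n', names{k}, share(k), cumShare(k));
end
```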
  1 Comment
Anh on 24 May 2023
Awesome! Your explanation helped me a lot. You must have spent a lot of time on it; that's very kind of you. Thank you, thank you very much!


Release

R2020b
