Feature Selection and Feature Transformation Using Classification Learner App
Investigate Features in the Scatter Plot
In Classification Learner, try to identify predictors that separate classes well by plotting different pairs of predictors on the scatter plot. The plot can help you investigate features to include or exclude. You can visualize training data and misclassified points on the scatter plot.
Before you train a classifier, the scatter plot shows the data. If you have trained a classifier, the scatter plot shows model prediction results. Switch to plotting only the data by selecting Data in the Plot controls.
Choose features to plot using the X and Y lists under Predictors.
Look for predictors that separate classes well. For example, plotting the fisheriris data, you can see that sepal length and sepal width separate one of the classes well (setosa). You need to plot other predictors to see whether you can separate the other two classes. A command-line sketch of this comparison appears after these instructions.
Show or hide specific classes using the check boxes under Show.
Change the stacking order of the plotted classes by selecting a class under Classes and then clicking Move to Front.
Investigate finer details by zooming in and out and panning across the plot. To enable zooming or panning, hover the mouse over the scatter plot and click the corresponding button on the toolbar that appears above the top right of the plot.
If you identify predictors that are not useful for separating out classes, then try using Feature Selection to remove them and train classifiers including only the most useful predictors.
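If you want to reproduce this kind of comparison at the command line, the following sketch plots sepal length against sepal width for the Fisher iris data, grouped by class. It assumes you have Statistics and Machine Learning Toolbox and uses the variables that load fisheriris creates; it is an illustration, not the code the app uses.

    % Plot two predictors of the Fisher iris data, grouped by class, to see
    % how well this pair separates the classes.
    load fisheriris                            % creates meas (150-by-4) and species
    gscatter(meas(:,1), meas(:,2), species)    % sepal length vs. sepal width
    xlabel('Sepal length')
    ylabel('Sepal width')

In this pair of predictors, the setosa points form a separate cluster, while the other two classes overlap; plotting the petal measurements instead shows better separation.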
After you train a classifier, the scatter plot shows model prediction results. You can show or hide correct or incorrect results and visualize the results by class. See Plot Classifier Results.
You can export the scatter plots you create in the app to figures. See Export Plots in Classification Learner App.
Select Features to Include
In Classification Learner, you can specify different features (or predictors) to include in the model. See if you can improve models by removing features with low predictive power. If data collection is expensive or difficult, you might prefer a model that performs satisfactorily without some predictors.
On the Classification Learner tab, in the Features section, click Feature Selection.
In the Feature Selection dialog box, clear the check boxes for the predictors you want to exclude, and then click OK.
Click Train to train a new model using the new predictor options.
Observe the new model in the Models pane. The Current Model Summary pane displays how many predictors are excluded.
To check which predictors are included in a trained model, click the model in the Models pane and observe the check boxes in the Feature Selection dialog box.
You can try to improve the model by including different features.
For an example using feature selection, see Train Decision Trees Using Classification Learner App.
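At the command line, excluding predictors simply means training on a subset of the predictor columns. The following sketch compares cross-validated loss for a decision tree trained on all four iris predictors against one trained on only the two petal predictors; the column choice is illustrative, not a recommendation, and this is not the code the app generates.

    % Compare cross-validated loss with and without some predictors.
    load fisheriris
    cvAll    = fitctree(meas, species, 'CrossVal', 'on');          % all four predictors
    cvSubset = fitctree(meas(:,3:4), species, 'CrossVal', 'on');   % petal length and width only
    kfoldLoss(cvAll)
    kfoldLoss(cvSubset)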
Transform Features with PCA in Classification Learner
Use principal component analysis (PCA) to reduce the dimensionality of the predictor space. Reducing the dimensionality can help prevent overfitting in the classification models you create in Classification Learner. PCA linearly transforms predictors to remove redundant dimensions and generates a new set of variables called principal components.
On the Classification Learner tab, in the Features section, select PCA.
In the Advanced PCA Options dialog box, select the Enable PCA check box, and then click OK.
When you next click Train, the pca function transforms your selected features before training the classifier.
By default, PCA keeps only the components that explain 95% of the variance. In the Advanced PCA Options dialog box, you can change the percentage of variance to explain by selecting the Explained variance value. A higher value risks overfitting, while a lower value risks removing useful dimensions.
If you want to manually limit the number of PCA components, in the Component reduction criterion list, select Specify number of components. Select the Number of numeric components value. The number of components cannot be larger than the number of numeric predictors. PCA is not applied to categorical predictors.
Check the PCA options for trained models in the Current Model Summary pane. Check the explained variance percentages to decide whether to change the number of components. For example:
PCA is keeping enough components to explain 95% variance. After training, 2 components were kept. Explained variance per component (in order): 92.5%, 5.3%, 1.7%, 0.5%
To learn more about how Classification Learner applies PCA to your data, generate code for your trained classifier. For more information on PCA, see the pca function.
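The explained variance percentages in the example above come from the same quantities that the pca function returns. As a rough sketch of the computation (not the app's generated code, and using only the defaults of pca), you can find how many components are needed to reach the 95% threshold like this:

    % Explained variance per principal component, and the number of
    % components needed to reach 95% cumulative explained variance.
    load fisheriris
    [coeff, score, ~, ~, explained] = pca(meas);
    explained'                                          % approximately 92.5  5.3  1.7  0.5
    numComponents = find(cumsum(explained) >= 95, 1)    % 2 components reach 95% here
    reducedX = score(:, 1:numComponents);               % transformed predictors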
Investigate Features in the Parallel Coordinates Plot
To investigate features to include or exclude, use the parallel coordinates plot. You can visualize high-dimensional data on a single plot to see 2-D patterns. The plot can help you understand relationships between features and identify useful predictors for separating classes. You can visualize training data and misclassified points on the parallel coordinates plot. When you plot classifier results, misclassified points have dashed lines.
On the Classification Learner tab, in the Plots section, click the arrow to open the gallery, and then click Parallel Coordinates in the Validation Results group.
On the plot, drag the X tick labels to reorder the predictors. Changing the order can help you identify predictors that separate classes well.
To specify which predictors to plot, use the Predictors check boxes. A good practice is to plot a few predictors at a time. If your data has many predictors, the plot shows the first 10 predictors by default.
If the predictors have significantly different scales, scale the data for easier visualization. Try different options in the Scaling list:
None displays raw data along coordinate rulers that have the same minimum and maximum limits.
Range displays raw data along coordinate rulers that have independent minimum and maximum limits.
Z-Score displays z-scores (with a mean of 0 and a standard deviation of 1) along each coordinate ruler.
Zero Mean displays data centered to have a mean of 0 along each coordinate ruler.
Unit Variance displays values scaled by standard deviation along each coordinate ruler.
L2 Norm displays 2-norm values along each coordinate ruler.
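These scalings correspond to simple column-wise transformations of the data. The following sketch shows rough command-line analogues of some of them for the iris measurements; it approximates the descriptions above and is not the app's internal code.

    % Rough column-wise analogues of some Scaling options.
    load fisheriris
    X = meas;
    Xzscore   = zscore(X);              % Z-Score: mean 0, standard deviation 1 per column
    Xcentered = X - mean(X);            % Zero Mean: each column centered to mean 0
    Xunitvar  = X ./ std(X);            % Unit Variance: each column divided by its standard deviation
    Xl2       = X ./ vecnorm(X, 2, 1);  % L2 Norm: each column divided by its 2-norm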
If you identify predictors that are not useful for separating out classes, use Feature Selection to remove them and train classifiers including only the most useful predictors.
The plot of the fisheriris data shows that the petal length and petal width features separate the classes best.
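You can draw a similar plot at the command line with the parallelcoords function. This sketch groups the iris measurements by class and standardizes them, which is comparable to the Z-Score scaling option; the label strings are only for readability.

    % Parallel coordinates plot of the iris data, grouped by class.
    load fisheriris
    labels = {'Sepal length','Sepal width','Petal length','Petal width'};
    parallelcoords(meas, 'Group', species, 'Standardize', 'on', 'Labels', labels)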
For more information, see the parallelcoords function.
You can export the parallel coordinates plots you create in the app to figures. See Export Plots in Classification Learner App.