Supervised learning is the most common type of machine learning algorithm. It uses a known dataset (called the training dataset), in which input data (called features) are paired with known responses (the desired outputs, or labels), to train a model. From this training data, the supervised learning algorithm seeks to build a model by discovering relationships between the features and the responses, and then uses that model to predict response values for new data.
Prior to applying supervised learning, unsupervised learning is frequently used to discover patterns in the input data that suggest candidate features, and feature engineering transforms those features into a form better suited to supervised learning. In addition to identifying features, the correct category or response must be identified for every observation in the training set, which is a very labor-intensive step. Semi-supervised learning lets you train models with a small amount of labeled data and thus reduces the labeling effort.
Once the algorithm is trained, a test dataset that has not been used for training is typically used to estimate the algorithm's performance and validate it. To obtain accurate performance estimates, it is critical that both the training and test sets are a good representation of “reality” (i.e., of the data the model will see in the production environment) and that the model is validated correctly.
You can train, validate, and tune predictive supervised learning models in MATLAB® with Deep Learning Toolbox™ and Statistics and Machine Learning Toolbox™.
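As a rough illustration of this workflow, the following MATLAB sketch holds out part of a labeled dataset for testing, trains a model on the remainder, and evaluates it on the held-out data. It uses the ionosphere sample data that ships with Statistics and Machine Learning Toolbox; the variable and model names are only illustrative, and any table of features with known responses would work the same way.

```matlab
% Minimal sketch of the supervised learning workflow: split labeled data
% into training and test sets, fit a model, and check test performance.
load ionosphere                       % X: features, Y: known class labels

rng(1)                                % for a reproducible split
cv = cvpartition(Y, 'HoldOut', 0.2);  % hold out 20% of observations for testing
Xtrain = X(training(cv), :);  Ytrain = Y(training(cv));
Xtest  = X(test(cv), :);      Ytest  = Y(test(cv));

mdl = fitcsvm(Xtrain, Ytrain);        % train a binary SVM classifier
testError = loss(mdl, Xtest, Ytest);  % misclassification rate on unseen data
Ypred = predict(mdl, Xtest);          % predicted responses for new data
```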
Categories of Supervised Learning Algorithms
Classification: Used for categorical response values, where the data can be separated into specific classes. A binary classification model has two classes, and a multiclass classification model has more than two. You can train classification models interactively with the Classification Learner app in MATLAB, or at the command line as sketched after the list below.
Common classification algorithms include:
- Logistic regression
- Support vector machine (SVM)
- Neural network
- Naïve Bayes classifier
- Decision tree
- Discriminant analysis
- k-nearest neighbors (kNN)
- Ensemble classification
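For example, a minimal command-line sketch of training two of these classifiers on the Fisher iris sample data (shipped with Statistics and Machine Learning Toolbox) might look like the following; the model and variable names are illustrative.

```matlab
% Sketch: train multiclass classifiers (a decision tree and a kNN model)
% on the Fisher iris sample data and estimate their accuracy.
load fisheriris                               % meas: features, species: labels

treeMdl = fitctree(meas, species);            % classification decision tree
knnMdl  = fitcknn(meas, species, 'NumNeighbors', 5);   % k-nearest neighbors

cvTree  = crossval(treeMdl);                  % 10-fold cross-validation
cvError = kfoldLoss(cvTree)                   % estimated misclassification rate

label = predict(treeMdl, [5.9 3.0 5.1 1.8])   % classify a new observation
```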
Regression: Used for continuous numerical response values. You can train regression models interactively with the Regression Learner app in MATLAB, or at the command line as sketched after the list below.
Common regression algorithms include:
- Linear regression
- Nonlinear regression
- Generalized linear model
- Decision tree
- Neural network
- Gaussian process regression
- Support vector machine regression
- Ensemble regression
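As a brief sketch, the carsmall sample data (shipped with Statistics and Machine Learning Toolbox) can be used to fit some of these regression models at the command line; the model and variable names here are illustrative.

```matlab
% Sketch: fit regression models for a continuous response, using the
% carsmall sample data (fuel economy as a function of weight and horsepower).
load carsmall                                        % Weight, Horsepower, MPG, ...

tbl = rmmissing(table(Weight, Horsepower, MPG));     % drop rows with missing values
lmMdl  = fitlm(tbl, 'MPG ~ Weight + Horsepower');    % linear regression
gprMdl = fitrgp(tbl, 'MPG');                         % Gaussian process regression

% Predict fuel economy for a hypothetical 3000 lb, 130 hp vehicle
newCar = table(3000, 130, 'VariableNames', {'Weight', 'Horsepower'});
predict(lmMdl, newCar)
```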
Supervised Learning Applications
Supervised learning is used in a broad range of applications, for example:
- Financial applications: credit scoring, algorithmic trading, and bond classification
- Image and video applications: object classification and tracking
- Industrial applications: outlier detection
- Predictive maintenance: estimating the remaining life of equipment
- Biological applications: tumor detection and drug discovery
- Energy applications: price and load forecasting
Example
Let's assume you want to predict housing prices and have historical data on housing sales, with home size, location, and year sold as features and the actual sale price as the known response. That is an excellent use case for supervised regression, and you can try it out yourself in this example. The weights of the linear model shown below make sense: type and size of home, year built, and neighborhood indeed determine home values. The residual plot indicates that the linear model captures the relationship between the variables and price reasonably well.
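A hedged sketch of what fitting such a model looks like in MATLAB is shown below. The file name and variable names (housingSales.csv, SquareFeet, YearBuilt, Neighborhood, SalePrice) are illustrative assumptions, not the data set used in the linked example.

```matlab
% Hypothetical housing-price sketch: fit a linear regression of sale price
% on home features and inspect the learned weights and residuals.
housingTbl = readtable('housingSales.csv');                      % historical sales (assumed file)
housingTbl.Neighborhood = categorical(housingTbl.Neighborhood);  % categorical feature

mdl = fitlm(housingTbl, 'SalePrice ~ SquareFeet + YearBuilt + Neighborhood');

disp(mdl.Coefficients)          % learned weight for each feature
plotResiduals(mdl, 'fitted')    % residuals vs. fitted values
```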