
RegressionBaggedEnsemble

Regression ensemble grown by resampling

Description

RegressionBaggedEnsemble combines a set of trained weak learner models and the data on which the learners were trained. Use the predict object function to predict the ensemble response for new data by aggregating predictions from the weak learners.

Creation

Create a bagged regression ensemble object using fitrensemble. Set the name-value argument Method of fitrensemble to "Bag" to use bootstrap aggregation, or bagging (for example, random forest).

For a description of bagged regression ensembles, see Bootstrap Aggregation (Bagging) and Random Forest.

Properties


Ensemble Properties

This property is read-only.

CombineWeights: Method used to combine weak learner weights, returned as either 'WeightedAverage' or 'WeightedSum'.

Data Types: char

This property is read-only.

FitInfo: Fit information, returned as a numeric array. The FitInfoDescription property describes the content of this array.

Data Types: double

This property is read-only.

FitInfoDescription: Description of the information in FitInfo, returned as a character vector or cell array of character vectors.

Data Types: char | cell

This property is read-only.

FResample: Fraction of the training data resampled for each weak learner, returned as a numeric scalar between 0 and 1. When creating the ensemble model object, fitrensemble resamples the training data randomly for every weak learner.

Data Types: double
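
As a minimal sketch of how the resampling settings appear on the trained object, the following code trains a bagged ensemble on the carsmall data, drawing half of the observations for each learner without replacement. It assumes that the FResample and Replace name-value arguments of fitrensemble are available in your release; the values 0.5 and "off" are illustrative only.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility of the random resampling

% Draw 50% of the observations for each tree, without replacement.
Mdl = fitrensemble(X,MPG,Method="Bag",FResample=0.5,Replace="off");

fracResampled = Mdl.FResample   % Fraction of training data drawn per learner
withReplacement = Mdl.Replace   % Logical flag for sampling with replacement
```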

This property is read-only.

Names of weak learners in the ensemble, returned as a cell array of character vectors. The name of each learner appears just once. For example, if you have an ensemble of 100 trees, LearnerNames is {'Tree'}.

Data Types: cell

This property is read-only.

Method: Method used by fitrensemble to create the ensemble, returned as a character vector.

Data Types: char

This property is read-only.

Parameters used in training the ensemble, returned as an EnsembleParams object. The properties of ModelParameters include the type of ensemble, either 'classification' or 'regression', the Method used to create the ensemble, and other parameters, depending on the ensemble.

This property is read-only.

NumTrained: Number of trained weak learners in the ensemble, returned as a positive integer.

Data Types: double

This property is read-only.

ReasonForTermination: Reason the fitrensemble function stopped adding weak learners to the ensemble, returned as a character vector.

Data Types: char

This property is read-only.

Result of using the regularize object function on the ensemble, returned as a structure. Use Regularization with shrink to lower the resubstitution error and shrink the ensemble.

Data Types: struct

This property is read-only.

Replace: Indicator of whether the ensemble was trained by sampling with replacement, returned as true or false.

Data Types: logical

This property is read-only.

Trained: Trained weak learners, returned as a cell vector. The entries of the cell vector contain the corresponding compact regression models.

Data Types: cell

This property is read-only.

Trained weak learner weights, returned as a numeric vector. TrainedWeights has NumTrained elements, where NumTrained is the number of weak learners in the ensemble. The ensemble computes the predicted response by aggregating weighted predictions from its learners.

Data Types: double
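
The aggregation described above can be checked by hand. The following sketch assumes that the ensemble combines learners by weighted averaging (that is, the combining method is 'WeightedAverage', which is the expected setting for bagging), and compares predict with a manual weighted average of the individual tree predictions. The variable names are illustrative.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility
Mdl = fitrensemble(X,MPG,Method="Bag");

Xnew = Mdl.X(1:5,:);          % A few training rows used as query points
yhat = predict(Mdl,Xnew);     % Ensemble prediction

% Collect the prediction of each weak learner in one column per learner.
treePreds = cellfun(@(t) predict(t,Xnew),Mdl.Trained,UniformOutput=false);
P = [treePreds{:}];           % 5-by-NumTrained matrix

% Weighted average of the learner predictions (assumed combining rule).
w = Mdl.TrainedWeights(:);
yManual = (P*w)/sum(w);

maxDiff = max(abs(yhat - yManual)) % Expected to be close to zero
```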

This property is read-only.

Indicator that an observation was used to train a learner, returned as a logical matrix of size n-by-NumTrained, where n is the number of rows of training data and NumTrained is the number of trained weak learners. UseObsForLearner(i,j) is true if observation i was used for training learner j, and is false otherwise.

Data Types: logical
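
As a short sketch of how UseObsForLearner can be used, the following code computes the fraction of observations that are out of bag for each learner, and counts the learners for which each observation is out of bag. With bootstrap sampling of the full training set with replacement, roughly 37% of the observations are expected to be out of bag for any given learner; the exact values depend on the random draw.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility
Mdl = fitrensemble(X,MPG,Method="Bag");

% Logical n-by-NumTrained matrix: true where observation i trained learner j.
inBag = Mdl.UseObsForLearner;

% Fraction of observations that are out of bag for each learner.
oobFraction = mean(~inBag,1);
meanOobFraction = mean(oobFraction)

% Number of learners for which each observation is out of bag.
numOobLearners = sum(~inBag,2);
minOobLearners = min(numOobLearners)
```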

Predictor Properties

This property is read-only.

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the NumBins name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the NumBins value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for the numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

Data Types: cell
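
BinEdges is populated only when binning is requested at training time. As a minimal sketch, the following code trains a bagged ensemble with the NumBins name-value argument and inspects the resulting bin edges. The value 10 is illustrative, and the number of edges actually stored for a predictor can be smaller when the predictor has few unique values.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility

% Request binning of numeric predictors into at most 10 bins.
Mdl = fitrensemble(X,MPG,Method="Bag",NumBins=10);

edges = Mdl.BinEdges;                  % One cell per predictor
numPredictors = numel(edges)
numEdgesPerPredictor = cellfun(@numel,edges)
```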

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: single | double

This property is read-only.

Expanded predictor names, returned as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

This property is read-only.

Predictor names, specified as a cell array of character vectors. The order of the entries in PredictorNames is the same as in the training data.

Data Types: cell

This property is read-only.

Predictor values, returned as a real matrix or table. Each column of X represents one variable (predictor), and each row represents one observation.

Data Types: double | table

Response Properties

This property is read-only.

ResponseName: Name of the response variable, returned as a character vector.

Data Types: char

Response transformation function, specified as "none" or a function handle. ResponseTransform describes how the software transforms raw response values.

For a MATLAB® function or a function that you define, enter its function handle. For example, you can enter Mdl.ResponseTransform = @function, where function accepts a numeric vector of the original responses and returns a numeric vector of the same size containing the transformed responses.

Data Types: char | string | function_handle
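
As an illustration, one possible use of ResponseTransform is to fit the ensemble to a log-transformed response and then report predictions on the original scale. This sketch assumes that predict applies the transform to the raw ensemble predictions; the choice of a log response and of @exp is illustrative only.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility

% Fit the ensemble to log(MPG).
Mdl = fitrensemble(X,log(MPG),Method="Bag");

xNew = [3000 4];                 % Hypothetical car: 3000 lb, 4 cylinders
logPred = predict(Mdl,xNew);     % Prediction on the log scale

% Map predictions back to the original MPG scale.
Mdl.ResponseTransform = @exp;
mpgPred = predict(Mdl,xNew);
```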

This property is read-only.

Response data, returned as a numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.

Data Types: double

Other Data Properties

This property is read-only.

Description of the cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the OptimizeHyperparameters name-value argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer option in HyperparameterOptimizationOptions when you create the model.

  • "bayesopt" (default) — Object of class BayesianOptimization

  • "gridsearch" or "randomsearch" — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)

This property is read-only.

Number of observations in the training data, returned as a positive integer. NumObservations can be less than the number of rows of input data when there are missing values in the input data or response data.

Data Types: double

This property is read-only.

Rows of the original predictor data X used for fitting, returned as an n-element logical vector, where n is the number of rows of X. If the software uses all rows of X to create the object, then RowsUsed is an empty array ([]).

Data Types: logical

This property is read-only.

Scaled weights in the ensemble, returned as a numeric vector. W has length n, the number of rows in the training data. The sum of the elements of W is 1.

Data Types: double

Object Functions

compact                         Reduce size of machine learning model
crossval                        Cross-validate machine learning model
cvshrink                        Cross-validate pruning and regularization of regression ensemble
gather                          Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                            Local interpretable model-agnostic explanations (LIME)
loss                            Regression error for regression ensemble model
oobLoss                         Out-of-bag error for bagged regression ensemble model
oobPermutedPredictorImportance  Out-of-bag predictor importance estimates for random forest of regression trees by permutation
oobPredict                      Predict out-of-bag responses of bagged regression ensemble
partialDependence               Compute partial dependence
plotPartialDependence           Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                         Predict responses using regression ensemble model
predictorImportance             Estimates of predictor importance for regression ensemble of decision trees
regularize                      Find optimal weights for learners in regression ensemble
resubLoss                       Resubstitution loss for regression ensemble model
resubPredict                    Predict response of regression ensemble by resubstitution
resume                          Resume training of regression ensemble model
shapley                         Shapley values
shrink                          Prune regression ensemble
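
As a brief sketch of how some of these object functions fit together, the following code trains a bagged ensemble, creates a compact version that does not store the training data, predicts responses for new observations with both models, and estimates predictor importance. The query points in xNew are hypothetical.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility
Mdl = fitrensemble(X,MPG,Method="Bag");

% Compact model: same learners, without the stored training data.
CMdl = compact(Mdl);

% Predict fuel economy for two hypothetical cars.
xNew = [3000 4; 4000 8];
yFull = predict(Mdl,xNew);
yCompact = predict(CMdl,xNew);
maxDiff = max(abs(yFull - yCompact)) % Both models should agree

% Importance estimate for each predictor (Weight, Cylinders).
imp = predictorImportance(Mdl)
```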

Examples


Load the carsmall data set. Consider a model that explains a car's fuel economy (MPG) using its weight (Weight) and number of cylinders (Cylinders).

load carsmall
X = [Weight Cylinders];
Y = MPG;

Train a bagged ensemble of 100 regression trees using all measurements.

Mdl = fitrensemble(X,Y,Method="Bag")
Mdl = 
  RegressionBaggedEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
        ResponseTransform: 'none'
          NumObservations: 94
               NumTrained: 100
                   Method: 'Bag'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the requested number of training cycles.'
                  FitInfo: []
       FitInfoDescription: 'None'
           Regularization: []
                FResample: 1
                  Replace: 1
         UseObsForLearner: [94×100 logical]


  Properties, Methods

Mdl is a RegressionBaggedEnsemble model object.

Mdl.Trained is the property that stores a 100-by-1 cell vector of the trained, compact regression trees (CompactRegressionTree model objects) that compose the ensemble.

Plot a graph of the first trained regression tree.

view(Mdl.Trained{1},Mode="graph")

[Figure: Regression tree viewer showing a graph of the first trained regression tree.]

By default, fitrensemble grows deep trees for bags of trees.

Estimate the in-sample mean-squared error (MSE).

L = resubLoss(Mdl)
L = 
12.4048
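
Because the trees are deep, the in-sample MSE tends to be optimistic. As a hedged sketch of a complementary check, the following code compares the resubstitution loss with the out-of-bag loss, which uses for each observation only the trees that did not train on it, and plots the cumulative out-of-bag loss as trees are added. The Mode name-value argument of oobLoss is assumed to be available in your release.

```matlab
load carsmall
X = [Weight Cylinders];
rng("default") % For reproducibility
Mdl = fitrensemble(X,MPG,Method="Bag");

resubMSE = resubLoss(Mdl)   % In-sample MSE
oobMSE = oobLoss(Mdl)       % Out-of-bag MSE

% Out-of-bag MSE as a function of the number of trees in the ensemble.
oobCumulative = oobLoss(Mdl,Mode="cumulative");
plot(oobCumulative)
xlabel("Number of trees")
ylabel("Out-of-bag MSE")
```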

Tips

For a bagged ensemble of regression trees Mdl, the Trained property of Mdl stores a cell vector of Mdl.NumTrained CompactRegressionTree model objects. For a textual or graphical display of tree t in the cell vector, enter

view(Mdl.Trained{t})

Alternative Functionality

Bootstrap Aggregation Methods

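For classification or regression, you can choose between two approaches for bagging: a bagged ensemble created by fitcensemble or fitrensemble with Method set to "Bag", or a bag of decision trees created by TreeBagger.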

For help choosing between these approaches, see Ensemble Algorithms and Suggestions for Choosing an Appropriate Ensemble Algorithm.

Extended Capabilities


Version History

Introduced in R2011a