How to automatically select the number of latent variables in a PLS-DA script

Hi,
I have a question. When I build a PLS-DA model in the PLS_Toolbox interface and load the calibration datasets X and Y, the toolbox can automatically select the number of latent variables. Is there a way to "translate" this into a MATLAB script?
After the cross-validation, is there any way to make the script select the number of components automatically? Is there a function for this? My Y dataset has 3 columns, and each column contains 1 or 0 depending on whether or not the sample belongs to that class.
% PLS-DA model before cross-validation
modello_in=evrimodel('plsda');
% Calibration of the PLS-DA model before cross-validation
modello_in.x=Xcal;
modello_in.y=Ycal;
modello_in.ncomp=5; % 5 LVs
modello_in.options.preprocessing={'autoscale' 'autoscale'};
modello_in.options.display='off';
modello_in=modello_in.crossvalidate({'vet' 10}, 15);

Answers (1)

Simar on 12 Jun 2024
Hi Pietro,
As I understand it, you want to extend your script so that it automatically selects the optimal number of latent variables from the cross-validation results, and you are looking for a function or method in PLS_Toolbox or MATLAB that does this.
In PLS_Toolbox for MATLAB, selecting the number of latent variables (LVs) for a Partial Least Squares Discriminant Analysis (PLS-DA) model can indeed be automated, using the cross-validation results. The goal is to find the number of LVs that minimizes the cross-validated prediction error, since too few LVs underfit the data and too many overfit it.
While PLS_Toolbox provides a user-friendly interface for these tasks, translating them into a MATLAB script offers more flexibility and automation. The results produced by the crossvalidate method can then be used in the script to choose the number of latent variables.
Here is a conceptual approach to automatically selecting the number of components after cross-validation, adapted for a PLS-DA model in a MATLAB script. Note that specific function names and options might require adjustments based on the exact version of PLS_Toolbox in use:
% Define the PLS-DA model
model = evrimodel('plsda');
% Set the calibration data
model.x = Xcal; % Predictor variables
model.y = Ycal; % Response variables (classes encoded as 0 or 1)
% Set initial number of components
model.ncomp = 10; % Example: starting with 10 LVs
% Preprocessing options
model.options.preprocessing = {'autoscale', 'autoscale'};
model.options.display = 'off';
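% Perform cross-validation: venetian blinds, 10 splits, up to 15 LVs
% (call form taken from the question; check the crossvalidate documentation for your version)
model = model.crossvalidate({'vet', 10}, 15);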
% Extracting the optimal number of LVs from cross-validation results
[~, optimalLV] = min(model.cv.statistics.error); % Identifying the LVs with the minimum error
% Update the model with the optimal number of LVs
model.ncomp = optimalLV;
Note: The exact way to extract the optimal number of LVs may differ depending on the structure of the 'model' object and the version of PLS_Toolbox.
The script sets up a PLS-DA model, performs cross-validation, and then selects the number of latent variables from the cross-validation results. The key step is to inspect the model.cv.statistics.error array (or its equivalent in your version of PLS_Toolbox) and find the position of the minimum error, which is taken as the optimal number of components.
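With a three-column Y, the cross-validation error may be reported per class column, in which case it has to be reduced to a single curve before taking the minimum. Below is a minimal sketch of that step using only base MATLAB functions. The matrix rmsecv and its values are made up for illustration; in a real script it is assumed that an equivalent classes-by-LVs matrix can be read from the cross-validated model (the field name depends on the PLS_Toolbox version). The 10% tolerance is an arbitrary choice, not a toolbox default.

```matlab
% Hypothetical cross-validation error: one row per class column of Y,
% one column per number of LVs (1 to 6). Values are illustrative only.
rmsecv = [0.52 0.41 0.33 0.31 0.32 0.34;
          0.60 0.45 0.36 0.33 0.33 0.35;
          0.55 0.43 0.30 0.29 0.30 0.31];

% Average over the classes to get a single error curve
meanErr = mean(rmsecv, 1);

% Option 1: number of LVs at the global minimum of the averaged error
[minErr, lvMin] = min(meanErr);

% Option 2: smallest number of LVs whose error is within a tolerance of
% the minimum (favours a simpler model when the curve is flat)
tol = 0.10;  % assumed tolerance, adjust to your data
lvParsimonious = find(meanErr <= (1 + tol) * minErr, 1, 'first');

fprintf('Minimum error at %d LVs, parsimonious choice: %d LVs\n', ...
        lvMin, lvParsimonious);
```

Either value could then be assigned to model.ncomp before recalibrating the model. The second option is often preferred when adding LVs beyond a certain point only improves the error marginally.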
Please refer to the evrimodel documentation for the exact structure of the model object after cross-validation, as the way to access the cross-validation statistics and errors may vary between versions of the toolbox.
Hope it helps!
Best Regards,
Simar

Release

R2020b
