Default LDA fitting algorithm used in fitcdiscr, classification algorithm used by predict, formula for DeltaPredictor property

Hi everyone,
I'm trying to figure out the nuts and bolts of the LDA implementation in MATLAB 2020a when using the default settings (and no hyperparameter optimization or regularization) in the fitcdiscr function and the subsequent predict function.
Without going into too much detail about my data, I'm trying to use these two functions to decode data with 41 classes using somewhere between 100 and 200 predictors. I have a basic understanding of the properties that are contained in the model object, but I had some specific questions. I'd also just settle for the specific references and formulas/algorithms used for both the default LDA model fitting and prediction. Absent that, though, I had the following questions:
  1. How exactly is the LDA fitting procedure performed? Is this being done analytically or via parameter estimation? Is it trying to find the multi-class extension of the Fisher discriminant or is some other procedure being used?
  2. Similarly, how is the predict function classifying the held-out data? My somewhat basic understanding is that a given data point is subjected to repeated pairwise comparisons between every class, as instantiated by taking the dot product of the predictor values and the linear coefficients of a given comparison (i.e. class 3 vs class 4), adding the constant for that comparison, and generating a score. This is then repeated for every comparison, the scores are aggregated, and assuming equal priors (which my data has, since every class has an equal number of data points) the class with the highest score is the label that's assigned. Is this correct, or is some sort of one-vs-all comparison made? If it's the former, is it simply the sum of the scores or some other sort of aggregate measure?
  3. How is the overall DeltaPredictor property calculated? Is this just the matrix norm of the coefficient matrix comprised of every class comparison or is it some sort of aggregate of vector norms for each set of coefficients?
  4. One of the goals of my project is to cluster predictors into different groups based on their contribution to the model. I'm seeing some mixed information regarding using the coefficient values for that purpose. My understanding is that they're not so much a proxy for predictor importance but rather just parameter values that represent the hyperplane separating two classes, and that the matrix of coefficients just represents a Num. Classes x Num. Classes set of hyperplanes used to classify the data. Am I thinking about this correctly, or do the coefficient values actually represent a single high-dimensional hyperplane/discriminant? Is it the case that the coefficient values for a given predictor can actually be used as a measure of importance to the model (or at least a given class comparison), but that some sort of normalization of the data is required first? I'm assuming so, since it seems like the model will otherwise simply assign high absolute coefficient values to predictors with a small range of values and lower coefficient values to predictors with a larger range of values.
Sorry for the verbose question. At the end of the day I'm really just trying to figure out exactly how MATLAB implements LDA fitting and prediction so I can thoroughly understand and interpret the model.
Thanks in advance!

Answers (1)

Aditya on 20 Mar 2024
Linear Discriminant Analysis (LDA) in MATLAB, especially when using the fitcdiscr function, is a powerful method for both classification and dimensionality reduction. Let's break down your questions to provide a clearer understanding of how LDA is implemented and used in MATLAB, particularly focusing on the default settings of fitcdiscr and the predict function.
LDA Fitting Procedure:
The LDA fitting procedure in MATLAB, when using fitcdiscr with default settings, is analytical rather than iterative: there is no gradient-based parameter estimation as in logistic regression or neural networks. With the default 'linear' discriminant type, fitcdiscr models each class as a multivariate normal distribution with its own mean vector and a single covariance matrix pooled across all classes. Fitting amounts to:
  1. Computing the sample mean of each class.
  2. Computing the pooled within-class covariance matrix (the covariance of the class-centered training data), stored in the Sigma property.
  3. Deriving the discriminant functions in closed form from these estimates.
For classification, this Gaussian formulation gives the same decision boundaries as the classical Fisher approach of solving the generalized eigenvalue problem for Sw^{-1}Sb; the eigenvector route is mainly relevant when LDA is used for dimensionality reduction.
For multi-class classification, fitcdiscr does not train one-vs-rest classifiers. All classes share the pooled covariance, and the Coeffs property stores, for each pair of classes (i, j), the Const and Linear coefficients of the boundary between them; these follow directly from the fitted class means and covariance rather than from separately trained binary classifiers.
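The quantities above can be reproduced by hand and compared against the fitted model. A minimal sketch using Fisher's iris data (the variable names here are my own, and the comparison at the end is up to a possible sign/normalization convention in how MATLAB stores the pairwise coefficients):

```matlab
% Sketch: recompute the ingredients of a linear discriminant by hand.
load fisheriris
X = meas; y = species;
mdl = fitcdiscr(X, y);               % default 'linear' discriminant type

classes = unique(y);
K = numel(classes);
p = size(X, 2);
mu = zeros(K, p);
Xc = X;                              % will hold class-centered data
for k = 1:K
    idx = strcmp(y, classes{k});
    mu(k, :) = mean(X(idx, :));
    Xc(idx, :) = X(idx, :) - mu(k, :);
end
n = size(X, 1);
SigmaPooled = (Xc' * Xc) / (n - K);  % pooled within-class covariance

% Boundary between classes i and j (equal priors assumed):
i = 1; j = 2;
linearIJ = SigmaPooled \ (mu(i, :) - mu(j, :))';
constIJ  = -0.5 * (mu(i, :) + mu(j, :)) * linearIJ;
% Compare against mdl.Sigma, mdl.Coeffs(i,j).Linear, mdl.Coeffs(i,j).Const
```

If the priors are unequal, the constant term additionally picks up a log(Prior(i)/Prior(j)) term.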
Prediction in LDA:
The predict function does not aggregate repeated pairwise comparisons. For each class k it evaluates the discriminant score implied by the fitted Gaussian model — equivalently, the posterior probability P(k | x) computed from the class means, the pooled covariance, and the class priors — and assigns the observation to the class that minimizes the expected misclassification cost. With the default cost matrix this is simply the class with the largest posterior. Because all classes share one covariance matrix, with equal priors this is the same as choosing the class whose mean is closest to the observation in Mahalanobis distance.
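This decision rule is easy to check directly. A sketch, assuming the default (0/1) cost matrix so that prediction reduces to the maximum-posterior class; the log-scores below drop additive constants that are the same for every class:

```matlab
% Sketch: reproduce predict's decision for one observation.
load fisheriris
mdl = fitcdiscr(meas, species);

x = meas(1, :);                       % one held-out-style observation
K = numel(mdl.ClassNames);
logScore = zeros(1, K);
for k = 1:K
    d = x - mdl.Mu(k, :);             % Mu: class means; Sigma: pooled cov
    logScore(k) = -0.5 * (d / mdl.Sigma) * d' + log(mdl.Prior(k));
end
[~, kBest] = max(logScore);
% mdl.ClassNames(kBest) should agree with predict(mdl, x)
```

With equal priors the log(Prior) term is constant across classes and the rule reduces to nearest class mean in Mahalanobis distance, as described above.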
DeltaPredictor Property:
DeltaPredictor is a documented property of the ClassificationDiscriminant object. It is a row vector with one entry per predictor giving that predictor's linear-coefficient magnitude in the model: if DeltaPredictor(i) is small, setting the Delta regularization parameter above that value zeroes out predictor i's coefficients and effectively removes it from the model. So it is a per-predictor summary taken across the pairwise coefficient vectors (effectively the largest coefficient magnitude that predictor attains in any class comparison), not a matrix norm of the full coefficient matrix.
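You can test this reading empirically. A sketch, under the assumption (my inference from how the Delta threshold is documented to act on coefficients, not something the documentation states as a formula) that DeltaPredictor is the per-predictor maximum absolute linear coefficient over all pairwise boundaries:

```matlab
% Sketch (assumption): DeltaPredictor ~ max abs pairwise coefficient.
load fisheriris
mdl = fitcdiscr(meas, species);

K = numel(mdl.ClassNames);
p = size(meas, 2);
maxAbsCoef = zeros(1, p);
for i = 1:K
    for j = 1:K
        if i ~= j
            maxAbsCoef = max(maxAbsCoef, abs(mdl.Coeffs(i, j).Linear'));
        end
    end
end
% Compare maxAbsCoef against mdl.DeltaPredictor to verify the assumption.
```

If the two disagree on your data, treat DeltaPredictor as whatever the documentation of your release specifies and discard the max-coefficient reading.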
Coefficients and Predictor Importance:
In LDA, each Coeffs(i,j) entry defines the hyperplane separating class i from class j, so with K classes the model carries K(K-1)/2 distinct pairwise boundaries rather than a single high-dimensional discriminant. The coefficients do indicate how each predictor contributes to separating a given pair of classes, but interpreting raw coefficient magnitudes as predictor importance is misleading without accounting for predictor scale: your intuition is right that a predictor with a small numeric range tends to receive a larger coefficient than one with a large range, even when their discriminative contributions are comparable. Standardizing the predictors before fitting makes coefficient magnitudes more comparable across predictors. Even then, LDA coefficients define separating hyperplanes; they are not importance scores in the same sense as, for example, feature importances in tree-based models, and the DeltaPredictor property is the model's own per-predictor magnitude summary.
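For the clustering-by-contribution goal, a simple pre-processing sketch along these lines may help (zscore standardization is one common choice, not the only defensible one):

```matlab
% Sketch: put predictors on a common scale before comparing coefficients.
load fisheriris
Xz = zscore(meas);                 % zero mean, unit variance per predictor
mdlZ = fitcdiscr(Xz, species);

% After standardization, abs(mdlZ.Coeffs(i,j).Linear) is comparable
% across predictors for a given class pair, and mdlZ.DeltaPredictor
% gives a rough overall per-predictor magnitude to cluster on.
pairCoef = abs(mdlZ.Coeffs(1, 2).Linear');
overall  = mdlZ.DeltaPredictor;
```

Note that standardization changes the coefficient values but not the predicted labels, since LDA's decision rule is affine-equivariant in the predictors.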