How to compare classifiers on different datasets?

Hi everybody! I have to test 3 classifiers on the same dataset: a Naive Bayesian classifier, a Logistic Regression classifier and an LDA classifier.
For the Naive Bayesian classifier I can use the entire dataset X. However, for Logistic Regression and LDA I have to use a reduced dataset Xrid that contains only the linearly independent columns of X, so that the matrix these methods need to invert is not singular.
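The thread does not say how Xrid was built. One simple way is sketched below in Python with NumPy (not MATLAB): scan the columns in order and keep a column only if it increases the rank of the columns kept so far. The function name `independent_columns` and the example matrix are hypothetical; a QR decomposition with column pivoting is a common, faster alternative for large X.

```python
import numpy as np

def independent_columns(X, tol=None):
    """Return indices of a maximal set of linearly independent columns of X.

    Greedy scan (hypothetical helper): a column is kept only if adding it
    raises the numerical rank of the columns kept so far.
    """
    X = np.asarray(X, dtype=float)
    kept = []
    rank = 0
    for j in range(X.shape[1]):
        r = np.linalg.matrix_rank(X[:, kept + [j]], tol=tol)
        if r > rank:          # column j adds a new direction
            kept.append(j)
            rank = r
    return kept

# Example: column 2 equals column 0 + column 1, so it is dropped.
X = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 2.0, 5.0],
              [2.0, 0.0, 2.0, 1.0]])
idx = independent_columns(X)
Xrid = X[:, idx]
print(idx)  # [0, 1, 3]
```

Note that which columns survive depends on the column order: the scan keeps the first column of each dependent group.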
My question is: in order to compare the 3 classifiers, is it better to train the Naive Bayesian classifier on the entire dataset X while training LDA and Logistic Regression on the reduced dataset Xrid? Or is it better to train all the classifiers on the same dataset, i.e. on the reduced dataset Xrid?
On one hand, comparing 3 classifiers is more meaningful when they all use the same dataset; on the other hand, when deciding which classifier is best, it also matters that the NB classifier can use the entire dataset, while the others must use a modified one.
What is the most effective way to compare these classifiers?


the cyclist on 1 Feb 2023
I don't know that there is one best answer to this.
I think that a useful way to think about it is to realize that what you really care about (presumably) is how each classifier will perform on a brand-new, "out of sample" dataset (i.e. how it will generalize). If the NB classifier is able to take advantage of more information from the existing dataset, in order to generalize more accurately, then I think it is fair game to allow it to use the full training set.
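A minimal sketch of that idea, assuming Python with NumPy rather than MATLAB and hand-written classifiers rather than any toolbox: every classifier is scored on the same held-out cross-validation folds, but each one may use its own feature matrix (NB on the full X, LDA on Xrid). Logistic regression is left out for brevity; it would plug into `cv_accuracy` the same way. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and split into k folds shared by every classifier."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def fit_gaussian_nb(X, y):
    classes = np.unique(y)
    stats = []
    for c in classes:
        Xc = X[y == c]
        # Per-feature mean and variance; no covariance matrix is inverted.
        stats.append((Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, np.log(len(Xc) / len(X))))
    return classes, stats

def predict_gaussian_nb(model, X):
    classes, stats = model
    scores = [prior - 0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
              for mu, var, prior in stats]
    return classes[np.argmax(np.column_stack(scores), axis=1)]

def fit_lda(X, y):
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance: this is the matrix that must be invertible.
    resid = X - means[np.searchsorted(classes, y)]
    cov = resid.T @ resid / (len(X) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), np.log(priors)

def predict_lda(model, X):
    classes, means, icov, logp = model
    # Linear discriminant: x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k)
    scores = X @ icov @ means.T - 0.5 * np.sum(means @ icov * means, axis=1) + logp
    return classes[np.argmax(scores, axis=1)]

def cv_accuracy(fit, predict, X, y, folds):
    """Held-out accuracy of one classifier on each of the shared folds."""
    accs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        model = fit(X[train], y[train])
        accs.append(np.mean(predict(model, X[test]) == y[test]))
    return np.array(accs)

# Synthetic example: the 3rd column of X is the sum of the first two,
# so LDA's pooled covariance on the full X would be singular.
rng = np.random.default_rng(1)
n = 200
y = rng.integers(0, 2, n)
base = rng.normal(size=(n, 2)) + y[:, None] * 1.5
X = np.column_stack([base, base[:, 0] + base[:, 1]])
Xrid = X[:, :2]                      # linearly independent columns only
folds = kfold_indices(n, 5)
nb = cv_accuracy(fit_gaussian_nb, predict_gaussian_nb, X, y, folds)
lda = cv_accuracy(fit_lda, predict_lda, Xrid, y, folds)
print(nb.mean(), lda.mean())
print(nb - lda)                      # per-fold paired differences
```

Because all classifiers see exactly the same train/test splits, the per-fold differences are paired, so the comparison reflects the classifiers (and the data each is allowed to use) rather than luck in how the data were split.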
