How can I use LDA (Linear or Fisher Discriminant Analysis) with a hardwritten digits dataset (like MNIST or USPS)?
I mean that LDA creates a projection of two or more classes in order to show their separability (http://courses.ee.sun.ac.za/Pattern_Recognition_813/lectures/lecture01/img35.png). In MNIST, for example, I have 60,000 classes of 28x28 matrices that represent the hardwritten digits (training set) and 10,000 28x28 matrices that represent the test set. I can use LDA to compare each class in the test set with a class in the training set, but after applying LDA, how can I tell whether the test class is similar to the train class?
Thx in advance.
4 Comments
Ilya
on 3 Oct 2012
Show what you've done. Post some code. Tell us what MATLAB function you are using. Give an accurate definition of what you want to measure. Then someone might be able to help.
Gimmy
on 3 Oct 2012
Edited: Walter Roberson
on 5 Oct 2012
Ilya
on 3 Oct 2012
You have not provided any new info. You just repeated what you said before. The pictures linked in your original post and in your comment do not explain what you want to do.
Gimmy
on 4 Oct 2012
Accepted Answer
More Answers (2)
Greg Heath
on 6 Oct 2012
Edited: Greg Heath
on 6 Oct 2012
0 votes
It may help to forget LDA for a while and directly create a linear classifier using the slash operator. For example, since your images are of the 10 digits 0:9, your target matrix should contain columns of the 10-dimensional unit matrix eye(10), where the row index of the 1 indicates the correct class index.
I doubt that you need all of the pixels in a 28x28 matrix. Therefore, I suggest averaging pixels to get a much smaller number I = nrows*ncolumns < 28*28.
Next, use the colon operator (:) to convert the matrices to column vectors. For each of the 10 classes, choose a number Ni of noisy training samples with Ni >> I for i = 1:10.
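The averaging and unfolding steps can be sketched roughly as follows. This is only an illustration under assumptions: the image is random stand-in data, and the 4x4 block size (giving a 7x7 result, so I = 49) is an arbitrary choice, not something specified above.

```matlab
% Hypothetical stand-in for one 28x28 digit image.
img = rand(28, 28);

b = 4;                            % block size (assumed); 28/4 = 7
nrows    = size(img, 1) / b;      % reduced number of rows
ncolumns = size(img, 2) / b;      % reduced number of columns

% View the image as b-by-b blocks and average within each block.
blocks = reshape(img, b, nrows, b, ncolumns);
small  = squeeze(mean(mean(blocks, 1), 3));   % nrows-by-ncolumns

% Unfold the reduced image into a column vector with the colon operator.
x = small(:);
I = numel(x);                     % I = nrows*ncolumns
```

Stacking one such column per training image, side by side, gives the I-by-N input matrix.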
Form the input and target matrices with dimensions
[ I N ] = size(input)
[ O N ] = size(target)
O = 10 and N = sum(Ni) >> 10*I (e.g., ~ 100*I)
The linear model is
y = W * [ ones(1,N) ; input ];
where the row of ones yields the bias weights. The weight matrix is obtained from the slash LMSE solution
W = target / [ ones(1,N) ; input ];
Class assignments are obtained from
class = vec2ind(y);
I have always found this to be superior to LDA.
However, if for some reason you must use LDA, this provides an excellent model for comparisons.
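Putting the pieces of this answer together, a minimal end-to-end sketch might look like the one below. Everything about the data is an assumption made for illustration: the inputs are synthetic (a random prototype per class plus noise), and I = 49 and Ni = 200 are arbitrary. The class assignment uses max instead of vec2ind only so that the sketch runs without the neural network toolbox.

```matlab
rng(0);                               % reproducible synthetic data
I  = 49;                              % input dimension (assumed, e.g. 7x7)
O  = 10;                              % number of classes (digits 0:9)
Ni = 200;                             % samples per class (assumed)
N  = O * Ni;

% Hypothetical data: each class is a random prototype plus noise.
labels     = kron(1:O, ones(1, Ni));  % class index 1..10 for each column
prototypes = randn(I, O);
input      = prototypes(:, labels) + 0.5 * randn(I, N);

% Targets are columns of eye(10); the row of the 1 is the class index.
E      = eye(O);
target = E(:, labels);

% Least-squares weights via the slash operator; the row of ones adds a bias.
W = target / [ ones(1, N) ; input ];
y = W * [ ones(1, N) ; input ];

% Assign each column to the row with the largest output.
[~, assigned] = max(y, [], 1);
digit = assigned - 1;                 % map class index 1..10 to digit 0..9
trainErrorRate = mean(assigned ~= labels);
```

The error rate here is measured on the training data; for a fair estimate, apply the same W to held-out test columns.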
Greg Heath
on 6 Oct 2012
Edited: Greg Heath
on 6 Oct 2012
0 votes
You use the term HARDwritten. Do you mean HANDwritten?
There are only 10 digits 0:9. Therefore, there are only 10 classes.
The numbers 60,000 and 10,000 represent the number of total samples that belong to one of the 10 classes. You don't need anywhere near that number to train and test a good classifier.
As I mentioned in my previous answer, I don't believe you need 28*28 = 784 dimensions to discriminate between c = 10 classes. Use averaging or a low-pass filter to reduce the image sizes to I = nrows*ncolumns. Then use the colon operator (:) to unfold each image into an I-dimensional column vector.
With LDA you project the I dimensional vectors into a c-1 = 9 dimensional space defined by the dominant eigenvectors of (Sw\Sb). It's been ~ 30 years since I've done this and I don't remember the details. However, once you get these 9 dimensional projections you can imagine the 10 class mean projections in 9-space and check the references on how to make the classifications.
Since I don't remember the details, if I were in a hurry I would just assign each vector to the class whose mean projection is closest.
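A rough sketch of that recipe (project into c-1 = 9 dimensions, then pick the closest class mean projection) is given below. It is an illustration under assumptions, not a reference implementation: the data are synthetic, the scatter-matrix definitions are the usual textbook ones, and eig(Sb, Sw) is used as a numerically friendlier way of solving the same eigenproblem as Sw\Sb. On real image data Sw can be singular (for example, pixels that are always zero), in which case a small multiple of eye(I) may need to be added to it.

```matlab
rng(0);                                    % reproducible synthetic data
I = 49;  c = 10;  Ni = 200;                % assumed sizes
N = c * Ni;
labels     = kron(1:c, ones(1, Ni));       % class index 1..c per column
prototypes = randn(I, c);
X = prototypes(:, labels) + 0.5 * randn(I, N);

% Within-class (Sw) and between-class (Sb) scatter matrices.
mu = mean(X, 2);
M  = zeros(I, c);                          % class means, one per column
Sw = zeros(I);
Sb = zeros(I);
for k = 1:c
    Xk = X(:, labels == k);
    nk = size(Xk, 2);
    M(:, k) = mean(Xk, 2);
    Xc = Xk - M(:, k) * ones(1, nk);       % center the class samples
    Sw = Sw + Xc * Xc';
    d  = M(:, k) - mu;
    Sb = Sb + nk * (d * d');
end

% Dominant c-1 eigenvectors of the problem Sb*v = lambda*Sw*v.
[V, D] = eig(Sb, Sw);
[~, order] = sort(real(diag(D)), 'descend');
P = real(V(:, order(1:c-1)));              % I-by-9 projection basis

Z  = P' * X;                               % 9-by-N sample projections
Zm = P' * M;                               % 9-by-c class mean projections

% Assign each sample to the closest class mean projection.
dist2 = zeros(c, N);
for k = 1:c
    diffk = Z - Zm(:, k) * ones(1, N);
    dist2(k, :) = sum(diffk .^ 2, 1);
end
[~, assigned] = min(dist2, [], 1);
trainErrorRate = mean(assigned ~= labels);
```

To classify a test image, reduce and unfold it the same way, project it with P', and compare it against the columns of Zm.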
Hope this helps.
Greg