How can I use LDA (Linear or Fisher Discriminant Analysis) with a hardwritten digits dataset (like MNIST or USPS)?

I mean that LDA creates a projection of two or more classes in order to show their separability ( http://courses.ee.sun.ac.za/Pattern_Recognition_813/lectures/lecture01/img35.png ). In MNIST, for example, I have 60,000 classes (28x28 matrices) that represent the hardwritten digits (training set) and 10,000 28x28 matrices that represent the test set. I can use LDA to compare each class in the test set with a class in the training set, but after I apply LDA, how can I say whether the test class is similar to the train class?
Thx in advance.

4 Comments

Show what you've done. Post some code. Tell us what MATLAB function you are using. Give an accurate definition of what you want to measure. Then someone might be able to help.
I have a matrix called tot_train that is 28x60000 and represents the 60000 train images (one image is 28x28), and a matrix called test_tot that is 28x10000 and represents the test images. What I do is take an image from test_tot (28x28) and an image from train_tot (28x28) and apply LDA to these two matrices in order to decide whether the test_tot matrix is similar to the train_tot matrix. But when I arrive at this graph: https://www.projectrhea.org/oldkiwi/images/8/81/FLD.jpg how can I say whether the two numbers are the same or not?
You have not provided any new info. You just repeated what you said before. The pictures linked in your original post and in your comment do not explain what you want to do.
% X1 and X2 are 28x28 matrices that represent 2 hardwritten digits.
n1=size(X1,1); n2=size(X2,1);
Mu1=mean(X1)'; Mu2=mean(X2)';
S1=cov(X1); S2=cov(X2); Sw=S1+S2;
SB=(Mu1-Mu2)*(Mu1-Mu2)';
invSw=inv(Sw); invSw_by_SB=invSw*SB;
[V,D]=eig(invSw_by_SB);
% now i choose the best eigenvector, named W
newX1=X1*W; newX2=X2*W;
% and now i plot them with scatter.
% At this point, how can I say if the hardwritten digit X1 is similar to X2?


 Accepted Answer

I am not an expert in image analysis, but it seems you misunderstand what you need to do. LDA uses matrix X in which rows are observations (i.i.d. sampled from a vector random variable) and columns are predictors (elements of this random variable). Your 28x28 matrix is pixels. Its rows are not i.i.d. observations of a vector random variable. The entire 28x28 matrix is predictors. Sometimes in image analysis they just flatten out the matrix of pixels. Then you would have a 60k-by-784 matrix of training data and 10k-by-784 matrix of test data.
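A minimal sketch of that flattening step (the variable names here are illustrative stand-ins, not from the original post):

```matlab
% imgs stands in for a 28x28xN array of images (here N = 5 dummy images)
imgs = rand(28, 28, 5);
% reshape each 28x28 image into one row of 784 pixel predictors
Xtrain = reshape(imgs, 28*28, [])';   % 5-by-784: rows = observations
```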
You could use the new ClassificationDiscriminant class or the old classify function from Statistics Tlbx to perform LDA. Or you could write your own code in the spirit of what you've shown above. ClassificationDiscriminant.fit and classify find K*(K-1)/2 linear decision boundaries between K classes. Your code suggests that you want to find K-1 directions for the optimal variable transformation by LDA for K classes.
For your specific code, do not use inv. Instead use eig(SB,Sw,'chol') to find V and D. There is only one usable eigenvector in V because the rank of SB is 1.
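A minimal two-class sketch of that advice (dummy low-dimensional data so that Sw is positive definite; with raw 784-pixel images Sw may be singular, so you would need regularization or dimension reduction first):

```matlab
% two dummy classes: rows = observations, columns = predictors
X1 = randn(100, 5) + 1;
X2 = randn(100, 5) - 1;
Mu1 = mean(X1)'; Mu2 = mean(X2)';
Sw = cov(X1) + cov(X2);          % within-class scatter
SB = (Mu1-Mu2)*(Mu1-Mu2)';       % between-class scatter, rank 1
[V,D] = eig(SB, Sw, 'chol');     % generalized eigenproblem, no explicit inverse
[~,imax] = max(diag(D));         % only one nonzero eigenvalue since rank(SB)=1
W = V(:,imax);                   % the single usable Fisher direction
```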

8 Comments

I re-edit the code:
c_train=train_images(:,1)';
c_test=test_images(:,1)';
figure(1)
clf;
subplot(2,1,1);
plot(c_test);
title(test_labels(1));
subplot(2,1,2);
plot(c_train);
title(train_labels(1));
mu_train=mean(c_train);
mu_test=mean(c_test);
S_train=cov(c_train);
S_test=cov(c_test);
Sw=S_train+S_test;
SB=(mu_train-mu_test)*(mu_train-mu_test)';
invSw=inv(Sw);
invSw_by_SB=invSw*SB;
[vector,value]=eig(SB,Sw,'chol');
newc_test=c_test*vector;
newc_train=c_train*vector;
figure(2)
clf;
plot(newc_test,'r');
hold on;
plot(newc_train,'b');
........
where c_test and c_train are two 1x784 vectors, like you say. However, the problem is still the same: once I have obtained the vector and the newc_train and newc_test, how can I say whether c_test is the same hardwritten digit as c_train?
Your code is doing something nonsensical.
LDA is used for supervised learning. That is, for every 28x28 matrix you need to know what the true digit is. Then you can form Xtrain of size 60k-by-784 and you would have Ytrain, a vector of class labels. Then you could fit an LDA model to your training data. Then you could apply the fitted model to your test data. Even if you don't have Statistics Tlbx, you can still look at the workflow and examples in the doc: http://www.mathworks.com/help/stats/discriminant-analysis.html
If you do not know the true digits in your training data, you don't want LDA.
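The supervised workflow described above might be sketched like this (dummy data in place of MNIST; variable names are illustrative, and 'pseudoLinear' is used because raw pixel covariance matrices are typically singular):

```matlab
% dummy stand-ins for MNIST: rows = observations, columns = 784 pixel predictors
Xtrain = rand(200, 784); Ytrain = randi([0 9], 200, 1);
Xtest  = rand(50, 784);  Ytest  = randi([0 9], 50, 1);
% fit the LDA model on labeled training data, then apply it to the test data
cls   = ClassificationDiscriminant.fit(Xtrain, Ytrain, 'discrimType', 'pseudoLinear');
Ypred = predict(cls, Xtest);
acc   = mean(Ypred == Ytest);   % fraction of test digits recognized
```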
I have what you say:
train_images = loadMNISTImages('train-images.idx3-ubyte'); % matrix 784x60000
train_labels = loadMNISTLabels('train-labels.idx1-ubyte')'; % vector 1x60000
test_images = loadMNISTImages('t10k-images.idx3-ubyte'); % matrix 784x10000
test_labels = loadMNISTLabels('t10k-labels.idx1-ubyte')'; % vector 1x10000
But what do you mean by "you could fit an LDA model to your training data"? Do you mean that I have to calculate Sw and Mu for every group of digits (such as Sw and Mu for all the classes that represent 0, 1, ...) and then compare each one of them with every train class? And so every column in 784x60000 and 784x10000 represents a digit?
EDIT: I'm trying to use the MATLAB functions ClassificationDiscriminant.fit and predict, but I can't use the MNIST dataset because there each digit is represented by a 784x1 vector, while ClassificationDiscriminant.fit accepts observations in 1x1 form... how can I do it?
It's not quite what I said. I said your Xtrain needs to be 60k-by-784. You need to transpose your train_images. Then you can pass your train_images and train_labels to ClassificationDiscriminant.fit. By convention in Statistics Tlbx you must have one observation per row and one predictor per column.
I've used cls = ClassificationDiscriminant.fit(train_images,train_labels,'discrimType','pseudoLinear'); and it works, even if I don't understand it completely. After that I use
label = predict(cls,test_images);
n = numel(test_labels); % number of test images
ok=0;
no=0;
for i=1:n
    if(label(i)==test_labels(i))
        ok=ok+1;
    else
        no=no+1;
    end
end
in order to count how many digits LDA recognizes. Now I'm trying to use LDA with my initial method, without using the ClassificationDiscriminant.fit and predict functions...
Good progress. At the ClassificationDiscriminant class page http://www.mathworks.com/help/stats/classificationdiscriminantclass.html, you can find various methods and properties. In particular, you can use Sigma and BetweenSigma to cross-check your calculations of the within- and between-class covariance matrices.
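For instance, a small cross-check along those lines (dummy data; the comparison against your own Sw/SB is left as a comment since those names are yours):

```matlab
% fit a model on dummy labeled data, then read off its covariance estimates
X = randn(300, 4);
Y = randi([0 1], 300, 1);
cls = ClassificationDiscriminant.fit(X, Y);
Sw_model = cls.Sigma;          % pooled within-class covariance estimate
Sb_model = cls.BetweenSigma;   % between-class covariance estimate
% e.g. compare with your own calculation: norm(Sw_model - mySw, 'fro')
```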
Here is what I've done:
clear
train_images = loadMNISTImages('train-images.idx3-ubyte')';
train_labels = loadMNISTLabels('train-labels.idx1-ubyte');
test_images = loadMNISTImages('t10k-images.idx3-ubyte')';
test_labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
ImageNum = 100;
count=0;
allmu_train=[];
allmu=0;
Sw=0;
Sw_train=[];
% calculate Sw and Mu for every number
for i=0:9
for j=1:ImageNum
if(train_labels(j)==i)
count=count+1;
allmu = allmu+mean(train_images(j,:));
Sw = Sw+cov(train_images(j,:));
end
end
allmu=allmu/count;
allmu_train=cat(1,allmu_train,allmu);
Sw_train=cat(1,Sw_train,Sw);
count=0;
end
% calculate Y=W'*C for every C in test_images
x0=[];
x1=[];
for i=1:ImageNum
if(mod(i,28)==0 || i==1)
ctest=test_images(i,:);
figure(1);
clf;
subplot(2,10,1);
plot(ctest);
for j=0:9
mu1=mean(ctest);
S1=cov(ctest);
Sw=S1+Sw_train(j+1);
SB=(mu1-allmu_train(j+1))*(mu1-allmu_train(j+1))';
invSw=pinv(Sw);
invSw_by_SB=invSw*SB;
%[vector,value]=eig(invSw_by_SB);
[vector,value]=eig(SB,Sw,'chol');
W=vector;
Y=W'*ctest;
subplot(2,10,j+11);
plot(Y);
end
end % breakpoint here to show the result of every ctest
end
But now that I've applied LDA, how can I say what number is contained in ctest? I should compare every Y obtained in each loop from 0 to 9, but how? Without using the predict function, I mean...
Quoting what I wrote earlier:
ClassificationDiscriminant.fit finds K*(K-1)/2 linear decision boundaries between K classes. Your code suggests that you want to find K-1 directions for the optimal variable transformation by LDA for K classes.
There is a difference between LDA for classification and LDA for supervised variable transformation.
After you found Sw and mean vectors for every class, you can compute Mahalanobis distance squared from observation at x to the mean of every class. For example, if mu1 is a column-vector for the mean vector of class 1 and x is a column-vector of predictors
(x-mu1)'*pinv(Sw)*(x-mu1)
is the Mahalanobis distance squared between x and mu1. The smaller the distance, the more likely this observation is from this class.
You will have to figure out the rest by yourself. Read a book on LDA.
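To make the distance rule above concrete, here is one possible sketch (mu, Sw, and x are dummy stand-ins for the class means, pooled within-class covariance, and one test image computed earlier):

```matlab
mu = rand(784, 10);              % column k+1 = mean vector of digit k
Sw = eye(784);                   % pooled within-class covariance (dummy)
x  = rand(784, 1);               % one flattened test image
invSw = pinv(Sw);
d2 = zeros(1, 10);
for k = 1:10
    d = x - mu(:,k);
    d2(k) = d' * invSw * d;      % squared Mahalanobis distance to class k-1
end
[~,kmin] = min(d2);              % smallest distance = most likely class
predictedDigit = kmin - 1;       % classes are the digits 0..9
```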


More Answers (2)

It may help to forget LDA for a while and directly create a linear classifier using the slash operator. For example, since your images are of the 10 digits 0:9, your target matrix should contain columns of the 10-dimensional unit matrix eye(10), where the row index of the 1 indicates the correct class index.
I doubt you need all of the pixels in a 28x28 matrix. Therefore, I suggest averaging pixels to get a much smaller number I = nrows*ncolumns < 28*28.
Next, use the colon operator (:) to convert the matrices to column vectors. For each of the 10 classes choose a number of noisy training samples with Ntrni >> I for i = 1:10.
Form the input and target matrices with dimensions
[ I N ] = size(input)
[ O N ] = size(target)
O = 10 and N = sum(Ni) >> 10*I (e.g., ~ 100*I)
The linear model is
y = W * [ ones(1,N) ; input ];
where the row of ones yields bias weights. The weight matrix is obtained from the slash LMSE solution
W = target/[ ones(1,N) ; input ];
Class assignments are obtained from
class = vec2ind(y);
I have always found this to be superior to LDA.
However, if for some reason you must use LDA, this provides an excellent model for comparisons.
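Assembled into one runnable sketch (dummy data; a max over rows replaces vec2ind so that no extra toolbox is needed):

```matlab
I = 16; N = 200;                         % I reduced pixels, N samples
input = rand(I, N);                      % one reduced image per column
trueClass = randi(10, 1, N);             % class indices 1..10
E = eye(10);
target = E(:, trueClass);                % 10-by-N; a 1 marks the true class
Xb = [ ones(1,N) ; input ];              % the row of ones yields bias weights
W  = target / Xb;                        % LMSE weights from the slash operator
y  = W * Xb;
[~, class] = max(y, [], 1);              % same assignment as vec2ind(y)
```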
You use the term HARDwritten. Do you mean HANDwritten?
There are only 10 digits 0:9. Therefore, there are only 10 classes.
The numbers 60,000 and 10,000 are the total numbers of samples, each of which belongs to one of the 10 classes. You don't need anywhere near that many to train and test a good classifier.
As I mentioned in my previous answer, I don't believe you need 28*28 = 784 dimensions to discriminate between c = 10 classes. Use averaging or a low-pass filter to reduce the image sizes to I = nrows*ncolumns. Then use the colon operator (:) to unfold each image into an I-dimensional column vector.
With LDA you project the I dimensional vectors into a c-1 = 9 dimensional space defined by the dominant eigenvectors of (Sw\Sb). It's been ~ 30 years since I've done this and I don't remember the details. However, once you get these 9 dimensional projections you can imagine the 10 class mean projections in 9-space and check the references on how to make the classifications.
Since I don't remember the details and if I was in a hurry, I would just assign the vector to the closest class mean projection.
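The nearest-projected-mean rule suggested above might be sketched as follows (W, mu, and x are dummy stand-ins for the dominant eigenvectors, class means, and a test vector):

```matlab
I = 16;
W  = orth(randn(I, 9));                       % I-by-9 projection directions
mu = rand(I, 10);                             % column k+1 = mean of digit k
x  = rand(I, 1);                              % one test vector
z = W' * x;                                   % its 9-dimensional projection
M = W' * mu;                                  % 9-by-10 projected class means
[~,k] = min(sum(bsxfun(@minus, M, z).^2, 1)); % closest projected class mean
predictedDigit = k - 1;                       % digits 0..9
```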
Hope this helps.
Greg

1 Comment

Thx for the answer. Now I'm trying to compare the mean of ctest with the means of the projected classes (0:9) (Y) in order to find the smallest difference between the means, but I don't know why the mean of the first class, 0, is always the nearest to the mean of ctest.
EDIT: the problem is still the same: once I have found the projected vector I don't know how to say what number ctest is. I tried comparing the mean of Y with the mean of ctest and other solutions, but it still doesn't work...
clear
train_images = loadMNISTImages('train-images.idx3-ubyte');
train_labels = loadMNISTLabels('train-labels.idx1-ubyte');
test_images = loadMNISTImages('t10k-images.idx3-ubyte');
test_labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
ImageNum = 1000;
count=0;
allmu_train=[];
allmu=0;
Sw=0;
Sw_train=[];
class_train=[];
tot=0;
for i=0:9
for j=1:60000
if(train_labels(j)==i)
count=count+1;
allmu = allmu+mean(train_images(:,j));
Sw = Sw+cov(train_images(:,j));
tot=tot+train_images(:,j);
end
end
allmu=allmu/count;
allmu_train=cat(2,allmu_train,allmu);
Sw_train=cat(2,Sw_train,Sw);
class_train=cat(2,class_train,(tot/count));
count=0;
tot=0;
end
label=[];
muY=[];
totY=[];
for i=1:ImageNum
ctest=test_images(:,i);
mu1=mean(ctest);
S1=cov(ctest);
mu_newct=[];
for j=0:9
Sw=S1+Sw_train(j+1);
SB=(mu1-allmu_train(j+1))*(mu1-allmu_train(j+1))';
invSw=inv(Sw);
invSw_by_SB=invSw*SB;
% [vector,value]=eig(invSw_by_SB);
[vector,value]=eig(SB,Sw,'chol');
W=vector;
Y=W*ctest;
muY=cat(2,muY,mean(Y));
totY=cat(2,totY,Y);
end
mudiff=[];
for k=1:10
mudiff=cat(2,mudiff,abs(muY(k)-ctest(k)));
end
newk=0;
minmudiff=mudiff(1);
for k=1:10
if(mudiff(k)<minmudiff)
minmudiff=mudiff(k);
newk=k-1;
end
end
label=cat(2,label,newk);
muY=[];
totY=[];
end
ok=0;
no=0;
for i=1:ImageNum
if(label(i)==test_labels(i))
ok=ok+1;
else
no=no+1;
end
end


Asked on 3 Oct 2012
