Understanding the parameters in PRINCOMP

Hi all, I have done a bit of research on this topic and it always seems to lead me back to the same question. Let me lay it all on the table. From what I understand, Principal Component Analysis is supposed to pick out the most important parts of a large set of data for you to work with. For example, the data I am using is a 1024x100 matrix of different backgrounds: 1024 is the number of pixels and 100 is the number of different backgrounds. Using PCA I should be able to reduce the amount of background information from 100 to a smaller number, giving 1024xN, where N is the number of most important parts of the original 100. Now, when I use [COEFF, SCORE, VARIANCE] = princomp(data), COEFF is a 100x100 matrix and SCORE is 1024x100.
What exactly are COEFF and SCORE? I thought I should get a new set of data that looks like 1024xN, where N is smaller than 100.
Now, I read a few posts saying that to generate the first principal component you use the first column of COEFF and multiply as follows:
If the first column of COEFF is [A, B, C, D,...] then the 1st Principal Component is given by: P1 = data(:,1)*A + data(:,2)*B + data(:,3)*C + ...
Shouldn't the 1st principal component be a matrix (in my case) of dimensions 1024x1? P1 (above) doesn't give this.
How do I get the most important "parts" out of what princomp generates? Again, I have 100 backgrounds, some of which are redundant or noise, and I want to use only the most important ones. How do I get that data using princomp?
Any help is much appreciated, thanks in advance!

Accepted Answer

Shashank Prasanna on 24 Apr 2013
What you explain is correct. SCORE is just the projection of your data onto the principal components (the new basis/axes that maximally explain the variance in your data). COEFF contains the new basis vectors.
But you want to sub-select the important "parts", as you mention, from SCORE and project them back to your original space.
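To make the relationship between COEFF and SCORE concrete, here is a minimal sketch that rebuilds both with base MATLAB's svd instead of calling PRINCOMP. The random matrix X is only a stand-in for the 1024x100 data, and the variable names are assumptions for illustration. The signs of individual columns may differ from what PRINCOMP returns, since each basis vector is only defined up to sign.

```matlab
% Synthetic stand-in for the 1024-by-100 data (rows = pixels, columns = backgrounds).
X = randn(1024, 100);

% Center each column; PCA works on mean-centered data.
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);

% Economy-size SVD: Xc = U*S*V'. The columns of V play the role of COEFF.
[~, S, V] = svd(Xc, 'econ');
coeff = V;            % 100-by-100, one principal direction per column
score = Xc * coeff;   % 1024-by-100, the data expressed in the new basis

% Scores on the first principal component: a weighted sum of the centered
% data columns, with weights taken from the first column of coeff.
P1 = Xc * coeff(:, 1);

% Variance along each component, sorted from largest to smallest.
latent = diag(S).^2 / (size(X, 1) - 1);

disp(size(P1));   % 1024 1
```

So the weighted sum from the question does give a 1024x1 column, provided the data columns are centered first; that column is the first column of SCORE.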
You can do this using the PCARES function:
[~,reconstructed] = pcares(X,ndim)
But PCARES will not return 1024x1; it will return 1024x100, because the result is in the original space.
If you really do want to bring it back in a lower dimension you can use the following code:
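% 1024-by-ndim result: the first ndim original variables, rebuilt from the first ndim components (mean not added back)
reconstructed = score(:,1:ndim)*coeff(1:ndim,1:ndim)';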
You may want to add the mean of the data back. PRINCOMP subtracts the mean of the data before it computes the PCA.
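As a rough sketch of adding the mean back, the snippet below reconstructs the data in the full 100-column space. It uses base MATLAB's svd on synthetic data rather than PRINCOMP, and ndim = 2 is a hypothetical choice. Note that it multiplies by coeff(:,1:ndim)' (all 100 rows of COEFF), so the result is 1024x100 rather than 1024xndim.

```matlab
% Synthetic stand-in for the 1024-by-100 data.
X = randn(1024, 100);

% Center the columns and compute the principal directions.
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);
[~, ~, coeff] = svd(Xc, 'econ');
score = Xc * coeff;

ndim = 2;   % hypothetical number of components to keep

% Reduced representation in principal component space: 1024-by-ndim.
reduced = score(:, 1:ndim);

% Project back to the original 100-variable space and restore the mean.
approx = bsxfun(@plus, reduced * coeff(:, 1:ndim)', mu);   % 1024-by-100

% Sanity check: keeping every component recovers X up to rounding error.
fullRecon = bsxfun(@plus, score * coeff', mu);
disp(max(abs(fullRecon(:) - X(:))));
```

If a 1024xN matrix is what you are after, reduced (the first N columns of SCORE) is the usual candidate; approx is the same information expressed back in the original variables.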
  3 Comments
Shashank Prasanna on 24 Apr 2013
The Principal Components vectors are in COEFF (The new basis)
Once you have decided how many components to keep, say the first two, the PCARES function I mentioned before gives you the reconstructed data:
[~,reconstructed] = pcares(data,2);
reconstructed will still be in 1024x100 and NOT 1024x2.
Why?
Because you reduced the dimensions in principal component space and then reconstructed the data back into the 100-variable space. The result contains the information of only two principal components, projected back into the 100-variable space.
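The point can be checked numerically. The sketch below (base MATLAB svd on synthetic data, with ndim = 2 as in the example above; variable names are assumptions) builds a two-component reconstruction of the centered data and shows that it has 100 columns but only rank 2. It also reports what share of the total variance the kept components carry, which is one common way to decide how many to keep.

```matlab
% Synthetic stand-in for the 1024-by-100 data.
X = randn(1024, 100);
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);

[~, S, coeff] = svd(Xc, 'econ');
latent = diag(S).^2 / (size(X, 1) - 1);   % variance along each component

ndim = 2;

% Two-component reconstruction of the centered data: still 1024-by-100.
centeredApprox = (Xc * coeff(:, 1:ndim)) * coeff(:, 1:ndim)';
disp(size(centeredApprox));
disp(rank(centeredApprox));   % equals ndim: only two independent directions

% Cumulative share of the total variance kept by the first k components.
explained = cumsum(latent) / sum(latent);
fprintf('First %d components keep %.1f%% of the variance\n', ...
        ndim, 100 * explained(ndim));
```

With random data the share kept by two components is small; on real backgrounds with redundancy it would typically be much larger.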
If you want only two variables in the original (reconstructed) space, take the PCARES output 'reconstructed', keep its first 2 columns, and throw out the rest.
Note: You mentioned "I thought running princomp is supposed to calculate which sets of data are important and those set(s) are the principal component(s)."
Some other software, or even some File Exchange code out there, may do everything for you, even reducing your dimensions in the original space. But MATLAB gives you access to everything and lets you decide. hth.
Altaz Khan on 30 Apr 2013
Thanks Shashank, I haven't been able to try your suggestion but as soon as I get the results I will let you know. Thanks again!
