How to find the weight of PC_1 in my measurements, after doing PCA?

Hi,
I'm trying to use the following code to understand PCA, SVD, and how they relate:
% PCA_vs_SVD (my sand_box)
%% generate fake data points
my_fake_dataPoints = [-4 0 ; -2 1 ; -1 -1 ; 1 1 ; 3 2 ; 4 2];
% remove the mean of each variable (column); rows are observations
my_fake_dataPoints_noMean = my_fake_dataPoints - mean(my_fake_dataPoints , 1);
% do PCA
[coeff , score , latent , tsquared , explained , mu] = pca(my_fake_dataPoints);
[coeff_noMean , score_noMean , latent_noMean , tsquared_noMean , explained_noMean , mu_noMean] = pca(my_fake_dataPoints_noMean);
% do SVD
[U , S , V] = svd(my_fake_dataPoints);
[U_noMean , S_noMean , V_noMean] = svd(my_fake_dataPoints_noMean);
%% plots
figure(1)
biplot(coeff , 'scores' , score , 'MarkerSize' , 30 , 'varlabels' ,{'var_1' , 'var_2'});
figure(2)
scatter(score(:,1) , score(:,2))
axis equal
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
grid on
I have some questions:
1) How can I find the weight of PC_1 in my measurements? Is it simply the first column of "score", or something else?
2) What exactly is the connection between the outputs of PCA and SVD? Which case should I compare: the standard one, or the one with mean subtraction?
3) Am I missing something in the following:
PC1 = alpha_1 * v1 + alpha_2 * v2, right? My alphas are the first column of the "coeff" variable, right?
So, the 1st data point projected on PC1 should be: 0.95 * (-4) + 0.28 * (0) = -3.8, right? But it doesn't match score(1,1), which is -4.23... What am I missing here?

Answers (1)

Githin George on 4 Oct 2023
Hello Mark,
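I understand you have a few questions about PCA. To answer them:
  1. The "explained" (or "explained_noMean") output contains the percentage of the total variance captured by each principal component (PC_1 and PC_2 in this case). The first column of "score" is something different: it holds the coordinate of each observation along PC_1.
  2. "pca" centers the data by default, so its output matches the SVD of the mean-centered data (column means removed): "coeff" corresponds to V up to the sign of each column, "score" to U*S, and "latent" to the squared singular values divided by n-1. The SVD of the raw, uncentered data generally gives different results. For more details, see the following answer: https://www.mathworks.com/matlabcentral/answers/774902-pca-vs-svd-or-eig-functions?s_tid=srchtitle_site_search_1_pca%20vs%20svd
  3. The equation "PC1 = alpha_1 * v1 + alpha_2 * v2", with the alphas taken from the first column of "coeff", is the right idea, but "pca" subtracts the column means (returned in "mu") before projecting, so score = (X - mu) * coeff. Here mu is approximately [0.167 0.833], so the first data point gives 0.958 * (-4 - 0.167) + 0.287 * (0 - 0.833) ≈ -4.23, which matches score(1,1).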
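If it helps, the three points above can be checked numerically with a short sketch like the one below. It assumes the Statistics and Machine Learning Toolbox (for "pca") and R2016b or later (for the implicit expansion in X - mu). The names X, score_manual, explained_manual, and latent_svd are only illustrative and are not part of the original code.

```matlab
% Same toy data as in the question: 6 observations (rows), 2 variables (columns)
X = [-4 0; -2 1; -1 -1; 1 1; 3 2; 4 2];
n = size(X, 1);

% pca centers the columns by default and returns the column means in mu
[coeff, score, latent, ~, explained, mu] = pca(X);

% Point 3: scores are the centered data projected onto the principal axes
score_manual = (X - mu) * coeff;
err_score = max(abs(score_manual(:) - score(:)));
fprintf('max |score_manual - score| = %g\n', err_score);

% Point 1: explained is each variance (latent) as a percentage of the total
explained_manual = 100 * latent / sum(latent);
disp([explained, explained_manual]);

% Point 2: economy-size SVD of the centered data
[U, S, V] = svd(X - mu, 'econ');
latent_svd = diag(S).^2 / (n - 1);   % squared singular values -> variances
disp([latent, latent_svd]);

% Columns of V and U*S may differ from coeff and score by sign,
% so compare absolute values
US = U * S;
err_coeff = max(abs(abs(V(:)) - abs(coeff(:))));
err_us    = max(abs(abs(US(:)) - abs(score(:))));
fprintf('max ||V| - |coeff|| = %g\n', err_coeff);
fprintf('max ||U*S| - |score|| = %g\n', err_us);
```

All three printed differences should be close to machine precision, and the two columns in each displayed pair should agree. Repeating the SVD on X instead of X - mu should show that the agreement is lost when the data is not centered.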
I hope this addresses your queries.
