I am puzzled by how the output of pca() changes depending on whether mean centering is used. I am using MATLAB R2024a.
pca.m uses the internal function c = ncnancov(x,Rows,centered), which seems to compute the covariance matrix of x. However:
1) It uses the formula for the population covariance, i.e., it calculates x'*x/n rather than x'*x/(n-1). What is the rationale behind that?
2) It does not mean-center x. This is surprising because, without mean centering, the formula x'*x/n (or x'*x/(n-1), for that matter) does NOT give the covariance matrix of x.
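To make point 2 explicit, here is the identity I have in mind. The notation is my own and does not come from pca.m: X is the n-by-p data matrix, \bar{x} is the p-by-1 vector of column means, and \mathbf{1} is the n-by-1 vector of ones.

```latex
\frac{1}{n} X^\top X
  = \frac{1}{n}\,\bigl(X - \mathbf{1}\bar{x}^\top\bigr)^\top \bigl(X - \mathbf{1}\bar{x}^\top\bigr)
  + \bar{x}\,\bar{x}^\top
```

So the uncentered product x'*x/n equals the population covariance plus the outer product of the column means, and the two agree only when every column of x already has zero mean.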
The second point causes the call [coeff,score,latent] = pca(D,'Algorithm','eig','Centered','off') to produce different coeff and latent than the call [coeff,score,latent] = pca(D,'Algorithm','eig'). The scores will obviously differ, but coeff and latent should not be affected by mean centering, as can be shown by comparing the output of:
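load('Data_Table8p1.mat');                            % assumed to provide the data matrix D
[coeff,eigValues] = eig(cov(D));                      % eigendecomposition of the covariance of the raw data
[eigValues, idx] = sort(diag(eigValues), 'descend');  % eigenvalues, largest first
coeff = coeff(:, idx);                                % reorder eigenvectors to match the sorted eigenvalues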
with:
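Dm = D - mean(D);                                         % mean-centered copy of D
[coeff_m,eigValues_m] = eig(cov(Dm));                     % eigendecomposition of the covariance of the centered data
[eigValues_m, idx] = sort(diag(eigValues_m), 'descend');  % eigenvalues, largest first
coeff_m = coeff_m(:, idx);                                % reorder eigenvectors to match the sorted eigenvalues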
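For anyone without the data file, the same point can be checked with a minimal self-contained sketch. The matrix D below is made up for illustration and is not the content of Data_Table8p1.mat. The variable names C_cov, C_cov_m, C_raw, C_pop and mu are my own. The sketch only uses cov(), mean() and norm(), and does not call the internal ncnancov().

```matlab
% Hypothetical stand-in data (NOT the contents of Data_Table8p1.mat)
D  = [2 1 5; 4 3 9; 6 2 7; 8 6 4; 10 5 8];
n  = size(D, 1);
mu = mean(D);                 % 1-by-p row vector of column means
Dm = D - mu;                  % mean-centered copy (implicit expansion, R2016b or later)

C_cov   = cov(D);             % cov() subtracts the column means and divides by n-1
C_cov_m = cov(Dm);            % same result: centering already-centered data changes nothing
C_raw   = (D'  * D)  / n;     % uncentered product x'*x/n
C_pop   = (Dm' * Dm) / n;     % population covariance (centered, divided by n)

% cov() is insensitive to mean centering (difference at round-off level)
disp(norm(C_cov - C_cov_m));
% x'*x/n equals the population covariance plus the outer product of the means
disp(norm(C_raw - (C_pop + mu' * mu)));
% ... so x'*x/n is not a covariance matrix unless the column means are zero
disp(norm(C_raw - C_pop));
```

The first two displayed norms should be at round-off level, while the third is clearly nonzero because the column means of this D are not zero.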
Probably I am missing something, but the internal function ncnancov() as used in pca is unclear to me. Any explanation is much appreciated!