How to calculate correlation p-value?

Ritankar Das on 26 May 2016
Commented: Gregory Pelletier on 24 Jan 2024
I have a correlation matrix and have performed some filtering on it. Now I want to calculate the p-values for the filtered correlation matrix. Can anyone help me with the code? [R,P] = corrcoef(A) returns both the correlation matrix and the p-value matrix, but I already have the correlation matrix and just want to calculate the p-value matrix.
Thank you in advance. Ritankar.

Answers (3)

Gregory Pelletier on 24 Jan 2024
Here is how to calculate the p-values the same way MATLAB does in corrcoef when you know only the correlation coefficient matrix R and the number of samples N. The code computes p_check manually and compares it with the p returned by corrcoef:
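load hospital                                  % load the hospital example data
X = [hospital.Weight hospital.BloodPressure];  % weight plus two blood-pressure columns (N-by-3)
[R, p] = corrcoef(X)                           % reference: correlation and p-value matrices from corrcoef
N = size(X,1);                                 % number of samples (observations)
t = sqrt(N-2).*R./sqrt(1-R.^2);                % t-statistic with N-2 degrees of freedom
s = tcdf(t,N-2);                               % Student's t cumulative distribution function
p_check = 2 * min(s,1-s)                       % two-sided p-value: twice the smaller tail (diagonal is 0 here because R = 1 there; corrcoef reports 1)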
% R =
% 1.0000e+00 1.5579e-01 2.2269e-01
% 1.5579e-01 1.0000e+00 5.1184e-01
% 2.2269e-01 5.1184e-01 1.0000e+00
% p =
% 1.0000e+00 1.2168e-01 2.5953e-02
% 1.2168e-01 1.0000e+00 5.2460e-08
% 2.5953e-02 5.2460e-08 1.0000e+00
% p_check =
% 0 1.2168e-01 2.5953e-02
% 1.2168e-01 0 5.2460e-08
% 2.5953e-02 5.2460e-08 0
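If the Statistics and Machine Learning Toolbox (which provides tcdf) is not available, the same calculation can be sketched with base-MATLAB betainc. This relies on the standard identity for Student's t with nu = N-2 degrees of freedom, P(|T| > |t|) = betainc(nu/(nu+t^2), nu/2, 1/2), together with nu/(nu+t^2) = 1 - r^2 for the t-statistic above, so the two-sided p-value is betainc(1 - R.^2, (N-2)/2, 0.5). The random 50-by-3 test data are an assumption for illustration, and the diagonal is set to 1 to match corrcoef:

```matlab
% Two-sided correlation p-values from R and N using base-MATLAB betainc only.
% The data are random (an assumption for illustration); corrcoef's own
% p-values are computed only so the two results can be compared.
rng default
X = randn(50, 3);            % 50 observations of 3 variables
[R, pRef] = corrcoef(X);     % reference p-values from corrcoef
N = size(X, 1);              % number of samples

nu = N - 2;                          % degrees of freedom of the t-test
x = min(max(1 - R.^2, 0), 1);        % 1 - r^2, clamped in case |r| rounds above 1
p = betainc(x, nu/2, 0.5);           % two-sided p-value for H0: rho = 0
p(logical(eye(size(R)))) = 1;        % corrcoef reports 1 on the diagonal

maxDifference = max(abs(p(:) - pRef(:)))   % expected to be at round-off level
```

For a filtered matrix, pass the filtered R together with the N of the data it came from; entries that were set to exactly 0 get a p-value of 1.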

the cyclist on 26 May 2016
You cannot calculate a p-value from the correlation matrix alone. You need the underlying data. The reason is easy to see: the same correlation matrix could have come from a dataset with N = 10 measurements or from one with N = 100000 measurements, and these will (almost certainly) have different p-values.
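A small illustration of this point, assuming the usual t-test for a Pearson correlation (the same test that corrcoef's p-values reproduce in the answer above): a single hypothetical value r = 0.1 gives very different two-sided p-values for different N. It uses base-MATLAB betainc with the identity p = betainc(1 - r^2, (N-2)/2, 0.5):

```matlab
% Same correlation coefficient, different sample sizes (values are hypothetical).
r = 0.1;
N = [10 100 1000 100000];              % candidate numbers of observations
p = betainc(1 - r^2, (N - 2)/2, 0.5)   % two-sided p-value for each N
```

The p-value shrinks steadily as N grows, so the value of N (or the data it comes from) is needed alongside R.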


Anil Kamat on 30 May 2021
Edited: Anil Kamat on 30 May 2021
Let's say:
N --> number of observations (data points)
r --> correlation coefficient (assumed known)
t = r*sqrt((N-2)/(1-r^2)); % t-statistic with N-2 degrees of freedom
p1 = 1 - tcdf(t,(N-2)) % p-value from Student's t cumulative distribution function (one-sided: upper tail)
https://www.mathworks.com/help/stats/tcdf.html
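The formula above gives a one-sided (upper-tail) p-value, i.e. for the alternative hypothesis that the true correlation is positive. A sketch comparing it with the one-sided option of corr, assuming the Statistics and Machine Learning Toolbox (for corr and tcdf) and random data; tcdf(t,nu,'upper') computes the same tail without subtracting from 1, which keeps precision when the p-value is very small:

```matlab
% One-sided (upper-tail) p-value from r and N, compared with corr's 'Tail' option.
% Requires the Statistics and Machine Learning Toolbox; the data are random.
rng default
N = 1000;
x = randn(N, 2);
[r, pRight] = corr(x(:,1), x(:,2), 'Tail', 'right');  % H1: rho > 0

t = r*sqrt((N-2)/(1-r^2));          % t-statistic, N-2 degrees of freedom
p1 = 1 - tcdf(t, N-2);              % upper-tail p-value, as in the answer
pUpper = tcdf(t, N-2, 'upper');     % same tail, no cancellation in 1 - (...)

fprintf('corr (right tail): %.6f\n1 - tcdf:          %.6f\ntcdf upper:        %.6f\n', ...
    pRight, p1, pUpper);
```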
  1 Comment
the cyclist on 1 Jun 2021
Edited: the cyclist on 1 Jun 2021
Can you help me understand whether your formula is correct? Here is a correlation coefficient calculated from a random dataset, followed by your calculation. (I modified your formula only to use meaningful variable names.)
rng default
N = 1000;
x = randn(N,2);
[correlationMatrix,pMatrix] = corrcoef(x);
pValueFromOriginalData = pMatrix(1,2);
correlationCoefficient = correlationMatrix(1,2);
t = correlationCoefficient*sqrt((N-2)/(1-correlationCoefficient^2)); % find t-statistics
p_from_anil = 1 - tcdf(t,(N-2)); % find pvalue using Student's t cumulative distribution function for one sample test
sprintf('p-value from original data = %7.4f',pValueFromOriginalData)
ans = 'p-value from original data = 0.7695'
sprintf('p-value from Anil = %7.4f',p_from_anil)
ans = 'p-value from Anil = 0.6153'
They give different results.
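The difference appears to come from one-sided versus two-sided testing: corrcoef reports a two-sided p-value, while 1 - tcdf(t,N-2) is only the upper tail. With the numbers above, 2 * (1 - 0.6153) = 0.7694, which agrees with 0.7695 up to the rounding of the printed values. A sketch that reproduces corrcoef by doubling the smaller tail (the same 2*min(s,1-s) step used in Gregory Pelletier's answer), assuming tcdf from the Statistics and Machine Learning Toolbox:

```matlab
% Reconcile the one-sided formula with corrcoef's two-sided p-value.
rng default
N = 1000;
x = randn(N, 2);
[correlationMatrix, pMatrix] = corrcoef(x);
r = correlationMatrix(1,2);

t = r*sqrt((N-2)/(1-r^2));                   % t-statistic, N-2 degrees of freedom
pOneSided = 1 - tcdf(t, N-2);                % upper tail only (formula above)
pTwoSided = 2*min(pOneSided, 1 - pOneSided); % double the smaller tail

fprintf('corrcoef: %7.4f  one-sided: %7.4f  two-sided: %7.4f\n', ...
    pMatrix(1,2), pOneSided, pTwoSided);
```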
