# Given a null distribution, how can I calculate a p-value for my test statistic?

Prabhjot Dhami on 3 Feb 2022
Commented: Jeff Miller on 4 Feb 2022
Greetings,
For example, let's say I have two groups and want to see if their means are significantly different. However, I want to do so in a shuffling/permutation framework.
Accordingly, I shuffle the group labels across data points, calculate the difference between means, and do so 5000 times to create a null distribution.
I have my original unshuffled mean difference and see that it falls in the top 2.5% of the null distribution. I can thus conclude that the difference is significant at the two-tailed 0.05 level.
However, in this context, how can I compute the exact p-value of my original mean difference relative to the null distribution? I am lost when it comes to finding the right function to do so.
Thank you.
P.
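
For readers following along, the shuffling procedure described in the question might look roughly like the sketch below. The data vectors `g1` and `g2`, the variable `nPerm`, and the simulated values are assumptions made purely for illustration; `Obs` and `SN` follow the names used in the comment further down the thread.

```matlab
% Hypothetical example data; replace g1 and g2 with your own two groups.
g1 = randn(20,1) + 0.8;          % group 1 (assumed column vector)
g2 = randn(25,1);                % group 2 (assumed column vector)

Obs = mean(g1) - mean(g2);       % observed (unshuffled) mean difference

pooled = [g1; g2];               % pool all data points
n1     = numel(g1);              % size of group 1 is kept fixed
nPerm  = 5000;                   % number of shuffles, as in the question
SN     = zeros(nPerm,1);         % null distribution of mean differences

for k = 1:nPerm
    idx   = randperm(numel(pooled));   % shuffle the group labels
    SN(k) = mean(pooled(idx(1:n1))) - mean(pooled(idx(n1+1:end)));
end
```

Each pass reassigns the pooled observations to two groups of the original sizes, so `SN` ends up holding 5000 mean differences generated under the null hypothesis of exchangeable labels.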

Jeff Miller on 3 Feb 2022
The one-tailed p-value is just the tail probability of your original unshuffled difference relative to the null distribution that you created by shuffling, i.e., the proportion of shuffled differences at least as extreme as the observed one. In your example, where the unshuffled mean difference sits at the edge of the top 2.5% of the null distribution, p = .025.
For two-tailed testing, the p-value would be double this tail probability (e.g., 2*.025 = .05).
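
As a rough illustration of the tail-probability idea, the sketch below uses a small made-up null distribution. The values in `SN` and `Obs` are hypothetical, and doubling the smaller tail (capped at 1) is one straightforward way to implement the doubling rule, not the only one.

```matlab
% Toy null distribution (hypothetical values) and observed difference.
SN  = (-49.5:1:49.5)';     % 100 evenly spaced shuffled differences
Obs = 47;                  % observed difference; 3 null values exceed it

pUpper = mean(SN >= Obs);  % upper-tail probability
pLower = mean(SN <= Obs);  % lower-tail probability

pOneTailed = pUpper;                         % testing "group 1 > group 2"
pTwoTailed = min(1, 2*min(pUpper, pLower));  % double the smaller tail, cap at 1
```

Here `pOneTailed` comes out as 0.03 and `pTwoTailed` as 0.06, since 3 of the 100 null values are at least as large as `Obs`.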
##### Comments
Jeff Miller on 4 Feb 2022
If Obs is the observed mean difference and SN is a vector of differences from the shuffled data, you could compute, for example:
pLess = mean(SN<Obs);
pGreater = mean(SN>Obs);
to get the exact probability of null values less than or greater than your observed value. The mean of the many 0's and 1's will be the proportion that you are interested in.
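
One detail worth checking is how ties are handled: with strict inequalities, shuffled values exactly equal to `Obs` fall into neither `pLess` nor `pGreater`. The sketch below uses made-up numbers to show this, along with one commonly used variant that counts the observed labelling as one of the permutations. The data and the names `pTie`, `pUpper`, and `pUpperAdj` are assumptions for illustration only.

```matlab
% Hypothetical shuffled differences and observed value, for illustration only.
SN  = [-2 -1 -1 0 0 0 1 1 2]';
Obs = 1;

pLess    = mean(SN < Obs);    % strict: ignores null values equal to Obs
pGreater = mean(SN > Obs);    % strict: ignores null values equal to Obs
pTie     = mean(SN == Obs);   % proportion of exact ties

% Upper-tail p-value counting ties as "at least as extreme".
pUpper = mean(SN >= Obs);

% Variant that counts the observed labelling as one of the permutations,
% so the estimate can never be exactly zero.
pUpperAdj = (sum(SN >= Obs) + 1) / (numel(SN) + 1);
```

With continuous data and thousands of shuffles, exact ties are rare and the adjustment changes the result very little, but it avoids reporting p = 0 when no shuffled value reaches the observed one.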