ttest and confidence interval
ttest returns the confidence interval for a given (1-a) confidence level.
Shouldn't that interval be exactly the same as [-tc*s/sqrt(n)+μ, tc*s/sqrt(n)+μ],
where tc is the critical t value for 1-a probability, s is the sample standard deviation (s=sqrt(var(Table))), μ is the theoretical mean value and n is the number of elements?
Thank you for your time.
Here is the code:
n=30;
tau=10;
ValuesTable=exprnd(tau,n,1);%30 values from exponential with mean=tau=10
%mean and standard deviation of sample
m=mean(ValuesTable);
s=0;
for i=1:n
s=s+(ValuesTable(i)-m)^2;
end
s=sqrt(s/(n-1));%standard deviation
%critical t value for a=0.05 and n-1=29 degrees of freedom
%tc(degrees of freedom=29, probability 1-a/2=1-0.05/2=0.975)
tc=2.045;
%interval from equation
c=[-tc*s/sqrt(n)+tau,+tc*s/sqrt(n)+tau];
%interval from ttest
[~,~,d,~]=ttest(ValuesTable,tau,'Alpha',0.05);
fprintf("Interval1: [%f,%f] \nInterval2: [%f,%f] \n",c(1),c(2),d(1),d(2));
Answers (1)
I assume you're intentionally avoiding the std() function. However, your calculation of standard deviation is incorrect.
s=sqrt(s)/(n-1);
should be
s=sqrt(s/(n-1));
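Secondly, your critical value tc=2.045 is a quantile of the standardized t statistic (29 degrees of freedom), so on its own it is in units of standard errors. For a two-sided test, Matlab's ttest multiplies that quantile by the standard error to get the half-width of the interval using
crit = tinv(1 - alpha/2, df) .* ser;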
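As a rough check of the pieces involved, here is a minimal sketch that looks up the critical value with tinv instead of a table and scales it by the standard error. It assumes the Statistics and Machine Learning Toolbox and covers only the two-sided case, where the quantile is taken at 1 - alpha/2. The seed and the variable names (x, tcrit, halfWidth) are my own choices for illustration.

```matlab
% Sketch: two-sided t critical value and CI half-width (alpha = 0.05).
% Assumes the Statistics and Machine Learning Toolbox (exprnd, tinv).
rng(1);                          % arbitrary seed, for repeatability only
n     = 30;
x     = exprnd(10, n, 1);        % same kind of sample as in the question
alpha = 0.05;
df    = n - 1;                   % degrees of freedom
tcrit = tinv(1 - alpha/2, df);   % about 2.045 for df = 29
ser   = std(x) / sqrt(n);        % standard error of the mean
halfWidth = tcrit * ser;         % half-width of the two-sided interval
fprintf('tcrit = %.4f, half-width = %.4f\n', tcrit, halfWidth);
```

The value of tcrit depends only on alpha and df, while halfWidth changes with every sample because it depends on std(x).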
Lastly and most importantly, the confidence interval computed with this method for your data is meaningless or, even worse, misleading. This method of computing a CI assumes a normal distribution and your data are clearly nowhere close to being normally distributed.
I recommend using bootstrap confidence intervals which do not carry a distribution assumption.
This demo estimates the median (since the mean is heavily influenced by the tails in non-normal distributions). It computes the median 1000 times on bootstrapped samples (drawn from the data with replacement) and returns the middle 95% of the resulting distribution of medians, which is the percentile method.
% ValuesTable computed using rng(999) for reproducibility
ci = bootci(1000, {@median, ValuesTable}, 'type', 'per', 'alpha', .05);
figure()
histogram(ValuesTable,20)
h(1) = xline(ci(1), 'r-', 'LineWidth', 2, 'DisplayName','LowerCI');
h(2) = xline(ci(2), 'm-', 'LineWidth', 2, 'DisplayName', 'UpperCI');
h(3) = xline(median(ValuesTable), 'k--', 'DisplayName','median');
legend(h)
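To make the percentile method less of a black box, here is a hand-written sketch of roughly what that kind of bootstrap does. It is not bootci's actual implementation; the loop, the seed and the names (bootMedians, ciManual) are assumptions for illustration, and the numbers will differ somewhat from bootci's because the random draws differ.

```matlab
% Sketch: percentile bootstrap CI for the median, written out by hand.
% Assumes the Statistics and Machine Learning Toolbox (exprnd, prctile).
rng(999);                              % arbitrary seed, for repeatability only
n = 30;
ValuesTable = exprnd(10, n, 1);        % sample like the one in the question
nBoot = 1000;                          % number of bootstrap resamples
bootMedians = nan(nBoot, 1);
for k = 1:nBoot
    idx = randi(n, n, 1);              % n indices drawn with replacement
    bootMedians(k) = median(ValuesTable(idx));
end
ciManual = prctile(bootMedians, [2.5 97.5]);   % middle 95% of the medians
fprintf('Percentile bootstrap CI for the median: [%f, %f]\n', ciManual(1), ciManual(2));
```

Each resample has the same size as the original data, which is what distinguishes a bootstrap from plain subsampling.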

Lastly, let's compare the 95% CIs computed by ttest and by bootstrapping on your data. The black dashed line is the mean of the population. The red lines are the 95% CI computed by bootstrapping. The dashed black lines are the 95% CI computed by ttest. The ttest CIs are similar to the bootstrap CIs but shifted leftward. Since the ttest CIs are computed using the std, and since the std is affected by the outliers that form the rightward tail, the ttest CIs are not as reliable as the bootstrap results. In general, the percentile bootstrap does not require normally distributed data, so it is the safer choice when the data are clearly non-normal; with normally distributed data the two methods should give similar intervals.

4 Comments
petros bomb
on 18 Nov 2021
Edited: petros bomb
on 18 Nov 2021
Happy to help.
> So if I understood correctly, you are saying that the ttest assumes that my data come from a normal distribution, or rather from a standard normal distribution (mean=0, s^2=1).
T-tests assume
- All observations are independently sampled from the population
- Samples are approximately normally distributed
- Data are continuous (not discrete, categorical, etc.)
The data should be normally distributed but they do not have to be standard-normal.
> Let's assume that i've got 1000 samples...
I didn't exactly follow this example but I think what you're describing is the Central Limit Theorem (CLT). To briefly explain the CLT, assume you have 1000 random samples of measured data. If you randomly draw 100 (or 200, 50, whatever) values from the 1000 values, compute their mean, and repeat that 1000 times, you'll have 1000 means from 1000 random sub-samples. The CLT states that, for a sufficiently large sub-sample size, the distribution of those mean values will be approximately normal even if the underlying population is not normally distributed.
Demo
n = 1000;
x = randg(1,1,n);
figure()
ax1 = subplot(1,2,1);
histogram(ax1,x,20)
title(ax1,'raw data')
subtitle(ax1,sprintf('%d samples',n))
% 5000 bootstraps of 200 values each
nBoot = 5000;
sampSize = 200;
meanVals = nan(1,nBoot);
for i = 1:nBoot
rsamp = randsample(x,sampSize,true); % true = sample with replacement
meanVals(i) = mean(rsamp);
end
ax2 = subplot(1,2,2);
histogram(ax2,meanVals,20)
title(ax2,'bootstrap means')
subtitle(ax2,sprintf('%d bootstraps',nBoot))
% Show mean of bootstrap means in both axes
meanbs = mean(meanVals)
xline(ax1, meanbs, 'k--', 'LineWidth', 2)
xline(ax2, meanbs, 'k--', 'LineWidth', 2)
% Include mean of the full sample
xline(ax1, mean(x), 'k-', 'lineWidth',1)
% compute 95%CI
p = 95;
CI = prctile(meanVals, [(100-p)/2, p+(100-p)/2]) % 2-tailed 95% CI
% Show CI on both plots
xline(ax1, CI(1), 'm-', 'LineWidth', 1)
xline(ax1, CI(2), 'm-', 'LineWidth', 1)
xline(ax2, CI(1), 'm-', 'LineWidth', 1)
xline(ax2, CI(2), 'm-', 'LineWidth', 1)
sgtitle('Central Limit Theorem Demo')
> The ttest interval won't be exactly the same as the interval that comes from the Student t formula, and this is because the ttest assumes a standard normal distribution. Right?
Not exactly. All t-tests, including Matlab's ttest, assume a normal distribution. Standard normal distributions are a subset of normal distributions in that they specifically have a mean of 0 and std of 1. The tabulated critical value applies to the standardized t statistic, so it is in units of standard errors; to get an interval on the scale of your data, Matlab's ttest multiplies the critical value by the standard error (s/sqrt(n)).
Petros Petridis
on 19 Nov 2021
Good evening sir,
I finally figured out what I was doing wrong.
In my code, in these lines:
%interval from equation
c=[-tc*s/sqrt(n)+tau,+tc*s/sqrt(n)+tau];
I should have written mean(ValuesTable) instead of tau.
We are trying to estimate tau (the real mean value), so we start from
t=(mean(ValuesTable)-RealMean)/(s/sqrt(n)), which follows a Student t distribution with n-1 degrees of freedom (exactly for normal data, approximately for large n).
Solving this for RealMean gives the confidence interval:
RealMean interval = mean(ValuesTable) +/- tc*s/sqrt(n).
Computed this way, the interval matches the one returned by ttest().
I had been getting this wrong more than 20 times over the last few days.
That pdf really helped me.
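For anyone landing here later, a minimal sketch of the corrected comparison follows. It assumes the Statistics and Machine Learning Toolbox; the seed is arbitrary, and std and tinv are used in place of the manual loop and the table lookup from the question.

```matlab
% Sketch: manual t interval centered on the sample mean vs. ttest.
% Assumes the Statistics and Machine Learning Toolbox (exprnd, tinv, ttest).
rng(0);                                  % arbitrary seed, for repeatability only
n   = 30;
tau = 10;
ValuesTable = exprnd(tau, n, 1);         % 30 values, exponential with mean tau
m  = mean(ValuesTable);                  % sample mean (center of the interval)
s  = std(ValuesTable);                   % sample standard deviation (n-1 normalization)
tc = tinv(0.975, n - 1);                 % two-sided critical value for alpha = 0.05
c  = [m - tc*s/sqrt(n), m + tc*s/sqrt(n)];            % interval from the equation
[~, ~, d] = ttest(ValuesTable, tau, 'Alpha', 0.05);   % interval from ttest
fprintf('Interval1: [%f,%f] \nInterval2: [%f,%f] \n', c(1), c(2), d(1), d(2));
```

The hypothesized mean tau passed to ttest affects the test decision and p-value but not the confidence interval, which is why the two intervals agree once the manual one is centered on mean(ValuesTable).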