ttest and confidence interval

Question

petros bomb on 18 Nov 2021


https://au.mathworks.com/matlabcentral/answers/1589984-ttest-and-confidence-interval

Edited: Adam Danz on 22 Nov 2021

ttest returns the confidence interval for a given (1-a) confidence level.

Shouldn't that interval be exactly the same as [μ-tc*s/sqrt(n), μ+tc*s/sqrt(n)],

where tc is the critical t value for probability 1-a, s is the sample standard deviation (s=sqrt(sum((Table-mean(Table)).^2)/(n-1))), μ is the theoretical mean and n is the number of elements?

Thank you for your time.

Here is the code:

n=30;
tau=10;
ValuesTable=exprnd(tau,n,1); % 30 values from an exponential distribution with mean tau=10
% mean and standard deviation of the sample
m=mean(ValuesTable);
s=0;
for i=1:n
    s=s+(ValuesTable(i)-m)^2; % running sum of squared deviations
end
s=sqrt(s/(n-1)); % sample standard deviation
% critical t value for a=0.05 and n-1=29 degrees of freedom
% tc = t quantile at 1-a/2 = 1-0.05/2 = 0.975 (equivalently tinv(0.975,n-1))
tc=2.045;
%interval from equation
c=[-tc*s/sqrt(n)+tau,+tc*s/sqrt(n)+tau];
%interval from ttest
[~,~,d,~]=ttest(ValuesTable,tau,'Alpha',0.05);
fprintf("Interval1: [%f,%f] \nInterval2: [%f,%f] \n",c(1),c(2),d(1),d(2));
Interval1: [6.401967,13.598033] 
Interval2: [7.313208,14.510082] 


Answer 1

Adam Danz on 18 Nov 2021


https://au.mathworks.com/matlabcentral/answers/1589984-ttest-and-confidence-interval#answer_835044

Edited: Adam Danz on 18 Nov 2021


I assume you're intentionally avoiding the std() function. However, your calculation of standard deviation is incorrect.

s=sqrt(s)/(n-1);

should be

s=sqrt(s/(n-1));
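A quick way to confirm the fix is to compare the loop-based value with std(), which also normalizes by n-1 by default. This is a minimal sketch; it assumes the Statistics and Machine Learning Toolbox (for exprnd) and reuses the question's setup with an arbitrary seed, rng(999). The names sCorrect, sWrong and sBuiltin are illustrative.

```matlab
% Compare the loop-based sample standard deviation with std().
% Assumes the Statistics and Machine Learning Toolbox (exprnd).
rng(999)                          % arbitrary seed, for reproducibility
n = 30;
tau = 10;
ValuesTable = exprnd(tau, n, 1);  % 30 draws from an exponential with mean tau

m = mean(ValuesTable);
ss = 0;                           % running sum of squared deviations
for i = 1:n
    ss = ss + (ValuesTable(i) - m)^2;
end
sCorrect = sqrt(ss/(n-1));        % divide by n-1 INSIDE the square root
sWrong = sqrt(ss)/(n-1);          % the misplaced parenthesis
sBuiltin = std(ValuesTable);      % std normalizes by n-1 by default

fprintf('sqrt(ss/(n-1)) = %.6f\nsqrt(ss)/(n-1) = %.6f\nstd()          = %.6f\n', ...
    sCorrect, sWrong, sBuiltin);
```

The first and third values agree; the misplaced-parenthesis version is smaller by a factor of sqrt(n-1).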

Secondly, your critical value is only correct for a standard normal distribution. MATLAB's calculation of the critical value correctly multiplies by the standard error; for a two-tailed test it uses

crit = tinv(1 - alpha/2, df) .* ser;
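The two-sided computation can be reproduced by hand, which also makes visible that ttest centers its interval on the sample mean. A minimal sketch, assuming the Statistics and Machine Learning Toolbox (exprnd, tinv, ttest) and the question's setup with an arbitrary seed (rng(999)); xmean, ser and crit mirror the names above, the rest are illustrative.

```matlab
% Reproduce ttest's two-sided confidence interval by hand.
% Assumes the Statistics and Machine Learning Toolbox (exprnd, tinv, ttest).
rng(999)                                 % arbitrary seed, for reproducibility
n = 30;
tau = 10;
alpha = 0.05;
x = exprnd(tau, n, 1);                   % 30 draws from an exponential with mean tau

xmean = mean(x);                         % sample mean
ser = std(x) / sqrt(n);                  % standard error of the mean
df = n - 1;                              % degrees of freedom
crit = tinv(1 - alpha/2, df) * ser;      % half-width: alpha/2 in each tail
ciManual = [xmean - crit, xmean + crit]; % centered on the SAMPLE mean, not on tau

[~, ~, ciTtest] = ttest(x, tau, 'Alpha', alpha);
ciTtest = ciTtest(:)';                   % force a 1-by-2 row for comparison

fprintf('by hand: [%f, %f]\nttest:   [%f, %f]\n', ciManual, ciTtest);
```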

Most importantly, the confidence interval computed with this method for your data is meaningless or, even worse, misleading. This method of computing a CI assumes a normal distribution, and your data are clearly nowhere close to being normally distributed.
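Besides looking at a histogram, one formal check of the normality assumption is a Lilliefors test. This is a sketch, assuming the Statistics and Machine Learning Toolbox (exprnd, lillietest) and data generated as in the question with an arbitrary seed; keep in mind that with only 30 observations such a test has limited power, and not rejecting normality does not prove it.

```matlab
% Check the normality assumption with a Lilliefors test.
% Assumes the Statistics and Machine Learning Toolbox (exprnd, lillietest).
rng(999)                              % arbitrary seed, for reproducibility
x = exprnd(10, 30, 1);                % exponential data, as in the question

% H0: x comes from a normal distribution with unspecified mean and variance.
% hNorm = 1 means normality is rejected at the default 5% level.
[hNorm, pNorm] = lillietest(x);
fprintf('Lilliefors test: h = %d, p = %.4f\n', hNorm, pNorm);
```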

I recommend using bootstrap confidence intervals which do not carry a distribution assumption.

This demo estimates the median (since the mean is heavily influenced by the tails in non-normal distributions). It computes the median 1000 times on bootstrapped samples and returns the middle 95% of the distribution of medians thanks to the central limit theorem.

% ValuesTable generated as in the question, using rng(999) for reproducibility
% (assumes the question's exprnd(10,30,1) setup)
rng(999)
ValuesTable = exprnd(10,30,1);
% 95% percentile bootstrap CI of the median, from 1000 resamples
ci = bootci(1000, {@median, ValuesTable}, 'type', 'per', 'alpha', .05);
figure()
histogram(ValuesTable,20)
h(1) = xline(ci(1), 'r-', 'LineWidth', 2, 'DisplayName','LowerCI');
h(2) = xline(ci(2), 'm-', 'LineWidth', 2, 'DisplayName', 'UpperCI');
h(3) = xline(median(ValuesTable), 'k--', 'DisplayName','median');
legend(h)

Finally, let's compare the 95% CIs produced by ttest and by bootstrapping on your data. The black dashed line is the mean of the population. The red lines are the 95% CI computed by bootstrapping. The dashed black lines are the 95% CI computed by ttest. The ttest CIs are similar to the bootstrap CIs but shifted leftward. Since the ttest CIs are computed using the standard deviation, and the standard deviation is affected by outliers (which appear here as a rightward tail), the ttest CIs are not as reliable as the bootstrap results. In fact, bootstrap results using the percentile method will always be either as reliable (for normally distributed data) or more reliable (in all other cases) than methods that require normal distributions.
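The comparison figure itself is not reproduced here, but a sketch along these lines computes both kinds of interval on the same data. It assumes the Statistics and Machine Learning Toolbox, regenerates ValuesTable with rng(999) as in the demo, and uses its own line styles, which may differ from the original figure. Because ttest gives a CI for the mean while the demo bootstraps the median, a bootstrap CI of the mean is also printed for a like-for-like comparison.

```matlab
% Compare the t-based CI (mean) with percentile bootstrap CIs (median and mean).
% Assumes the Statistics and Machine Learning Toolbox (exprnd, ttest, bootci).
rng(999)                                  % arbitrary seed, for reproducibility
tau = 10;
ValuesTable = exprnd(tau, 30, 1);

[~, ~, ciT] = ttest(ValuesTable, tau, 'Alpha', 0.05);                        % mean, t-based
ciMed = bootci(1000, {@median, ValuesTable}, 'type', 'per', 'alpha', 0.05);   % median, bootstrap
ciMean = bootci(1000, {@mean, ValuesTable}, 'type', 'per', 'alpha', 0.05);    % mean, bootstrap

fprintf('ttest (mean):       [%.3f, %.3f]\n', ciT);
fprintf('bootstrap (mean):   [%.3f, %.3f]\n', ciMean);
fprintf('bootstrap (median): [%.3f, %.3f]\n', ciMed);

figure()
histogram(ValuesTable, 20)
hLines = gobjects(1, 3);                  % handles for the legend
hLines(1) = xline(tau, 'k-', 'LineWidth', 2, 'DisplayName', 'population mean');
hLines(2) = xline(ciMed(1), 'r-', 'LineWidth', 2, 'DisplayName', 'bootstrap CI (median)');
xline(ciMed(2), 'r-', 'LineWidth', 2)
hLines(3) = xline(ciT(1), 'b--', 'LineWidth', 2, 'DisplayName', 'ttest CI (mean)');
xline(ciT(2), 'b--', 'LineWidth', 2)
legend(hLines)
```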


petros bomb on 18 Nov 2021

Edited: petros bomb on 18 Nov 2021

Hello my friend.

I really appreciate what you told me. You've been very helpful.

I made a mistake there; I did mean to write s=sqrt(s/(n-1)).

So if I understood correctly, you are saying that ttest assumes that my data come from a normal distribution, or more precisely from a standard normal distribution (mean=0, s^2=1).

Let's assume that I've got 1000 samples, each with 100 observations from the same distribution (e.g. exponential). If I compute the mean of each of those 1000 samples, I get a new variable (let's call it M) whose observations are those 1000 means. If we look at a boxplot and histogram of those 1000 M values, we can see that M follows an almost standard normal distribution, or to be exact, M follows a Student's t distribution with 999 degrees of freedom. Because of the high degrees of freedom, the Student's t distribution behaves like a standard normal distribution, so I can use ttest on M.
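That thought experiment is easy to simulate. In this sketch (assuming the Statistics and Machine Learning Toolbox for exprnd, with an arbitrary seed and illustrative variable names), the 1000 sample means M come out centered near tau with a spread near tau/sqrt(100), because an exponential distribution's standard deviation equals its mean. In other words, M is approximately normal, but not standard normal.

```matlab
% Simulate 1000 samples of 100 exponential observations and look at their means.
% Assumes the Statistics and Machine Learning Toolbox (exprnd).
rng(1)                                % arbitrary seed, for reproducibility
tau = 10;                             % exponential mean (its std is also tau)
nObs = 100;                           % observations per sample
nSamples = 1000;                      % number of samples
X = exprnd(tau, nObs, nSamples);      % each column is one sample
M = mean(X, 1);                       % 1-by-1000 vector of sample means

fprintf('mean(M) = %.3f   (tau = %g)\n', mean(M), tau);
fprintf('std(M)  = %.3f   (tau/sqrt(nObs) = %g)\n', std(M), tau/sqrt(nObs));

figure()
histogram(M, 30)
title('1000 sample means: roughly normal, centered near tau')
```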

So this was my question: the ttest interval won't be exactly the same as the interval that comes from the Student's t distribution.

And this is because ttest assumes a standard normal distribution. Right?

So as the number of samples goes to infinity, will those intervals become the same?

Also, I really want to thank you for suggesting the bootstrap. I had never heard of it and it seems really good.

Thanks again, my friend.

Adam Danz on 18 Nov 2021


Happy to help.

> So if I understood correctly, you are saying that ttest assumes that my data come from a normal distribution, or more precisely from a standard normal distribution (mean=0, s^2=1).

T-tests assume that:

All observations are independently sampled from the population
Samples are approximately normally distributed
Data are continuous (not discrete, categorical, etc.)

The data should be normally distributed, but they do not have to be standard normal.
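One way to see that standard normality is not needed: the t statistic does not change when the data and the hypothesized mean are shifted and rescaled together. A minimal sketch, assuming the Statistics and Machine Learning Toolbox; the mean 5, standard deviation 3, hypothesized mean 4 and the seed are arbitrary illustration values.

```matlab
% The t statistic is unchanged by a common shift and rescaling of data and H0 mean.
% Assumes the Statistics and Machine Learning Toolbox (ttest).
rng(2)                                         % arbitrary seed
x = 5 + 3*randn(25, 1);                        % normal data, mean 5, std 3 (not standard normal)
mu0 = 4;                                       % hypothesized mean (illustrative value)
[~, p1, ~, stats1] = ttest(x, mu0);

z = (x - 5) / 3;                               % same data, shifted and rescaled
[~, p2, ~, stats2] = ttest(z, (mu0 - 5) / 3);  % hypothesized mean transformed the same way

fprintf('original:     t = %.6f, p = %.6f\n', stats1.tstat, p1);
fprintf('standardized: t = %.6f, p = %.6f\n', stats2.tstat, p2);
```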

See this chapter PDF for everything you need to know about 1-sample t-tests.

> Let's assume that I've got 1000 samples...

I didn't exactly follow this example, but I think what you're describing is the Central Limit Theorem (CLT). To briefly explain the CLT, assume you have 1000 random samples of measured data. If you randomly draw 100 (or 200, 50, whatever) values from those 1000 values, compute their mean, and repeat that 1000 times, you'll have 1000 means from 1000 random sub-samples. The CLT states that the distribution of those means will be approximately normal even if the underlying population is not normally distributed.

Demo

n = 1000;
x = randg(1,1,n); % 1000 draws from a gamma distribution with shape 1 (exponential, mean 1)
figure()
ax1 = subplot(1,2,1);
histogram(ax1,x,20)
title(ax1,'raw data')
subtitle(ax1,sprintf('%d samples',n))
% 5000 resamples
nBoot = 5000;
sampSize = 200;
meanVals = nan(1,nBoot);
for i = 1:nBoot
    rsamp = randsample(x,sampSize); % 200 values drawn without replacement (randsample default)
    meanVals(i) = mean(rsamp);
end
ax2 = subplot(1,2,2);
histogram(ax2,meanVals,20)
title(ax2,'bootstrap means')
subtitle(ax2,sprintf('%d bootstraps',nBoot))
% Show mean of bootstrap means in both axes
meanbs = mean(meanVals)
% meanbs = 0.9894   (output from one run; values vary because no seed is set)
xline(ax1, meanbs, 'k--', 'LineWidth', 2)
xline(ax2, meanbs, 'k--', 'LineWidth', 2)
% Include mean of population
xline(ax1, mean(x), 'k-', 'LineWidth', 1)
% compute 95% CI
p = 95;
CI = prctile(meanVals, [(100-p)/2, p+(100-p)/2]) % 2-tailed 95% CI
% CI = 1×2
%     0.8678    1.1128
% Show CI on both plots
xline(ax1, CI(1), 'm-', 'LineWidth', 1)
xline(ax1, CI(2), 'm-', 'LineWidth', 1)
xline(ax2, CI(1), 'm-', 'LineWidth', 1)
xline(ax2, CI(2), 'm-', 'LineWidth', 1)
sgtitle('Central Limit Theorem Demo')

> The ttest interval won't be exactly the same as the interval that comes from the Student's t distribution, and this is because ttest assumes a standard normal distribution. Right?

Not exactly. All t-tests, including MATLAB's ttest, assume a normal distribution. Standard normal distributions are a subset of normal distributions in that they specifically have a mean of 0 and a standard deviation of 1. The formula you used to compute the critical value assumes a standard normal distribution, but to allow for normal distributions that are not standard normal, MATLAB's ttest includes the standard error in the critical value computation.

Petros Petridis on 19 Nov 2021


Good evening sir,

I finally figured out what I was doing wrong.

In my code, in these lines:

%interval from equation
c=[-tc*s/sqrt(n)+tau,+tc*s/sqrt(n)+tau];

instead of tau, I should have written mean(ValuesTable).

We want to find tau (the true mean value), so we use the statistic

t=(mean(ValuesTable)-RealMean)/(s/sqrt(n)), which follows a Student's t distribution with n-1 degrees of freedom (exactly for normal data, and approximately for other data when n is large, e.g. n>30).

We solve this for RealMean to get the confidence interval:

RealMean interval = mean(ValuesTable) +/- tc*s/sqrt(n).
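The same pivot argument written out in standard notation, as a reference sketch: x̄ is mean(ValuesTable), μ is RealMean, α is the a used in the question, and t_c is the critical value with n-1 degrees of freedom.

```latex
T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1},
\qquad t_c = t_{1-\alpha/2,\; n-1}

P\left(-t_c \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_c\right) = 1 - \alpha
\;\Longrightarrow\;
P\left(\bar{x} - t_c\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_c\frac{s}{\sqrt{n}}\right) = 1 - \alpha
```

The interval is centered on the sample mean x̄; the hypothesized mean only enters when testing whether it falls inside that interval.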

With this change, ttest() returns the same interval.
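The original tau-centered interval still has a meaning: it is the range of sample means that would not lead ttest to reject H0 (mean = tau), while ttest's CI is the range of hypothesized means that would not be rejected given the observed sample mean. Both lead to the same decision. A sketch checking this, assuming the Statistics and Machine Learning Toolbox and an arbitrary seed; acceptRegion and the other names are illustrative.

```matlab
% Two equivalent ways to reach ttest's reject/accept decision.
% Assumes the Statistics and Machine Learning Toolbox (exprnd, tinv, ttest).
rng(999)                                        % arbitrary seed, for reproducibility
n = 30;
tau = 10;
alpha = 0.05;
x = exprnd(tau, n, 1);

[h, ~, ci] = ttest(x, tau, 'Alpha', alpha);     % h = 1 means H0 (mean = tau) is rejected

halfWidth = tinv(1 - alpha/2, n - 1) * std(x) / sqrt(n);
acceptRegion = tau + [-halfWidth, halfWidth];   % the original, tau-centered interval

% View 1: is tau outside ttest's CI (which is centered on mean(x))?
tauOutsideCI = tau < ci(1) || tau > ci(2);
% View 2: is mean(x) outside the tau-centered interval?
meanOutsideRegion = mean(x) < acceptRegion(1) || mean(x) > acceptRegion(2);

fprintf('h = %d, tau outside CI = %d, mean outside tau-centered interval = %d\n', ...
    h, tauOutsideCI, meanOutsideRegion);
```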

I had done this wrong more than 20 times over the last few days.

That PDF really helped me.

Adam Danz on 19 Nov 2021

Edited: Adam Danz on 22 Nov 2021

Well done figuring that out.
