Calculate confidence interval of data set

172 views (last 30 days)
Katara So
Katara So on 17 Dec 2020
Edited: Ive J on 20 Dec 2020
I want to calculate a 95% confidence interval for the difference between the mean of each variable for the species versicolor and virginica in the "fisheriris" data set, but I don't quite understand how to write the code. This is my code so far:
load fisheriris
%Versicolor
ver1 = meas(51:100,1);
ver2 = meas(51:100,2);
ver3 = meas(51:100,3);
ver4 = meas(51:100,4);
ver1_mean = mean(ver1);
ver2_mean = mean(ver2);
ver3_mean = mean(ver3);
ver4_mean = mean(ver4);
%Virginica
vir1 = meas(101:150,1);
vir2 = meas(101:150,2);
vir3 = meas(101:150,3);
vir4 = meas(101:150,4);
vir1_mean = mean(vir1);
vir2_mean = mean(vir2);
vir3_mean = mean(vir3);
vir4_mean = mean(vir4);
q1 = quantile(ver1_mean - vir1_mean, [0.95])
q2 = quantile(ver2_mean - vir2_mean, [0.95])
q3 = quantile(ver3_mean - vir3_mean, [0.95])
q4 = quantile(ver4_mean - vir4_mean, [0.95])
But I assume that I should have some sort of interval instead of just 0.95 in my quantile command, however whatever I put for ex, [0.05 0.95], I get the same value in for both ends of the interval and if I just have 0.95 then I don't get an interval. Could someone explain where I have gone wrong, all help is appreciated!

Answers (1)

Ive J
Ive J on 18 Dec 2020
Edited: Ive J on 18 Dec 2020
I'm not sure if you want to calculate CI or something like reference range. Regardless, as you can see on quantile help page, it says Quantiles of a data set. So, of course when your inputs are scalar the output is same as the input.
data = randn(100, 1);
quantile(data, [0.05 0.95]) % this is not 95% CI
% 95% CI
CI95 = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), mean(A) + 1.96*(std(A)./sqrt(numel(A)))]
  2 Comments
Katara So
Katara So on 18 Dec 2020
I want to calculate CI, however I am having trouble getting the mean values of the variables. In another problem where I had to use bootstrp to calculate CI, I did like this:
%Versicolor
ver1 = meas(51:100,1);
%Virginica
vir1 = meas(101:150,1);
M = 2000;
alpha = 0.05/4;
vers_boot1=bootstrp(M,@mean,ver1);
vir_boot1=bootstrp(M,@mean,vir1);
ci1 = quantile(vers_boot1-vir_boot1, [alpha 1-alpha])
which gave me an that ci1 = [-0.9170 -0.3960].
So now I want to do the same but without using bootstrp and for 95% CI. What I don't understand is how to get a vector of the mean values for the two different variables. With the code in my original question I get that the mean is only one value while in the problem posted above I get it to be a vector.
Ive J
Ive J on 20 Dec 2020
Edited: Ive J on 20 Dec 2020
In bootsrapt you simply do random sampling form ver1 and vir1; so, in above example you pick K random subsamples with replication from ver1 and vir1 and calculate the mean of each subsample. So, you have a vector of mean values, for which you can easily calculate CI.
I guess you are looking for something like this:
load fisheriris
%Versicolor
ver = meas(51:100,1);
%Virginica
vir = meas(101:150,1);
% 95% CI using bootstrapping
M = 2000;
alpha = 0.05/2;
ver_boot = bootstrp(M, @mean, ver);
vir_boot = bootstrp(M, @mean, vir);
ci_bootstrap = quantile(ver_boot - vir_boot, [alpha 1-alpha]);
% using Z-value (assuming normal distribution)
A = ver - vir;
ci_z = [mean(A) - 1.96*(std(A)./sqrt(numel(A))), ...
mean(A) + 1.96*(std(A)./sqrt(numel(A)))];
% using t-distribution
[~, ~, ci_t] = ttest2(ver, vir);
% compare them
ci_bootstrap
ci_z
ci_t'
ci_bootstrap =
-0.8880 -0.4230
ci_z =
-0.8942 -0.4098
ci_t =
-0.8819 -0.4221
% visualize esitmated mean from bootstrapping
histogram(ver_boot - vir_boot)
line([mean(A), mean(A)], get(gca, 'YLim'), 'color', 'r', 'LineWidth', 1.5)

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!