How to quantify the goodness of a fit?
Hello, I need to quantify how good a fit to a PDF (probability density function) is. I have a data set, its empirical PDF, and a fitted PDF. I decided to use the function chi2gof. Since this is the first time I am using it, I ran a test script first.
I generated Gaussian random variables (one of the simplest PDFs) with mu=3 and computed the observed PDF, so I know for certain what the correct PDF is. The problem is that when I apply chi2gof, the null hypothesis (that the data come from the expected distribution) is rejected! I don't understand what I'm doing wrong. My test code is below:
ntot=1000;
x=randn([ntot,1])+3;
[bins,hist]=my_hist(x);
hist=hist';
my_O=ntot*hist;
gauss=exp(-0.5*(bins-3).^2)/sqrt(2*pi);
my_E=ntot*gauss;
figure
plot(bins,hist); hold on
plot(bins,gauss)
[h,p,stats] = chi2gof(bins,'Ctrs',bins,'Expected',my_E,'Frequency',my_O)
function [bins,hist]=my_hist(input)
input=input(isfinite(input));
h=histogram(input,'Normalization','pdf');
hist=h.Values;
b=h.BinEdges;
bins=NaN.*ones(length(b)-1,1);
norm=0;
for k=1:length(bins)
bins(k)=(b(k)+b(k+1))/2;
norm=norm+(b(k+1)-b(k))*hist(k);
end
%disp(norm)
close
end
Any help is greatly appreciated!
Accepted Answer
the cyclist on 20 Jan 2023 (Edited: the cyclist on 20 Jan 2023)
EDIT: My first posting on this was incomplete, so I radically edited it. Sorry for any confusion if you saw the first version.
I noticed that your expected bin totals my_E do not sum to ntot. The reason is that you used gauss, which is a probability density, as if it were the bin probability. To get the probability of a bin, you need to multiply the density by the bin width.
You made the same mistake in my_O: hist is normalized as a PDF, so it also has to be multiplied by the bin width (and by ntot) to give counts.
I think you may also be making a mistake in using bin edges where bin centers are expected, but I did not follow up on this.
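The density-versus-probability point can be written out as a short relation. The symbols are introduced here for illustration and are not from the original post: N stands for ntot, c_k for the center of bin k, \Delta for the common bin width, f for the Gaussian density (gauss), and \hat{f}_k for the PDF-normalized histogram value (hist). The expected count uses a midpoint approximation, so it is only approximately equal to N times the true bin probability.

```latex
P_k = \int_{c_k - \Delta/2}^{c_k + \Delta/2} f(x)\,dx \;\approx\; f(c_k)\,\Delta,
\qquad
E_k = N P_k \;\approx\; N f(c_k)\,\Delta,
\qquad
O_k = N \hat{f}_k\,\Delta .
```

Dropping the factor \Delta inflates both E_k and O_k by 1/\Delta, which is why the totals no longer sum to N.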
rng default
ntot=1000;
x=randn([ntot,1])+3;
[bins,hist]=my_hist(x);
hist=hist';
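bin_width = bins(2) - bins(1); % assumes all bins have the same width
my_O=ntot*hist*bin_width; % observed counts: density * bin width * ntot
gauss=exp(-0.5*(bins-3).^2)/sqrt(2*pi);
my_E=ntot*gauss*bin_width; % expected counts: density * bin width * ntot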
figure
plot(bins,hist); hold on
plot(bins,gauss)
[h,p,stats] = chi2gof(bins,'Ctrs',bins,'Expected',my_E,'Frequency',my_O)
function [bins,hist]=my_hist(input)
input=input(isfinite(input));
h=histogram(input,'Normalization','pdf');
hist=h.Values;
b=h.BinEdges;
bins=NaN.*ones(length(b)-1,1);
norm=0;
for k=1:length(bins)
bins(k)=(b(k)+b(k+1))/2;
norm=norm+(b(k+1)-b(k))*hist(k);
end
%disp(norm)
close
end
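As a sanity check on the correction, the following standalone sketch compares the bin totals with and without the bin-width factor. It is not part of the original answer. It assumes sigma = 1 as in the question, uses histcounts instead of histogram so that no figure is opened, and the variable names (dens, ctrs, O_fixed, E_fixed, E_wrong, E_cdf, Phi) are made up for illustration. It also computes the expected counts from differences of the normal CDF at the bin edges, which avoids the midpoint approximation.

```matlab
% Sanity check for the corrected counts (sketch; assumes sigma = 1 as in the question).
rng default                                     % reproducible draws
ntot = 1000;
mu   = 3;
x    = randn(ntot,1) + mu;

% PDF-normalized histogram values and bin edges, without opening a figure
[dens, edges] = histcounts(x, 'Normalization', 'pdf');
ctrs  = (edges(1:end-1) + edges(2:end)) / 2;    % bin centers
width = diff(edges);                            % bin widths (uniform here)

gauss = exp(-0.5*(ctrs - mu).^2) / sqrt(2*pi);  % density at the bin centers

O_fixed = ntot * dens  .* width;   % observed counts
E_fixed = ntot * gauss .* width;   % expected counts, midpoint approximation
E_wrong = ntot * gauss;            % density used as if it were a probability

% Bin probabilities from the standard normal CDF, written with base-MATLAB erfc
Phi   = @(z) 0.5 * erfc(-z / sqrt(2));
E_cdf = ntot * diff(Phi(edges - mu));

fprintf('sum(O_fixed) = %.4g\n', sum(O_fixed));
fprintf('sum(E_fixed) = %.4g\n', sum(E_fixed));
fprintf('sum(E_cdf)   = %.4g\n', sum(E_cdf));
fprintf('sum(E_wrong) = %.4g\n', sum(E_wrong));
```

With the bin-width factor, the observed total equals ntot and the expected totals fall slightly short of it, because the bins cover only the range of the data. Without the factor, the expected total is off by roughly 1/width.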