How to quantify the goodness of a fit?

1 view (last 30 days)
Hello, I need to quantify how much a fit of a PDF (Probability Density Function) is good. I have my data set with its PDF and its fit. I decided to use the function chi2gof. Since it's the first time I'm using it, I decided to run a test script first.
I decided to generate gaussian (one of the easiest PDFs) random variables with mu=3 and calculate the observed PDF. I know for sure which is the correct PDF. The problem is that when I apply the chi2gof function, I obtain the rejection of the null hypothesis (that says that the probability distribution is the same)! I don't understand what I'm doing wrong. I attach my test code:
ntot=1000;
x=randn([ntot,1])+3;
[bins,hist]=my_hist(x);
hist=hist';
my_O=ntot*hist;
gauss=exp(-0.5*(bins-3).^2)/sqrt(2*pi);
my_E=ntot*gauss;
figure
plot(bins,hist); hold on
plot(bins,gauss)
[h,p,stats] = chi2gof(bins,'Ctrs',bins,'Expected',my_E,'Frequency',my_O)
function [bins,hist]=my_hist(input)
input=input(isfinite(input));
h=histogram(input,'Normalization','pdf');
hist=h.Values;
b=h.BinEdges;
bins=NaN.*ones(length(b)-1,1);
norm=0;
for k=1:length(bins)
bins(k)=(b(k)+b(k+1))/2;
norm=norm+(b(k+1)-b(k))*hist(k);
end
%disp(norm)
close
end
Any help is greatly appreciated!

Accepted Answer

the cyclist
the cyclist on 20 Jan 2023
Edited: the cyclist on 20 Jan 2023
EDIT: My first posting on this was incomplete, so I radically edited it. Sorry for any confusion if you saw the first version.
I noticed that your expected bin totals my_E do not sum to the value of ntot. The reason for this is that you have mistakenly used gauss as the bin probability, not as the probability density. You need to multiply by the bin width.
You made the same mistake in my_O.
I think you may also be making a mistake in using bin edges where bin centers are expected, but I did not follow up on this.
rng default
ntot=1000;
x=randn([ntot,1])+3;
[bins,hist]=my_hist(x);
hist=hist';
bin_width = bins(2) - bins(1);
my_O=ntot*hist*bin_width;
gauss=exp(-0.5*(bins-3).^2)/sqrt(2*pi);
my_E=ntot*gauss*bin_width;
figure
plot(bins,hist); hold on
plot(bins,gauss)
[h,p,stats] = chi2gof(bins,'Ctrs',bins,'Expected',my_E,'Frequency',my_O)
h = 0
p = 0.2150
stats = struct with fields:
chi2stat: 21.2561 df: 17 edges: [-0.3000 0.6000 0.9000 1.2000 1.5000 1.8000 2.1000 2.4000 2.7000 3.0000 3.3000 3.6000 3.9000 4.2000 4.5000 4.8000 5.1000 5.4000 6.6000] O: [6 15.0000 17.0000 27 53.0000 79 85.0000 101.0000 127.0000 110.0000 124.0000 95.0000 67.0000 32.0000 27.0000 16.0000 6.0000 13.0000] E: [7.5349 9.5219 17.8784 30.6795 48.1150 68.9646 90.3412 108.1581 118.3438 118.3438 108.1581 90.3412 68.9646 48.1150 30.6795 17.8784 9.5219 7.8464]
function [bins,hist]=my_hist(input)
input=input(isfinite(input));
h=histogram(input,'Normalization','pdf');
hist=h.Values;
b=h.BinEdges;
bins=NaN.*ones(length(b)-1,1);
norm=0;
for k=1:length(bins)
bins(k)=(b(k)+b(k+1))/2;
norm=norm+(b(k+1)-b(k))*hist(k);
end
%disp(norm)
close
end

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!