Goodness-of-Fit for a best-fitting distribution? (by using CUPID)

Using CUPID, how can I assess the goodness of fit of this best-fitting distribution?
addpath('.../Cupid-master')
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% Lower cutoff of 3 is known. Start with
% any reasonable guesses for the Weibull parameters--here, 2 & 2.
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% Now estimate the Weibull parameters by maximum likelihood,
% allowing for the truncation.
fittedDist.EstML(data_trunc);
xgrid = linspace(0,100,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])

Accepted Answer

Jeff Miller on 20 Jun 2023
There are lots of different ways to evaluate the goodness of fit of a given theoretical distribution to a dataset (see, e.g., Wikipedia).
Cupid has three main "built in" measures: a chi-square test, a Kolmogorov-Smirnov test, and the simple likelihood of the observations. See section 4.1.5 of the documentation.
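For readers who want a reference point outside Cupid, a chi-square goodness-of-fit test can be sketched with toolbox functions alone. This is only an illustration, not Cupid's implementation: it assumes the Statistics and Machine Learning Toolbox is installed, and it uses an untruncated Weibull sample (an assumption made to keep the example short).

```matlab
% Sketch only: toolbox functions, not Cupid.
rng(1);                                   % fixed seed so the run is repeatable
data  = wblrnd(3, 5, 10000, 1);           % untruncated Weibull sample (scale 3, shape 5)
pdFit = fitdist(data, 'Weibull');         % maximum-likelihood fit of both parameters

% 'NParams',2 tells chi2gof that two parameters were estimated from the data,
% so the degrees of freedom are reduced accordingly.
[h, p, stats] = chi2gof(data, 'CDF', pdFit, 'NParams', 2);

fprintf('chi2 = %.3f, df = %d, p = %.4f, reject H0 = %d\n', ...
        stats.chi2stat, stats.df, p, h);
```

Here h = 1 means H0 (the data follow the fitted distribution) is rejected at the default 5% level.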
  5 Comments
Jeff Miller on 21 Jun 2023
Edited: Jeff Miller on 21 Jun 2023
1. The log likelihood value is just a number that reflects (to some extent) the fit of the distribution to the data. It cannot be used for hypothesis testing (i.e., you cannot use it to decide whether to reject H0). Instead, it can be used to compare the fits of different distributions: whichever distribution gives the highest likelihood value provides the best fit (in one sense of goodness of fit). There was no problem with the likelihood computation in your example; that -256.24 is just the computed value. It means very little by itself but could be useful when compared with the likelihood values computed for other distributions that you are considering.
2. The error you got was in kstest, which is the third, separate goodness-of-fit measure. MATLAB was not able to find the file KolmoJava1_7Class.jar, which should be in the same folder as the file KolmSmir.m. Can you confirm that this jar file is actually in that folder? If it is, I suspect the cause is that I used the Windows path separator '\' in the javaaddpath command inside KolmSmir.m. Change that to filesep as follows and I expect it will work. I will update the file on GitHub.
javaaddpath([mypath filesep obj.sJarFileName]);
3. When you do get kstest running, just check the p value and reject H0 if p < .05. Like the chi-square test, this is a hypothesis testing procedure, so you can use it to decide whether to reject H0.
4. If you have more questions about Cupid, it might be better to raise them as issues on GitHub rather than on the MATLAB Answers forum, since these questions are probably too specific to be of general interest.
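Point 1 above (comparing log-likelihoods across candidate distributions) can be sketched with toolbox functions only. This is a minimal sketch, not Cupid code: it assumes the Statistics and Machine Learning Toolbox, uses an untruncated Weibull sample for simplicity, and the list of candidate distributions is an arbitrary choice made for illustration.

```matlab
% Sketch only: compare maximized log-likelihoods of several candidate fits.
rng(1);                                   % fixed seed so the run is repeatable
data = wblrnd(3, 5, 10000, 1);            % Weibull sample (scale 3, shape 5)

candidates = {'Weibull', 'Lognormal', 'Exponential'};   % illustrative choices
logL = zeros(size(candidates));
for k = 1:numel(candidates)
    pdFit   = fitdist(data, candidates{k});   % maximum-likelihood fit
    logL(k) = -negloglik(pdFit);              % negloglik returns the NEGATIVE log-likelihood
end

[~, best] = max(logL);                    % higher log-likelihood = better fit in this sense
for k = 1:numel(candidates)
    fprintf('%-12s logL = %.2f\n', candidates{k}, logL(k));
end
fprintf('Highest log-likelihood: %s\n', candidates{best});
```

Because these candidates do not all have the same number of parameters, a penalized criterion such as AIC is a common refinement of the raw comparison.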
Sim on 22 Jun 2023
Thanks a lot, @Jeff Miller!
Yes, you were right. I opened the KolmSmir.m file and replaced
javaaddpath([mypath '\' obj.sJarFileName]);
by
javaaddpath([mypath filesep obj.sJarFileName]);
Then I ran the same code, which worked without errors or warnings:
addpath('.../Cupid-master')
% (1) create a "truncated dataset"
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% (2) fit a distribution (in this case the "Weibull2") to the "truncated dataset"
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% (3) estimate the Weibull parameters by maximum likelihood, allowing for the truncation.
fittedDist.EstML(data_trunc);
% (4) plot both the "truncated dataset" (through the histogram) and the "fitting distribution"
% (in this case the "Weibull2" with Weibull's parameters estimated by maximum likelihood)
figure
xgrid = linspace(0,100,1000)';
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])
% (5) 4.1.5 Goodness-of-fit measures (χ2 and log likelihood)
%
% (5.1.1) "GofFChiSq" test (<-- part suggested in this Matlab Answer)
[BinProbs, BinUpperBounds] = histcounts(data_trunc, 'Normalization','probability');
observedChisqTestValue = fittedDist.GofFChiSq(BinUpperBounds(2:end-1),BinProbs(1:end-1));
% (5.1.2) "GofFChiSq" test (<-- part found in the documentation)
HoChiSq = ChiSq(numel(BinProbs)-1);
critChiSq = HoChiSq.InverseCDF(.95);
if observedChisqTestValue > critChiSq
    disp('GofFChiSq: Reject Ho') % the test statistic exceeds the critical value: reject the null hypothesis that the data follow the model
else
    disp('GofFChiSq: Do not reject Ho') % no significant evidence against the model: fail to reject the null hypothesis
end
%
% (5.2) "LnLikelihood" value (not a hypothesis test; useful for comparing distributions)
likelihood_value = fittedDist.LnLikelihood(data_trunc)
% (5.3) "kstest" (Kolmogorov-Smirnov test; reject Ho if p < .05)
[p, Dmax] = fittedDist.kstest(data_trunc)
Output in the Command Window:
GofFChiSq: Do not reject Ho
likelihood_value =
-110.31
p =
0.85519
Dmax =
0.0060512
>>
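As a rough cross-check that does not depend on Cupid or its jar file, the same kind of Kolmogorov-Smirnov test can be sketched with the toolbox kstest function. This is an illustration under stated assumptions: it needs the Statistics and Machine Learning Toolbox, and it tests the sample against the known generating distribution t rather than against a fitted one, so the numbers will not match the Cupid output above.

```matlab
% Sketch only: built-in one-sample KS test against a fully specified distribution.
rng(1);                                   % fixed seed so the run is repeatable
pd = makedist('Weibull', 'a', 3, 'b', 5); % same generating distribution as above
t  = truncate(pd, 3, inf);                % lower truncation at 3
data_trunc = random(t, 10000, 1);

% 'CDF' accepts a probability distribution object; H0: data come from t.
[h, p, ksstat] = kstest(data_trunc, 'CDF', t);

fprintf('KS statistic = %.5f, p = %.4f, reject H0 = %d\n', ksstat, p, h);
```

Note that the standard KS p value is strictly valid only when the reference distribution is fully specified in advance; when the parameters are estimated from the same data, the p value tends to be conservative.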
