Error in Hosmer Lemeshow GOF test calculation
3 views (last 30 days)
Show older comments
Dear all,
I am trying to calculate the Hosmer and Lemeshow Good-of-Fitness test, following the steps presented in the book "Logistic Regression. A Self- Learning Text" by David Kleinbaum and M. Klein.
The following functions calculates all the steps, gives as outputs the chi-squared, degrees of freedom and the p-value, as well as some interesting plots described in the manual, which look fine to me.
However, I get a completely different result when I run this test in R (holsem.test, from ResourceSelection package).
I was surprised I could not find any code online, so maybe this could be useful in the future. Neither could I see what I am doing wrong. Maybe R uses a very different method?
If you are also keen to find out and see this work, please feel free to join.
Thank you a lot,
% This function performs the Hoslem-Lemeshow Goodness-of-Fit test
% 06-January-2017 davidnielsen@id.uff.br
%
% Inputs
% Q: The number of groups or quantiles (normally Q=10).
% y: 1D array of observations. Must be a series of {0,1}.
% fit: 1D array of the model's fitted values.
% Outputs
% chi2: The Hosmer-Lemeshow chi-square statistics.
% df: Degrees-of-freedom.
% p: p-value
function [chi2,df,p]=hoslem(fit,y,Q)
[ordems,I]=sort(fit,'descend'); % Sorts the fitted values
% Calculates the length of each decile.
% Note that the division of length/10 may not be exact. Therefore, the
% considered series may differ from original inputs in up to 10
% observations, at most. This error shold be to non-significant for very
% long series.
qlen=round(length(y)/Q)-1;
qlens=zeros(Q+1,1);
for i=1:Q+1
qlens(i,1)=1+(i-1)*qlen; % Determines the thresholds of deciles
end
qlims=zeros(Q+1,1); % Positions of thresholds
for i=1:Q+1
qlims(i,1)=ordems(qlens(i,1));
end
plot(1:Q+1,qlims,'o')
% Couts Observed Cases por Decile (Ocq)
ocq=zeros(Q,1); t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
ocq(t,1)=ocq(t,1)+(y(I(i)));
count=count+1;
end
% Couts Non-Observed Cases por Decile (Oncq)
oncq=zeros(Q,1); t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
if (y(I(i)))==0
oncq(t,1)=oncq(t,1)+1;
end
count=count+1;
end
figure(1)
plot(ocq); hold on; plot(oncq)
legend('Observed cases','Non-observed cases'); hold off
% Expected cases (Ecq) e noncases (Encq) per Decile
ecq=zeros(Q,1);
t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
ecq(t,1)=ecq(t,1)+ordems(i);
count=count+1;
end
figure(2)
plot(ecq,ocq,'-o'); hold on;
xlabel('Expected cases'); ylabel('Oobserved cases');
encq=qlen-ecq;
% Calculates HL statistics
chi2= sum(sum(((ocq-ecq).^2)/ecq)) + sum(sum(((oncq-encq).^2)/encq));
df=Q-2;
p=1-chi2cdf(chi2,Q-2);
end
0 Comments
Answers (0)
See Also
Categories
Find more on Hypothesis Tests in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!