Fit a statistical distribution to truncated data
I have a truncated dataset and need to infer the distribution that most likely generated it. Even though I only have the truncated data rather than the full dataset, I think the best-fitting distribution should be the one that describes the full dataset, something like the blue line in this plot:
Do you have any comments, suggestions, or ideas on how to get that blue line?
When I tried to reproduce the blue line in the figure above with the fitdist function, i.e. the best-fitting distribution as if I had the full dataset, I was not successful. Below is a comparison of fitdist applied to the full dataset and to the truncated dataset, both drawn from the same parent distribution, makedist('Normal','mu',3).
% (1) from a normal probability distribution, i.e. "makedist('Normal','mu',3)",
% create:
% (i) a "full dataset" and
% (ii) a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_full = random(pd,10000,1);
data_trunc = random(t,10000,1);
% (2) fit the normal distribution to
% (i) the "full dataset"
% (ii) the set of "truncated data"
pd_fit_full = fitdist(data_full,'normal');
pd_fit_trunc = fitdist(data_trunc,'normal');
% (3) plot
% (i.a) the "histogram of the full dataset" (from the "full dataset")
% (i.b) the density function corresponding to the distribution that fits the "full dataset"
% (ii.a) the "truncated histogram" (from the "truncated data")
% (ii.b) the density function corresponding to the distribution that fits the "truncated histogram"
xgrid = linspace(0,100,1000)';
hold on
histogram(data_full,100,'Normalization','pdf','facecolor','red')
line(xgrid,pdf(pd_fit_full,xgrid),'Linewidth',2,'color','red')
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf(pd_fit_trunc,xgrid),'Linewidth',2,'color','blue')
hold off
xlim([0 10])
Accepted Answer
Jeff Miller
on 20 Jun 2023
If you would like to fit a variety of truncated distributions in addition to the normal, you might find Cupid helpful. For instance, here's an example with a 2-parameter Weibull:
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% Lower cutoff of 3 is known. Start with
% any reasonable guesses for the Weibull parameters--here, 2 & 2.
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% Now estimate the Weibull parameters by maximum likelihood,
% allowing for the truncation.
fittedDist.EstML(data_trunc);
xgrid = linspace(0,100,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])
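For the normal case in the question, a similar truncation-aware maximum-likelihood fit can be sketched without Cupid, using mle from the Statistics and Machine Learning Toolbox with a custom density. This is only a minimal sketch: it assumes the lower cutoff (3) is known, and the names xlow, pdf_trunc, and start are illustrative and do not come from the original posts.

```matlab
% Sketch: ML fit of a normal truncated below at a known cutoff, using mle
% with a custom pdf. Variable names are illustrative only.
rng(1);                                   % reproducible example
xlow = 3;                                 % known lower truncation point (assumed)
pd = makedist('Normal','mu',3);           % sigma defaults to 1
data_trunc = random(truncate(pd,xlow,inf),10000,1);

% Density of a normal restricted to x > xlow: the normal pdf rescaled by
% the probability mass above the cutoff.
pdf_trunc = @(x,mu,sigma) normpdf(x,mu,sigma) ./ normcdf(xlow,mu,sigma,'upper');

% Start from the naive sample moments; keep sigma positive.
start = [mean(data_trunc), std(data_trunc)];
phat = mle(data_trunc,'pdf',pdf_trunc,'Start',start,'LowerBound',[-Inf 0]);

% Overlay the fitted truncated density on the truncated histogram.
xgrid = linspace(xlow,10,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf_trunc(xgrid,phat(1),phat(2)),'Linewidth',2,'color','red')
xlim([0 10])
```

Here phat(1) and phat(2) estimate mu and sigma of the untruncated normal, so pdf(makedist('Normal','mu',phat(1),'sigma',phat(2)),xgrid) would give the curve for the full dataset. Because only one tail is observed, the two estimates tend to be strongly correlated and can be noisy for small samples.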
More Answers (2)
Torsten
on 19 Jun 2023
Edited: Torsten
on 19 Jun 2023
Why would it be justified to fit data drawn from a truncated normal with an (untruncated) normal distribution?
pd_fit_trunc = fitdist(data_trunc,'normal');
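As a rough illustration of the mismatch, the mean and standard deviation of a normal truncated below at its own mean have a simple closed form (it is a shifted half-normal), and a plain normal fit lands near those values rather than near the parent parameters. The sketch below assumes mu = 3 and sigma = 1 as in the question; mean_trunc, std_trunc, and pd_naive are illustrative names.

```matlab
% Mean and standard deviation of a normal(mu, sigma) truncated below at
% its own mean (a shifted half-normal).
mu = 3; sigma = 1;
mean_trunc = mu + sigma*sqrt(2/pi);     % about 3.80
std_trunc  = sigma*sqrt(1 - 2/pi);      % about 0.60

% Compare with the naive normal fit on simulated truncated data.
rng(1);                                 % reproducible example
pd = makedist('Normal','mu',mu,'sigma',sigma);
data_trunc = random(truncate(pd,mu,inf),10000,1);
pd_naive = fitdist(data_trunc,'normal');
fprintf('theory: %.3f, %.3f   naive fit: %.3f, %.3f\n', ...
    mean_trunc, std_trunc, pd_naive.mu, pd_naive.sigma);
```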
First complete the data set data_trunc by reflecting it about x = 3, so that it becomes normally distributed. This works here because the truncation point coincides with the mean of the underlying normal. Then you can fit it with a normal distribution:
% (1) from a normal probability distribution, i.e. "makedist('Normal','mu',3)",
% create:
% (i) a "full dataset" and
% (ii) a set of "truncated data"
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_full = random(pd,10000,1);
data_trunc = random(t,10000,1);
% Mirror each truncated sample about x = 3 and append the mirrored copies
data_trunc = [data_trunc;-(data_trunc-3)+3];
% (2) fit the normal distribution to
% (i) the "full dataset"
% (ii) the set of "truncated data"
pd_fit_full = fitdist(data_full,'normal');
pd_fit_trunc = fitdist(data_trunc,'normal');
% (3) plot
% (i.a) the "histogram of the full dataset" (from the "full dataset")
% (i.b) the density function corresponding to the distribution that fits the "full dataset"
% (ii.a) the "truncated histogram" (from the "truncated data")
% (ii.b) the density function corresponding to the distribution that fits the "truncated histogram"
xgrid = linspace(0,100,1000)';
hold on
histogram(data_full,100,'Normalization','pdf','facecolor','red')
line(xgrid,pdf(pd_fit_full,xgrid),'Linewidth',2,'color','red')
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,pdf(pd_fit_trunc,xgrid),'Linewidth',2,'color','blue')
hold off
xlim([0 10])
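One caveat worth checking before reusing the reflection approach: the mirrored sample is symmetric about the cutoff, so the fitted mean always equals the cutoff. The sketch below is an illustration with an assumed cutoff of 4 (not from the original posts), deliberately different from the true mean of 3; cutoff, x_reflected, and pd_reflected are hypothetical names.

```matlab
% Reflection recovers the parent normal only when the cutoff equals its mean.
rng(1);                                      % reproducible example
pd = makedist('Normal','mu',3);              % mu = 3, sigma = 1
cutoff = 4;                                  % assumed cutoff, not at the mean
x = random(truncate(pd,cutoff,inf),10000,1);
x_reflected = [x; 2*cutoff - x];             % mirror the sample about the cutoff
pd_reflected = fitdist(x_reflected,'normal');

% The reflected sample is symmetric about the cutoff, so the fitted mean is
% the cutoff itself, not the true mean of 3.
fprintf('fitted mu = %.3f (true mu = 3)\n', pd_reflected.mu);
```

When the cutoff and the mean differ, a truncation-aware maximum-likelihood fit is the safer choice.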
the cyclist
on 19 Jun 2023
pd = makedist('Normal','mu',3);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
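% fitdist_ntrunc is a user-contributed function, not a MATLAB built-in; it
% must be on the MATLAB path. As used here, it returns the truncated-normal
% pdf as a function handle, the parameter estimates, and their confidence
% intervals for the given truncation bounds.
[norm_trunc, phat, phat_ci] = fitdist_ntrunc(data_trunc, [3, Inf]);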
xgrid = linspace(0,100,1000)';
figure
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,norm_trunc(xgrid,phat(1),phat(2)),'Linewidth',2,'color','red')
xlim([0 10])