How can I know which distribution is appropriate to fit to the generated histogram, and how can I do that?
Hi all,
Using the code below, I create a histogram of soil pore diameters. My question is how I can recognize which distribution fits it well, and how I can do the fitting in MATLAB.
clc
clear
close all
load("Diameter.mat");
load("Number_Pores.mat")
Diameter=flip(Diameter)';
N=flip(N)';
% Use the diameter vector as the bin edges, converted to mm (1e-6 is the
% nm-to-mm factor, so the raw diameters are taken to be in nm)
bin_edge = Diameter*1e-6;
% Corresponding frequencies for each bin
frequencies = N;
% Create the histogram from the precomputed edges and counts
% ('BinEdges' must have exactly one more element than 'BinCounts')
histogram('BinEdges', bin_edge, 'BinCounts', frequencies,'Normalization','pdf');
% Customize the plot (optional)
title('MIP histogram');
xlabel('Pore Diameter (mm)');
ylabel('Frequency');
grid on;
set(gca,"XScale","log")
Star Strider on 3 Oct 2023
Since pore sizes cannot be negative, the choice is limited to continuous distributions with positive support. Fitting a lognormal distribution would likely be the place to start; then see whether other logical choices work better. (I do not know what processes govern pore sizes; however, knowing them could guide the choice of distribution.)
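A minimal sketch of that starting point, assuming the Statistics and Machine Learning Toolbox is installed. Because only binned counts are available, it fits a lognormal to the bin centers, weighting each center by its count through the 'Frequency' option of fitdist (which expects nonnegative integer counts). The synthetic lognormal data stand in for Diameter.mat and Number_Pores.mat, and using the geometric mean of each bin's edges as its center is an assumption that suits log-spaced bins.

```matlab
% Synthetic stand-in for the MIP data (NOT the poster's files):
% pore diameters in mm drawn from a lognormal, then binned.
rng(0);                                        % reproducible example
sampleD = lognrnd(log(1e-3), 0.8, 5000, 1);    % synthetic diameters (mm)
edges   = logspace(-5, -1, 41);                % 41 log-spaced edges -> 40 bins
counts  = histcounts(sampleD, edges);          % pores per bin

% Representative diameter of each bin: the geometric mean of its edges,
% which sits in the middle of the bin on a log axis.
centers = sqrt(edges(1:end-1) .* edges(2:end));

% Fit a lognormal, weighting each bin center by its (integer) count.
% fitdist expects a column vector of data, hence the transposes.
pd = fitdist(centers', 'Lognormal', 'Frequency', counts');
fprintf('mu = %.3f, sigma = %.3f\n', pd.mu, pd.sigma);
```

With the real data, edges would be bin_edge and counts would be N; the fitted mu and sigma are the mean and standard deviation of log(diameter in mm).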
Answers (3)
the cyclist on 3 Oct 2023
Fitting a distribution to a histogram of the data, instead of to the raw data, is typically bad modeling practice, because binning introduces error.
You could use the ksdensity function to generate a non-parametric curve that fits your data. (Be sure to use the option that limits statistical support to positive values.)
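A possible sketch of the ksdensity route, again on synthetic binned data standing in for bin_edge and N and requiring the Statistics and Machine Learning Toolbox. Each bin center is weighted by its count via 'Weights', and 'Support','positive' keeps the estimate on positive diameters (ksdensity handles this with a log transformation internally). Dropping empty bins and the 200 log-spaced evaluation points are illustrative choices, not requirements.

```matlab
% Synthetic binned data (stand-in for bin_edge and N).
rng(0);
sampleD = lognrnd(log(1e-3), 0.8, 5000, 1);
edges   = logspace(-5, -1, 41);
counts  = histcounts(sampleD, edges);
centers = sqrt(edges(1:end-1) .* edges(2:end));
keep    = counts > 0;                            % skip empty bins

% Non-parametric density restricted to positive values, with each bin
% center weighted by its count.
xi = logspace(-5, -1, 200);                      % evaluation points (mm)
f  = ksdensity(centers(keep)', xi, ...
               'Support', 'positive', 'Weights', counts(keep)');

semilogx(xi, f);
xlabel('Pore Diameter (mm)');
ylabel('Probability density (1/mm)');
```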
It might also be fine to just re-sample the data you have, to generate new data. (That will never generate unseen pore sizes, but maybe that error is not important.)
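Resampling binned data could be sketched with randsample (Statistics and Machine Learning Toolbox), drawing bin centers with probability proportional to their counts. The synthetic data and the sample size of 1000 are illustrative assumptions.

```matlab
% Synthetic binned data (stand-in for bin_edge and N).
rng(0);
sampleD = lognrnd(log(1e-3), 0.8, 5000, 1);
edges   = logspace(-5, -1, 41);
counts  = histcounts(sampleD, edges);
centers = sqrt(edges(1:end-1) .* edges(2:end));
keep    = counts > 0;                            % skip empty bins

% Draw 1000 new diameters with replacement; each bin center is picked
% with probability proportional to its count.
newD = randsample(centers(keep), 1000, true, counts(keep));
```

Every value in newD is one of the existing bin centers, which is exactly the "no unseen pore sizes" limitation mentioned above.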
I frankly don't understand the utility of the method you describe, which consists of:
- Fitting
- Generating data from the fit
- Seeing if the generated data also fits well. (It must, to within sampling error.)
Maybe there is something I'm not seeing.
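For reference, the three steps listed above might look like this in MATLAB. This is only a sketch on synthetic binned data, assuming the Statistics and Machine Learning Toolbox and a lognormal fit; as the answer points out, close agreement in the last step is expected by construction, up to sampling error.

```matlab
% Synthetic binned data (stand-in for bin_edge and N), as columns.
rng(0);
sampleD = lognrnd(log(1e-3), 0.8, 5000, 1);
edges   = logspace(-5, -1, 41);
counts  = histcounts(sampleD, edges)';
centers = sqrt(edges(1:end-1) .* edges(2:end))';

% Step 1: fit a lognormal, weighted by the bin counts.
pd = fitdist(centers, 'Lognormal', 'Frequency', counts);

% Step 2: generate new diameters from the fitted distribution.
genD = random(pd, 5000, 1);

% Step 3: bin the generated diameters with the same edges and compare
% the counts bin by bin.
genCounts = histcounts(genD, edges)';
disp(table(centers, counts, genCounts));
```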
Image Analyst on 3 Oct 2023
I agree with @Star Strider and @the cyclist: if you can't use the actual distribution and must use a formula, you should use one that has a theoretical justification. As they said, there is a theoretical basis for using a log-normal distribution. I've seen it in countless measurements. It hardly matters which particle property I'm measuring (area, perimeter, circularity, or whatever); they all seem to follow a log-normal distribution. If you want a reference for the theory, see the bible on particle size measurement by Terence Allen of DuPont: "Particle Size Measurement".
Star Strider on 4 Oct 2023
I would use the histfit function and then, if the fit appears acceptable, use the fitdist function to estimate the parameters. As I mentioned previously, start with the lognormal distribution, and if it does not provide an accurate fit, look for similar distributions with positive support.
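A sketch of that workflow on synthetic binned data, assuming the Statistics and Machine Learning Toolbox. histfit needs raw observations, so the sample is approximately rebuilt by repeating each bin center by its count (integer counts assumed). The histogram of log(D) with a normal overlay is used because D is lognormal exactly when log(D) is normal. The candidate list (lognormal, gamma, Weibull) and ranking them by negative log-likelihood at the bin centers are illustrative choices, not the only reasonable ones; that likelihood is itself an approximation caused by binning.

```matlab
% Synthetic binned data (stand-in for bin_edge and N), as columns.
rng(0);
sampleD = lognrnd(log(1e-3), 0.8, 5000, 1);
edges   = logspace(-5, -1, 41);
counts  = histcounts(sampleD, edges)';
centers = sqrt(edges(1:end-1) .* edges(2:end))';
use     = counts > 0;                        % ignore empty bins

% Rebuild an approximate raw sample: each bin center repeated by its count.
approxD = repelem(centers(use), counts(use));

% Visual check: a histogram of log(D) with a fitted normal overlay.
figure;
histfit(log(approxD), 40, 'normal');
xlabel('log(Pore Diameter / mm)');

% Fit positive-support candidates and compare negative log-likelihoods
% on the same binned data (lower is better). All three have two
% parameters, so this ranking is the same as ranking by AIC.
names = {'Lognormal', 'Gamma', 'Weibull'};
nll = zeros(size(names));
for k = 1:numel(names)
    pd = fitdist(centers(use), names{k}, 'Frequency', counts(use));
    nll(k) = -sum(counts(use) .* log(pdf(pd, centers(use))));
    fprintf('%-9s NLL = %.1f\n', names{k}, nll(k));
end
[~, best] = min(nll);
fprintf('Best candidate: %s\n', names{best});
```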