Help with probability density functions

Jules Ray on 25 Feb 2019
Edited: Torsten on 26 Dec 2022
Hello, I have created several PDFs (60 of them) using the code below. I would like to calculate the mean of all these PDFs, but I have no idea how to do this.
Here is the code I used to create each of the PDFs. L1 is a structure array, and each element contains a matrix Z of values. This example uses L1(1); I have 60 of these structures, from L1(1).Z to L1(60).Z, and I calculated a PDF for each of these populations of Z.
%pdf for the structure L1(1).Z
pd = fitdist(L1(1).Z(:),'Normal');
x_pdf = [min(L1(1).Z(:)):0.01:max(L1(1).Z(:))];
y = pdf(pd,x_pdf);
Thanks in advance for any help

Answers (2)

John D'Errico on 25 Feb 2019
Edited: John D'Errico on 25 Feb 2019
This is not a question about MATLAB. But you have done something.
You do not have a PDF. You have an approximation to a PDF, sampled over a finite range, at a finite set of steps.
If you wish to compute the mean of a random variable with known distribution parameters, you would be best advised to use resources like Wikipedia, for example its page on the lognormal distribution.
Note that the page states (look on the right side) the mean and variance of a lognormal distribution in terms of the usual distribution parameters.
As well, since you are using fitdist, you already have the stats toolbox. So you have access to tools like lognstat (or the corresponding tool for whatever distribution you are using). Use the available tools. Do NOT try to cobble up code to do what you do not really understand. Writing code to do what already exists for you to use is just a bad idea when you have no clue as to what you are doing. (What evidence do I have that you have no clue about these things? It is that you don't know how to compute the mean of a continuous random variable. At worst, something immediately found online.)
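As a minimal sketch of "use the available tools": the object returned by fitdist already carries the fitted parameters, and it has mean and std methods. The data below are synthetic stand-ins (the variable Z and its values are assumptions for illustration, not the poster's data).

```matlab
% Synthetic stand-in for one population of Z values (an assumption for illustration)
rng(1);                          % reproducible random numbers
Z = 100 + 2*randn(200, 50);      % roughly normal "elevations", mean 100, sigma 2

pd = fitdist(Z(:), 'Normal');    % same call as in the question

% For a fitted normal, the mean is simply the mu parameter
m_param  = pd.mu;                % fitted location parameter
m_method = mean(pd);             % mean of the fitted distribution object
s_method = std(pd);              % standard deviation of the fitted distribution

fprintf('mu = %.4f, mean(pd) = %.4f, std(pd) = %.4f\n', m_param, m_method, s_method);
```

For a normal fit, both values should match mean(Z(:)), so no integration is needed to get the mean of a single fitted PDF.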
Can you compute the mean of a distribution where the PDF is approximated at a finite set of points? Well, yes. You might want to read about the mean of a continuous distribution.
In there, you will find that the mean of a random variable is given as
distributionmean = int(x*pdf(x),-inf,inf)
So you want to compute the integral of x times the pdf(x), integrating from -inf to inf. In the case of a distribution like the lognormal, the pdf only lives on [0,inf) so that would be the bounds of interest.
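A minimal sketch of that integral for a lognormal, using integral from base MATLAB together with lognpdf from the Statistics Toolbox. The parameter values mu = 0 and sigma = 1 are assumptions chosen for illustration.

```matlab
mu = 0;  sigma = 1;                          % assumed lognormal parameters

% Integrand x*pdf(x); the lognormal PDF lives on [0,inf)
integrand = @(x) x .* lognpdf(x, mu, sigma);

m_numeric = integral(integrand, 0, Inf);     % adaptive quadrature over the full support
m_exact   = exp(mu + sigma^2/2);             % closed-form mean of a lognormal

fprintf('numeric = %.10f, exact = %.10f\n', m_numeric, m_exact);
```

Because integral works over the whole support rather than a fixed grid, the two numbers should agree to within the quadrature tolerance.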
Now, if I compute the actual mean and variance of a standard lognormal PDF, thus with distribution parameters of [0,1], I will find that the mean is exp(1/2).
exp(0 + 1^2/2)
ans =
1.64872127070013
[m,v] = lognstat(0,1)
m =
1.64872127070013
v =
4.67077427047161
As you see, lognstat agrees with my estimate of the mean.
Now, let's try it for a lognormal, approximated as you did.
x = 0:.01:10;
trapz(x,x.*lognpdf(x))
ans =
1.4898533607038
As you can see, trapz did not do very well here: it is off by roughly 10%. The cause is neither too coarse a sampling of the PDF nor the integration error of trapz.
The problem is that the grid does not sample the lognormal PDF sufficiently far into the tails. The lognormal distribution has a heavy right tail, and only about 98.9% of the probability lies below 10.
logncdf(10)
ans =
0.9893489006583
Even trapz agrees with that measure.
trapz(x,lognpdf(x))
ans =
0.989348905931384
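To illustrate that the shortfall comes from truncating the tail rather than from the step size, here is a sketch that repeats the trapz estimate with the same step of 0.01 but larger upper limits. The upper limits are arbitrary choices for illustration.

```matlab
upper   = [10 100 1000];               % assumed upper limits of the sampling grid
m_exact = exp(1/2);                    % true mean of the standard lognormal
m_trapz = zeros(size(upper));

for k = 1:numel(upper)
    x = 0:0.01:upper(k);               % same step as before, longer range
    m_trapz(k) = trapz(x, x .* lognpdf(x, 0, 1));
end

relErr = abs(m_trapz - m_exact) / m_exact;   % relative error for each upper limit
disp([upper(:), m_trapz(:), relErr(:)]);
```

The relative error should shrink as the grid reaches further into the right tail, even though the step size never changes.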
So, CAN you compute the means of those approximate PDFs? Well, yes, you can use trapz to do so, as I showed. Should you? Sigh.
  4 Comments
shitaye on 26 Dec 2022
This is good. But how can the mean of a distribution be proved? For example, how can I prove in MATLAB that the mean of the hypergeometric distribution is nK/N?

Jules Ray on 25 Feb 2019
I think we are not understanding each other, and instead of suggesting that I go back to university, I would suggest reading the question more carefully.
I have this data L1, which is empirical. Each member of L1 comprises several measurements, grouped into 60 groups, and I obtained a PDF for each group. So I wanted to obtain the mean across these groups. Does this make more sense?
Best
  3 Comments
Jules Ray on 26 Feb 2019
Hello Jeff
About your points:
  1. It is hard to picture what data you are starting with, so could you explain a bit more about what your data look like? For example, post L1(1)?
I created a structure L1 that contains information from 60 sites. For each site, e.g. L1(n), I have thousands of measurements L1(n).Z, which are elevation data; these are normally distributed in all cases. So I obtained a PDF from these data, just like histfit would do...
  2. You said "I know how to get the mu and sigma of single PDF's". In that case, you could compute those 60 mu's and then average those, but evidently that is not what you want. Could you explain what you mean by the "mean of all these pdfs"? For example, suppose there were just two pdfs: one is a uniform pdf between 0 and 1, and the other is a normal pdf with mean 0 and sigma 1. What would you say is "the mean of these two pdfs"?
I know, but in my case all are normal distributions.
In the beginning I was thinking of extracting something like a grand mean of the probabilities. I want to know the probability of a certain elevation in the whole data.
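One possible reading of a "grand mean of the probabilities" is the pointwise average of the 60 fitted PDFs on a common grid, which amounts to an equal-weight mixture of the fitted normals. The sketch below assumes that reading. The synthetic L1 is a hypothetical stand-in for the real structure, and weighting all sites equally is also an assumption.

```matlab
% Hypothetical stand-in for the real L1 (60 sites, normally distributed Z)
rng(2);
nSites = 60;
L1 = struct('Z', cell(1, nSites));
for n = 1:nSites
    L1(n).Z = (100 + 0.1*n) + 2*randn(40, 25);
end

% Common grid covering every site, so the PDFs can be averaged point by point
zmin = min(arrayfun(@(s) min(s.Z(:)), L1));
zmax = max(arrayfun(@(s) max(s.Z(:)), L1));
x = zmin:0.01:zmax;

Y = zeros(nSites, numel(x));
for n = 1:nSites
    pd = fitdist(L1(n).Z(:), 'Normal');
    Y(n, :) = pdf(pd, x);          % fitted PDF of site n on the common grid
end

y_mean = mean(Y, 1);               % pointwise mean PDF (equal-weight mixture)
area   = trapz(x, y_mean);         % close to 1 only if the grid covers the tails
```

The key step is evaluating every PDF on the same x; grids that run from each site's own min to max cannot be averaged element by element.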
Jeff Miller on 26 Feb 2019
> I know, but in my case all are normal distributions
OK then, suppose one is a normal with mean 0 and sigma 1, and the other is a normal with mean 1 and sigma 1. What would you say is "the mean of these two pdfs"?
> I want to know the probability of certain elevation in the whole data
If this is what you want to know, I am not sure why you are messing around with normal distributions in the first place. Why not simply combine the data from all 60 sites into one large dataset and tabulate the frequency of each different elevation? Doesn't that give you exactly this probability?
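A sketch of that suggestion, assuming the goal is the empirical probability of an elevation falling in a given interval. The synthetic L1, the bin width of 0.5 and the interval [101, 102] are all assumptions made for illustration.

```matlab
% Hypothetical stand-in for the real L1 (60 sites of elevation data)
rng(3);
nSites = 60;
L1 = struct('Z', cell(1, nSites));
for n = 1:nSites
    L1(n).Z = (100 + 0.1*n) + 2*randn(40, 25);
end

% Combine the data from all sites into one column vector
cols = arrayfun(@(s) s.Z(:), L1, 'UniformOutput', false);
allZ = vertcat(cols{:});

% Tabulate relative frequencies in elevation bins (bin width is an assumption)
edges   = floor(min(allZ)):0.5:ceil(max(allZ));
relFreq = histcounts(allZ, edges, 'Normalization', 'probability');

% Empirical probability of an elevation inside an interval of interest
p = mean(allZ >= 101 & allZ <= 102);
```

Note that pooling weights each site by its number of measurements, so sites with more data contribute more to the result.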
