Help with probability density functions

Question

Jules Ray on 25 Feb 2019

Edited: Torsten on 26 Dec 2022

Hello, I have created several PDFs (60 of them) using the code below. I would like to calculate the mean of all these PDFs, but I have no idea how to do this.

Here is the code I used to create each of the PDFs. L1 is a structure array, and each element contains a matrix Z of values. This example uses L1(1); there are 60 of these elements, running from L1(1).Z to L1(60).Z, and I calculated a PDF for each of these populations of Z.

%pdf for the structure L1(1).Z
pd = fitdist(L1(1).Z(:),'Normal');
x_pdf = [min(L1(1).Z(:)):0.01:max(L1(1).Z(:))];
y = pdf(pd,x_pdf);

Thanks in advance for any help

Answer 1

John D'Errico on 25 Feb 2019

Edited: John D'Errico on 25 Feb 2019

This is not really a question about MATLAB, but you have at least made an attempt.

You do not have a PDF. You have an approximation to a PDF, sampled over a finite range, at a finite set of steps.

If you wish to compute the mean of a random variable with known distribution parameters, you would be best advised to use resources like Wikipedia. Here, for example:

https://en.wikipedia.org/wiki/Log-normal_distribution

Note that the page states (look on the right side) the mean and variance of a lognormal distribution in terms of the usual distribution parameters.

As well, since you are using fitdist, you already have the stats toolbox. So you have access to tools like lognstat (or the corresponding tool for whatever distribution you are using). Use the available tools. Do NOT try to cobble up code to do what you do not really understand. Writing code to do what already exists for you to use is just a bad idea when you have no clue as to what you are doing. (What evidence do I have that you have no clue about these things? It is that you don't know how to compute the mean of a continuous random variable. At worst, something immediately found online.)

Can you compute the mean of a distribution where the PDF is approximated at a finite set of points? Well, yes. You might want to read about the mean of a continuous distribution.

https://en.wikipedia.org/wiki/Mean

In there, you will find that the mean of a random variable is given as

distributionmean = int(x*pdf(x),-inf,inf)

So you want to compute the integral of x times the pdf(x), integrating from -inf to inf. In the case of a distribution like the lognormal, the pdf only lives on [0,inf) so that would be the bounds of interest.
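As a minimal sketch of that integral, the following evaluates it numerically for a standard lognormal and compares it with the closed-form mean. The variable names are illustrative, and the parameter values 0 and 1 are assumed purely for the example; lognpdf requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: mean of a lognormal as the integral of x*pdf(x) over [0, Inf).
mu = 0;     % assumed log-scale location parameter
sigma = 1;  % assumed log-scale shape parameter

% Integrand x * f(x); written with .* so it accepts vector input
integrand = @(x) x .* lognpdf(x, mu, sigma);

% integral accepts an infinite upper limit
meanByIntegral = integral(integrand, 0, Inf);

% Closed-form mean of the lognormal: exp(mu + sigma^2/2)
meanClosedForm = exp(mu + sigma^2/2);

fprintf('integral: %.6f   closed form: %.6f\n', meanByIntegral, meanClosedForm);
```

Because the integral runs over the full support, nothing is lost in the tail.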

Now, if I compute the actual mean and variance of a standard lognormal PDF, thus with distribution parameters of [0,1], I will find that the mean is exp(1/2).

exp(0 + 1^2/2)
ans =
          1.64872127070013
          
[m,v] = lognstat(0,1)
m =
          1.64872127070013
v =
          4.67077427047161

As you see, lognstat agrees with the closed-form mean.

Now, let's try it for a lognormal, approximated as you did.

x = 0:.01:10;
trapz(x,x.*lognpdf(x))
ans =
           1.4898533607038

As you can see, trapz did not do very well here; it is off by roughly 10%. The problem is neither that I sampled the PDF too coarsely nor the integration error of trapz.

The problem is that this does not sample the lognormal PDF sufficiently far into the tails. The lognormal distribution has a heavy right tail, so the small amount of probability beyond x = 10 still contributes a large share of the mean.

logncdf(10)
ans =
           0.9893489006583

Even trapz agrees with that measure.

trapz(x,lognpdf(x))
ans =
         0.989348905931384
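To illustrate the truncation effect, here is a hedged sketch that repeats the trapz estimate on a much wider grid. The upper limit of 1000 is an arbitrary choice for the example, not a general rule, and the variable names are illustrative.

```matlab
% Sketch: effect of the integration range on the trapz estimate of the mean.
xNarrow = 0:0.01:10;     % range used in the answer above
xWide   = 0:0.01:1000;   % assumed wider range reaching far into the right tail

meanNarrow = trapz(xNarrow, xNarrow .* lognpdf(xNarrow));
meanWide   = trapz(xWide,   xWide   .* lognpdf(xWide));

% Probability mass beyond x = 10 that the narrow grid never sees
tailMass = 1 - logncdf(10);

fprintf('narrow: %.5f   wide: %.5f   exact: %.5f\n', meanNarrow, meanWide, exp(0.5));
fprintf('mass beyond 10: %.5f\n', tailMass);
```

The missing mass is only about 1%, but it sits at large x, which is why it accounts for roughly 10% of the mean.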

So, CAN you compute the means of those approximate PDFs? Well, yes, you can use trapz to do so, as I showed. Should you? Sigh.

4 Comments

John D'Errico on 25 Feb 2019

I'm sorry, but this still makes little sense. What is the mean of two PDFs together? What do you intend by that statement?

Taking points from many PDFs, then concatenating them together, and then using a normal fitdist on the result? Again, sorry, but that makes little mathematical or statistical sense.

You need to understand that the numbers generated by EVALUATING a PDF are not in themselves random variables. For example, if I did this:

x = linspace(-3,3,100);
p = normpdf(x);

you cannot simply add the vector p to another such construct, and have it mean something statistically, as I think you are trying to do.

This is a mistake I've seen others make. They confuse random variables, for example, the output from randn or rand, with the output from a PDF, such as normpdf.

As such, if I compute something like

mean(p)
ans =
       0.1646

that is NOT the mean of the distribution. Nor does it make sense to form the sum of two such vectors. Finally, it makes absolutely no sense at all to then try to throw p into a tool like fitdist.

parms = fitdist(p.','normal')
parms = 
  NormalDistribution
  Normal distribution
       mu = 0.164598   [0.136784, 0.192411]
    sigma = 0.140175   [0.123074, 0.162837]

In fact, the normal distribution behind randn has a true mean of zero and a variance of 1.

mean(randn(1,100000))
ans =
   -0.0015908
var(randn(1,100000))
ans =
       1.0028

Remember that these are only sample statistics, so they will approach the true mean and variance only as the sample size gets large.
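The distinction can be put side by side in a short sketch. It contrasts averaging PDF values, integrating x times the PDF, and averaging random samples. The seed and sample size are arbitrary choices for the example.

```matlab
% Sketch: three different "means" that are easy to confuse.
x = linspace(-3, 3, 100);
p = normpdf(x);                 % PDF heights, not random draws

% Average height of the curve; this is NOT the mean of the distribution
meanOfPdfValues = mean(p);

% Numerical E[X] = integral of x*pdf(x); close to 0 by symmetry
meanByIntegration = trapz(x, x .* p);

% Sample mean of actual random draws from the same distribution
rng(0);                         % fixed seed, assumed only for repeatability
samples = randn(1, 1e5);
sampleMean = mean(samples);

fprintf('mean(p): %.4f   integral: %.4g   sample mean: %.4f\n', ...
    meanOfPdfValues, meanByIntegration, sampleMean);
```

Only the last two quantities estimate the mean of the distribution; the first depends on the grid chosen for x.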

Anyway, I think you are confused as to what a PDF means. I think you need to do a serious amount of reading about these things, as it looks like you are just trying to do virtually random things, and thinking they work. I would suggest a good starting course in probability and statistics, or at least a good basic text. Any such text would probably be fine.

shitaye on 26 Dec 2022

This is good. But how can the mean of a distribution be proved? For example, how can I prove in MATLAB that the mean of the hypergeometric distribution is nK/N?

Torsten on 26 Dec 2022

Edited: Torsten on 26 Dec 2022

Mathematica can, MATLAB has problems.

https://www.wolframalpha.com/input?i=sum%5Bk*%28K+choose+k%29*%28%28N-K%29+choose+%28n-k%29%29%2F%28N+choose+n%29%2Ck%2C0%2Cn%5D
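Short of a symbolic proof, the formula can at least be checked numerically for specific parameter values. The sketch below does that with hygepdf and hygestat. The parameter values are assumed for illustration. Note that MATLAB's documentation calls the population size M and the number of draws N, whereas the comment above uses N and n.

```matlab
% Sketch: numerical check (not a proof) of the hypergeometric mean n*K/N.
popSize   = 50;   % population size (N in the formula above, M in MATLAB's docs)
numMarked = 20;   % items with the desired characteristic (K)
numDraws  = 10;   % items drawn without replacement (n)

k = 0:numDraws;   % values the count can take

% Mean by direct summation of k * P(X = k)
meanBySum = sum(k .* hygepdf(k, popSize, numMarked, numDraws));

% Mean from the formula and from the toolbox
meanFormula = numDraws * numMarked / popSize;
meanToolbox = hygestat(popSize, numMarked, numDraws);

fprintf('sum: %.6f   formula: %.6f   hygestat: %.6f\n', ...
    meanBySum, meanFormula, meanToolbox);
```

Agreement for a few parameter sets is evidence, not a proof; the general result still needs the algebraic argument.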

Answer 2

Jules Ray on 25 Feb 2019

John D'Errico

I think we are not understanding each other, and instead of suggesting that I go back to university, I would suggest reading the question more carefully.

I have this data L1, which is empirical. Each member of L1 comprises several measurements, grouped into 60 groups, and I obtained a PDF for each group. I wanted to obtain the mean across these groups. Does this make more sense?

Best

3 Comments

Jules Ray on 26 Feb 2019

Hello Jeff

About your points:

> It is hard to picture what data you are starting with, so could you explain a bit more about what your data look like? For example, post L1(1)?

I created a structure L1 that contains information from 60 sites. For each site, e.g. L1(n), I have thousands of measurements L1(n).Z, which are elevation data; these are normally distributed in all cases. So I obtained a PDF from these data, just like histfit would do...

> You said "I know how to get the mu and sigma of single PDF's". In that case, you could compute those 60 mu's and then average those, but evidently that is not what you want. Could you explain what you mean by the "mean of all these pdfs"? For example, suppose there were just two pdfs:

> One is a uniform pdf between 0 and 1, and the other is a normal pdf with mean 0 and sigma 1. What would you say is "the mean of these two pdfs"?

I know, but in my case all are normal distributions.

In the beginning I was thinking of extracting something like a grand mean of the probabilities. I want to know the probability of a certain elevation in the whole data.

Jeff Miller on 26 Feb 2019

> I know, but in my case all are normal distributions.

OK then, suppose one is a normal with mean 0 and sigma 1, and the other is a normal with mean 1 and sigma 1. What would you say is "the mean of these two pdfs"?
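One possible reading of "the mean of these two pdfs" is the pointwise average of the two curves, which is an equal-weight mixture. That interpretation is an assumption made here for illustration and is not something the thread settles. A minimal sketch for the two normals just described:

```matlab
% Sketch: pointwise average of two normal PDFs (an equal-weight mixture).
x = -10:0.01:11;               % assumed grid wide enough to cover both tails

pdfA = normpdf(x, 0, 1);       % normal with mean 0, sigma 1
pdfB = normpdf(x, 1, 1);       % normal with mean 1, sigma 1
pdfAvg = (pdfA + pdfB) / 2;    % average of the two curves at each x

% The averaged curve still integrates to 1, so it is itself a PDF
areaAvg = trapz(x, pdfAvg);

% Its mean is the average of the two component means
meanAvg = trapz(x, x .* pdfAvg);

fprintf('area: %.6f   mean: %.6f\n', areaAvg, meanAvg);
```

Under this reading the result is a valid density with mean 0.5, but it is no longer a normal distribution.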

> I want to know the probability of a certain elevation in the whole data.

If this is what you want to know, I am not sure why you are messing around with normal distributions in the first place. Why not simply combine the data from all 60 sites into one large dataset and tabulate the frequency of each different elevation? Doesn't that give you exactly this probability?
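The pooling idea could be sketched as follows. The real L1 is not available here, so the block builds synthetic stand-in data. The site sizes, the elevation values, the 1-unit bin width and the 120 to 125 query range are all assumptions made only for illustration.

```matlab
% Sketch: pool elevations from all sites and tabulate empirical probabilities.
rng(1);                                   % fixed seed for repeatability
numSites = 60;
L1 = struct('Z', cell(1, numSites));      % synthetic stand-in for the real L1
for n = 1:numSites
    L1(n).Z = 100 + n + 5*randn(20, 30);  % hypothetical elevation matrix
end

% Flatten each site's matrix to a column and stack all sites together
zCells = arrayfun(@(s) s.Z(:), L1, 'UniformOutput', false);
allZ = vertcat(zCells{:});

% Relative frequency of the pooled data in 1-unit elevation bins
edges = floor(min(allZ)):1:ceil(max(allZ));
probPerBin = histcounts(allZ, edges, 'Normalization', 'probability');

% Empirical probability that an elevation falls in an assumed range
probRange = mean(allZ >= 120 & allZ < 125);

fprintf('pooled points: %d   P(120 <= Z < 125) = %.4f\n', numel(allZ), probRange);
```

This uses the measurements directly and makes no normality assumption, so sites with more measurements carry proportionally more weight.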
