Use Kernel Density Estimation to get the probability of a new observation
Tatiana Lopez
on 1 Apr 2015
Commented: Tatiana Lopez
on 1 Apr 2015
Hi all, I would like to use KDE to fit a 1-D variable and then get the probability of a new observation given the fitted model, using pdf( kde, new observation ). I used fitdist with the 'Kernel' option and then plotted the corresponding pdf. My question is: shouldn't the pdf be normalized? And how can I then get the probability of a new observation?
Thank you
x = [15.276,15.277,15.279,15.28,15.281,15.186,15.187,15.188,15.19,15.191,15.193,15.194,15.195,15.197,15.198,15.199,15.201,15.202,15.204,15.205,15.206,15.208,15.209,15.211,15.212,15.213,15.215,15.216,15.217,15.219,15.22,15.222,15.223,15.224,15.226,15.227,15.229,15.23,15.231,15.233,15.234,15.236,15.237,15.238,15.24,15.241,15.243,15.244,15.245,15.247,15.248,15.249,15.251,15.252,15.254,15.255,15.283,15.284,15.286,15.287,15.288,15.29,15.291,15.172,15.173,15.174,15.176,15.177,15.179,15.18,15.181,15.183,15.184,15.293,15.294,15.167,15.169,15.17,15.295,15.297,15.298,15.299,15.301,15.302,15.304,15.305,15.306,15.308,15.309,15.311,15.312,15.313,15.315,15.316,15.145,15.147,15.148,15.149,15.151,15.152,15.154,15.155,15.156,15.158,15.159,15.161,15.162,15.163,15.165,15.166,15.293,15.294,15.167,15.317,15.319,15.32,15.322,15.323,15.324,15.326,15.327,15.329,15.33,15.331,15.333,15.334,15.336,15.337,15.338,15.34,15.341,15.343,15.344,15.345,15.347,15.348,15.349,15.351,15.352,15.354,15.355,15.356,15.358,15.359,15.361,15.362,15.363,15.365,15.366,15.368]';
t = max( min(x) - 0.1, 0 ) : 0.01 : min( max(x) + 0.1, 24);
kde = fitdist(x,'Kernel','Kernel','normal','Support','positive')
pdf_kde = pdf( kde, t );
plot(t, pdf_kde, 'g-'), hold on;
Accepted Answer
Brendan Hamm
on 1 Apr 2015
The pdf integrates to 1, so I am not sure why you think it needs to be normalized. Furthermore, fitdist gives you a continuous distribution, and therefore P(X = x | distribution) = 0 for every x, because single points have no mass. The pdf is NOT telling you a probability; it is the value of the probability density function at that point, and that value can be greater than 1. Probability is the area under the curve between two values (the limits of the integral). You could ask, "What is the probability that a new observation is less than or equal to 15.2?" and the answer would be found from the cdf:
cdf(kde,15.2) % P(x <= 15.2)
ans =
0.2727
or, "What is the probability the sample will be less than 15.4 but greater than 15.2?":
cdf(kde,15.4) - cdf(kde,15.2) % P(15.2 < x <= 15.4) = P(x <= 15.4) - P(x <= 15.2)
ans =
0.7105
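If the doubt about normalization comes from seeing density values above 1 in the plot (an assumption on my part), a quick numerical check may help. The sketch below uses a made-up, evenly spaced sample rather than the data from the question, and the variable names (peakDensity, totalArea, pArea, pCdf) and the interval limits are only illustrative. It checks that the area under the fitted pdf is approximately 1 even though the peak density is larger than 1, and that integrating the pdf over an interval agrees with the difference of cdf values:

```matlab
% Hypothetical sample for illustration only (NOT the data from the question)
x = linspace(15.15, 15.37, 150)';

% Same kind of fit as in the question
kde = fitdist(x, 'Kernel', 'Kernel', 'normal', 'Support', 'positive');

% Evaluate the density on a grid that extends well past the data range
t = linspace(14.9, 15.7, 4001);
f = pdf(kde, t);

% Density values are not probabilities, so the peak may exceed 1
% when the data span a narrow range
peakDensity = max(f);

% The area under the whole curve should still be approximately 1
totalArea = trapz(t, f);

% Probability of an interval: area under the pdf between the limits,
% which should match the difference of the cdf values
a = 15.2;
b = 15.3;
pArea = integral(@(u) pdf(kde, u), a, b);
pCdf  = cdf(kde, b) - cdf(kde, a);

fprintf('peak density = %.4f\n', peakDensity);
fprintf('total area   = %.4f\n', totalArea);
fprintf('P(a < x <= b): integral = %.6f, cdf difference = %.6f\n', pArea, pCdf);
```

In practice the cdf difference is the simpler route; the integral is only there to show that both describe the same area.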
I hope this helps clear this up.