# Calculating the probability of a data point in a histogram

Curious Mind on 7 May 2018
Answered: Steven Lord on 8 May 2018
Hello:
The image below is a histogram of a large data set (90*1 double, in blue) and a single data point (in red). I would like to compute the probability of the red data point relative to the blue data. I could sum the counts to the left of the red bar and divide by the total count (90), but I want MATLAB code that does this more efficiently, preferably without even using the histogram. Thank you.

Steven Lord on 8 May 2018
Change the Normalization property of the histogram object then get the appropriate element of the Values property of that object.
rng default
x = randn(10000,1);
h = histogram(x)
h.Values(10)
Since the default Normalization method is 'count', this will tell you that there are 133 elements of x that fall into bin 10. [Since I used rng default, you should get the exact same random numbers in x as I did and so generate the exact same histogram.]
h.Normalization = 'probability';
h.Values(10)
Now h.Values(10) is 0.0133 which makes sense: 133 / 10000 (the total number of points) = 0.0133.
If you wanted to get the same information without actually bringing up the plot, the histcounts function also lets you specify a 'Normalization' method.
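A minimal sketch of the histcounts route, regenerating the same x as above. The output names p and edges are chosen here for illustration, and the automatic binning is assumed to match what histogram picks by default.

```matlab
rng default
x = randn(10000,1);

% Bin probabilities without creating a plot; edges holds the bin boundaries.
[p, edges] = histcounts(x, 'Normalization', 'probability');

% Fraction of x that falls into bin 10, i.e. between edges(10) and edges(11).
p(10)
```

With 'probability' normalization the elements of p sum to 1, so each entry is the count in that bin divided by numel(x).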
And I'd guess that the histogram you showed was created with something more like 900 data points than 90. According to the Y limits, each of the 5 central bars contains more than 90 elements, assuming you're using the default 'count' Normalization. Still not Big Data, but bigger.

Image Analyst on 7 May 2018
You need to know the edges of the bin, e1 and e2. Then you can simply do
percentageInBin = sum(data>=e1 & data < e2) / numel(data);
No histogram needed if you just need it for that one red bin.
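A small self-contained sketch of that one-liner. The data and the edge values e1 and e2 below are made-up stand-ins, not the values from the posted image. The last line is the "counts to the left divided by the total" idea from the question, written without any histogram.

```matlab
rng default
data = randn(90,1);   % stand-in for the 90-element data set (assumption)
e1 = 0.5;             % hypothetical left edge of the red bin
e2 = 1.0;             % hypothetical right edge of the red bin

% Fraction of the data that falls in [e1, e2).
fractionInBin = sum(data >= e1 & data < e2) / numel(data);

% Fraction of the data strictly to the left of the bin.
fractionBelow = mean(data < e1);
```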
By the way, it made me snicker when you described 90 elements as large. It literally would have to be around a million times that big before anyone might start considering it large.

Curious Mind on 8 May 2018
@Image Analyst, do e1 and e2 belong to the whole histogram or just the bar in red?
Curious Mind on 8 May 2018
Also, if I have, say, a dataM (20*1 double) vector, can I get the probabilities of all the rows in dataM at once against the data with 90 elements?
Image Analyst on 8 May 2018
Just the bar in red.
To do it without explicitly computing a histogram array, you'd have to do it one bin at a time. Much better to simply get the histogram and divide the counts array by the total counts. Why can't you compute the histogram?
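One possible way to handle all the dataM values at once, sketched under assumptions: data and dataM below are random stand-ins, and the variable names p, edges, bin and probM are illustrative. It computes the bin probabilities once with histcounts, then uses discretize to find which bin each dataM value lands in.

```matlab
rng default
data  = randn(90,1);   % stand-in for the 90-element data set (assumption)
dataM = randn(20,1);   % stand-in for the 20 query values (assumption)

% Probability of each bin, computed once for the whole data set.
[p, edges] = histcounts(data, 'Normalization', 'probability');

% Bin index of each query value; NaN if it lies outside the edges.
bin = discretize(dataM, edges);

% Look up each value's bin probability; out-of-range values get 0.
probM = zeros(size(dataM));
inRange = ~isnan(bin);
probM(inRange) = p(bin(inRange));
```

probM(k) is then the fraction of data that shares a bin with dataM(k), which is the "divide the counts array by the total counts" suggestion applied to every row in one pass.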