Help me understand imhist, I have to recreate it in Python

I have an array of floating-point numbers between 0 and 1. imhist is called with 8 bins (it's old legacy code that I did not write myself). I want to do the same in Python, but when I recreated the function, I found that I don't understand the MATLAB results. Example:
arr = [0, 0.1, 0.3, 0.6, 0.9];
[res, bins] = imhist(arr, 8);
[bins'; res']
ans = 2×8
         0    0.1429    0.2857    0.4286    0.5714    0.7143    0.8571    1.0000
    1.0000    1.0000    1.0000         0    1.0000         0    1.0000         0
Now writing the same in Python, I get some different results:
import numpy as np
arr = np.array([0, 0.1, 0.3, 0.6, 0.9])
bins = np.linspace(0, 1, 8)  # 8 values, which np.histogram treats as bin edges
w, t = np.histogram(arr, bins)
print(t)
print(w)
The bins are exactly the same, but the result array is
[2 0 1 0 1 0 1]
I understand the Python results: there are 2 values between 0 and 0.1429. Furthermore, the result list has 7 elements, because the last bin value only closes the final bin. MATLAB returns 8 results and, as shown, they differ in a way I don't understand.
How should I interpret the MATLAB results and, more importantly, how do I get the same results in Python?

Accepted Answer

the cyclist
the cyclist on 7 Feb 2023
Edited: the cyclist on 7 Feb 2023
The difference is in the details of how the bins are defined. The formula for the upper and lower limits of the bins is in the Tips section of the documentation for imhist.
Mimicking the formula there, for 8 bins (with double precision input) the bin intervals are
p = (1:8)';
n = 8;
lower = (p-1.5)/(n-1);
upper = (p-0.5)/(n-1);
binIntervals = [lower upper]
binIntervals = 8×2
   -0.0714    0.0714
    0.0714    0.2143
    0.2143    0.3571
    0.3571    0.5000
    0.5000    0.6429
    0.6429    0.7857
    0.7857    0.9286
    0.9286    1.0714
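For comparison on the Python side, here is a minimal sketch of the same interval formula in NumPy. The variable names (bin_intervals, centers) are my own, and the midpoints are computed only to show that they coincide with the bin values 0, 0.1429, ..., 1 from the question.

```python
import numpy as np

n = 8                            # number of bins passed to imhist
p = np.arange(1, n + 1)          # 1-based bin index, as in the MATLAB snippet
lower = (p - 1.5) / (n - 1)      # lower limit of bin p
upper = (p - 0.5) / (n - 1)      # upper limit of bin p
bin_intervals = np.column_stack([lower, upper])
centers = (lower + upper) / 2    # midpoints, equal to (p - 1) / (n - 1)

print(np.round(bin_intervals, 4))
print(np.round(centers, 4))
```

The midpoints are the same values that np.linspace(0, 1, 8) produces, which is why the two sets of "bins" look identical even though one is a set of centers and the other a set of edges.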
Note that the output of imhist reports the bin "locations", which we can infer are the bin centers. Your bins in Python are presumably the edges, so they are offset relative to MATLAB by half a bin width.
It looks like the imhist function does not allow specification of the bin centers or edges (just the bin count), so you will need to adapt your Python code. The following works for this simple case, but I doubt it is the best way in general.
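import numpy as np
arr = np.array([0, 0.1, 0.3, 0.6, 0.9])
centers = np.linspace(0, 1, 8)  # bin centers, as reported by imhist
half = centers[1] / 2  # half a bin width
bins = np.append(centers - half, centers[-1] + half)  # 9 edges give 8 bins
w, t = np.histogram(arr, bins)
print(t)
print(w)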
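For an arbitrary bin count, the same idea can be wrapped in a small function. This is only a sketch: imhist_like is a hypothetical name, it assumes floating-point input already in the range 0 to 1, and it builds all n + 1 edges from the formula in the Tips section so that the result has one count per bin.

```python
import numpy as np

def imhist_like(values, n_bins):
    """Hypothetical helper: counts and bin centers for data in [0, 1]."""
    values = np.asarray(values, dtype=float).ravel()
    # Edges (p - 1.5) / (n - 1) for p = 1 .. n + 1, i.e. n_bins + 1 edges.
    p = np.arange(1, n_bins + 2)
    edges = (p - 1.5) / (n_bins - 1)
    counts, _ = np.histogram(values, bins=edges)
    centers = np.linspace(0.0, 1.0, n_bins)  # bin locations
    return counts, centers

counts, centers = imhist_like([0, 0.1, 0.3, 0.6, 0.9], 8)
print(centers)
print(counts)
```

For the example array this gives the counts 1 1 1 0 1 0 1 0, matching the imhist output in the question. Note that np.histogram silently drops values that fall outside the outermost edges, so inputs outside the 0 to 1 range would need separate handling.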

