wrong values in histogram plotting
Elinor Ginzburg
on 27 Dec 2023
Commented: Elinor Ginzburg
on 27 Dec 2023
Hello,
I'm trying to plot a histogram of an array. I have a CSV file with a list of double values, and I want to see how many elements have a value less than or equal to 10% of the maximal value, 20%, 30%, and so on. I tried the code further below, but I get wrong statistics. When I check how many elements are less than or equal to 10% of the maximal element, I find 11173940 such elements. I counted them with the following code:
maxElement = max(array);
elementCount = sum(array < maxElement * 0.1);
But when I plot the histogram, it shows fewer than 180 elements satisfying this condition. This is the code I used (I have many CSV files that I want to read and analyze in the same manner, hence the loop over file names):
clear; clc;
dataDir = 'hist_res_rel';
fileList = dir(fullfile(dataDir, '*.csv'));
plotDir = 'plot_dir_rel';
for i = 1:numel(fileList)
    fileName = fileList(i).name;
    % The two characters before the '.csv' extension encode the epoch
    epoch = fileName(end-5:end-4);
    nextEpoch = num2str(str2double(epoch) + 10);
    % Bracket concatenation keeps the trailing space that strcat would strip
    if contains(fileName, 'a_rel')
        plot_title = ['A Relative Value Change Between Epochs: ', epoch, '-', nextEpoch];
    end
    if contains(fileName, 'b_rel')
        plot_title = ['B Relative Value Change Between Epochs: ', epoch, '-', nextEpoch];
    end
    rel_val = readmatrix(fullfile(dataDir, fileName));
    rel_val = abs(rel_val);
    Max = max(rel_val);
    p = 0.1;
    x = zeros(10, 1);
    y = zeros(10, 1);
    for index = 1:10
        percentage = Max * p;
        x(index) = percentage;
        if index == 1
            y(index) = sum(rel_val <= x(index));
        else
            y(index) = sum(rel_val <= x(index) & rel_val > x(index-1));
        end
        p = p + 0.1;
    end
    f = histogram(rel_val, x);
    xticks(x);
    title(plot_title);
    xlabel('Percentage of Relative Change');
    ylabel('Amount of Parameters');
    xticklabels({'0', '10','20','30', '40', '50', '60', '70', '80', '90', '100'});
    % Strip the '.csv' extension (4 characters) before appending '.jpg'
    saveas(f, fullfile(plotDir, ['plot_', fileName(1:end-4), '.jpg']));
end
This is the histogram that I get:
And this is the CSV file that I'm trying to analyze, just to make sure everything works (sorry, it's so large that I had to use an external site for the upload):
Thank you so much for your time and attention. I appreciate your help.
Accepted Answer
Ganesh
on 27 Dec 2023
I understand that your histogram is inconsistent with your data. You can resolve this by adding 0 as the first element of the variable "x".
The call histogram(rel_val, x) treats "x" as bin edges and counts the data points that fall between consecutive edges. Because "x" begins at Max*0.1, the first bin covers Max*0.1 to Max*0.2, so every element below Max*0.1 is left out of the plot. With 0 prepended, the first bin covers 0 to Max*0.1, which gives the count you expect. It also gives xticks(x) 11 tick positions, matching the 11 labels passed to xticklabels.
x = [0; x]; % Add this line before plotting the histogram
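To see the effect on a small scale, here is a minimal sketch with synthetic data. The values in rel_val are made up for illustration and are not taken from your file, and the variable names edgesWithZero, edgesWithoutZero, countsWithZero and countsWithoutZero are only placeholders. It uses histcounts, which bins data the same way histogram does, so the counts can be inspected without a plot. Note that each bin includes its left edge and excludes its right edge (only the last bin includes both), so a value lying exactly on an edge can land in a different bin than in your manual "<=" count; the synthetic data avoids that case.

```matlab
% Synthetic stand-in for the real data (assumption: a vector of non-negative values)
rel_val = [0.5 1 2 3 9 12 25 41 77 100];
Max = max(rel_val);

% Eleven edges: 0, 0.1*Max, 0.2*Max, ..., Max  (10 bins)
edgesWithZero = linspace(0, Max, 11);

% Edges as built in the question: 0.1*Max, ..., Max  (only 9 bins)
edgesWithoutZero = edgesWithZero(2:end);

% histcounts returns the per-bin counts that histogram would draw
countsWithoutZero = histcounts(rel_val, edgesWithoutZero);
countsWithZero    = histcounts(rel_val, edgesWithZero);

% Manual count of elements up to 10% of the maximum, as in the question
manualFirstBin = sum(rel_val <= edgesWithZero(2));

fprintf('Elements counted without the 0 edge: %d of %d\n', ...
    sum(countsWithoutZero), numel(rel_val));
fprintf('Elements counted with the 0 edge:    %d of %d\n', ...
    sum(countsWithZero), numel(rel_val));
fprintf('First bin with the 0 edge: %d, manual count: %d\n', ...
    countsWithZero(1), manualFirstBin);
```

Without the 0 edge, the five values below 0.1*Max are silently dropped. With it, the first bin matches the manual count. Building the edges with linspace also replaces the loop that accumulates p, if you prefer that.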
Kindly refer to the following document for more information and examples on using the "histogram()" function:
Hope this helps!