How to automatically adjust the y-axis of a boxchart so that outliers are not considered?

I have a number of boxcharts, and some have very spread-out outliers. How can I automatically adjust the y-axis of those plots so that only the boxes and the whiskers, but not the outliers, are considered in determining the y-axis limits?
I have something like the first image and would like a result like the second image.
Thanks for any suggestions!

Accepted Answer

Adam Danz on 8 Jan 2023
Edited: Adam Danz on 9 Jan 2023
> How can I automatically adjust the y-axis of those plots so that only the boxes and the whiskers, but not the outliers, are considered in determining the y-axis limits?
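In boxchart, outliers are defined as values that lie more than 1.5*IQR below the lower box edge or more than 1.5*IQR above the upper box edge, where IQR is the interquartile range. The box edges are the 25th and 75th percentiles (the first and third quartiles) of the data. So the outlier bounds are the 25th percentile minus 1.5*IQR and the 75th percentile plus 1.5*IQR. Because the whiskers stop at the most extreme data points that are not outliers, these bounds always contain the whiskers. These are the bounds that will be used to define your y-axis limits.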
For each box in the boxchart, these limits are computed as
iqrng = iqr(ydata);
lower = quantile(ydata, 0.25)-1.5*iqrng;
upper = quantile(ydata, 0.75)+1.5*iqrng;
The y-axis limits will be the minimum lower bound across all boxes and the maximum upper bound across all boxes. This can be a bit tricky to compute when you're working with grouped boxes.
Here's a demo that creates a boxchart, computes the minimum and maximum outlier bounds, and sets the y-axis limits to those bounds. Don't miss the last section below, "A note on data visualization".
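As a minimal sketch of that last step, the snippet below computes the outlier bounds of each box and then takes the smallest lower bound and the largest upper bound across boxes. The data in `boxes` is hypothetical (one vector per box), and the sketch assumes `quantile` is on your path (it ships with the Statistics and Machine Learning Toolbox).

```matlab
% Minimal sketch: per-box outlier bounds, then one y-limit for all boxes.
% Assumes quantile is available (Statistics and Machine Learning Toolbox).
quartiles     = @(y) [quantile(y(:), 0.25), quantile(y(:), 0.75)];  % [Q1 Q3]
fences        = @(q) q + [-1.5 1.5] * (q(2) - q(1));                 % [Q1-1.5*IQR, Q3+1.5*IQR]
outlierBounds = @(y) fences(quartiles(y));

% Hypothetical data: one vector per box
boxes = {1:8, [2 4 4 5 6 40], [-30 0 1 2 3 3]};

% One row of [lower upper] per box
bounds = cell2mat(cellfun(outlierBounds, boxes(:), 'UniformOutput', false));

% The y-limit spans the smallest lower bound and the largest upper bound
ybound = [min(bounds(:,1)), max(bounds(:,2))];
```

With MATLAB's default `quantile` method, `1:8` has Q1 = 2.5 and Q3 = 6.5, so that box's bounds are [-3.5 12.5]. `ybound` ends up as [-4.5 12.5], which leaves out the extreme values 40 and -30.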
Create boxchart
All you need from this step is the variable h, which is the handle to your boxchart object.
% Load and prepare data
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
'August','September','October','November','December'};
tbl.Month = categorical(tbl.Month,monthOrder);
% Add more outliers
rng(0)
r = unique(randi(565,1,20));
tbl.TemperatureF(r) = 2*tbl.TemperatureF(r);
w = unique(randi(565,1,20));
tbl.TemperatureF(w) = -1*tbl.TemperatureF(w);
% Create boxchart
h = boxchart(tbl.Month,tbl.TemperatureF,'GroupByColor',tbl.Year);
ylabel('Temperature (F)')
Compute limits based on outlier bounds
Replace h with your boxchart object handle.
% Loop through each boxchart object
upperbound = [];
lowerbound = [];
for i = 1:numel(h)
% Compute outlier bounds: box edges +/- (1.5 * IQR)
groups = findgroups(h(i).XData);
qtile.lower = splitapply(@(x)quantile(x,0.25),h(i).YData,groups);
qtile.upper = splitapply(@(x)quantile(x,0.75),h(i).YData,groups);
iqrng = qtile.upper - qtile.lower;   % named iqrng so the iqr function is not shadowed
upperbound = [upperbound; qtile.upper + 1.5*iqrng]; %#ok<*AGROW>
lowerbound = [lowerbound; qtile.lower - 1.5*iqrng];
end
ybound = [min(lowerbound), max(upperbound)];
% Set y axis limit
ylim(ybound)
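Here is a self-contained sketch of the same computation that uses synthetic data instead of TemperatureData.csv, and adds a small padding so the whisker caps don't sit exactly on the axes edges. The box labels, sample sizes, injected outliers, and the 5% padding are arbitrary choices for illustration; it assumes R2020a or later (for boxchart) and that `quantile` is available.

```matlab
% Self-contained sketch with hypothetical data (no CSV file needed)
rng(1)
x = categorical(repmat(["A"; "B"; "C"], 50, 1));   % hypothetical box labels
y = randn(150, 1);
y([5 20 77]) = [25; -30; 40];                      % inject a few extreme outliers

figure
h = boxchart(x, y);   % a single, ungrouped object, so no loop is needed

% Outlier bounds per box: box edges +/- (1.5 * IQR)
groups = findgroups(h.XData(:));
q1 = splitapply(@(v) quantile(v, 0.25), h.YData(:), groups);
q3 = splitapply(@(v) quantile(v, 0.75), h.YData(:), groups);
iqrng = q3 - q1;
ybound = [min(q1 - 1.5*iqrng), max(q3 + 1.5*iqrng)];

% Pad by 5% of the range so the whisker caps stay visible
pad = 0.05 * diff(ybound);
ylim(ybound + [-pad pad])
```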
A note on data visualization
The chart above is misleading because it hides many outliers, making it look as if they don't exist. There are two ways to improve this so that your data visualization depicts your data more accurately.
  1. Hide the outlier markers using set(h, 'MarkerStyle','none'). Note that this is not the same as detecting and removing outliers from your data before plotting. Also note that you'll still need to implement my solution to update the axis limits.
  2. Clearly state in your text that some outliers lie outside the plotted range.
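A minimal sketch combining both suggestions for a single box, assuming hypothetical normally distributed data with two extreme values appended and that `quantile` is available. The outlier markers are hidden, the limits are set to the outlier bounds, and the title reports how many points fall outside those limits; the title wording is just one way to do this.

```matlab
% Hypothetical data: 100 normal values plus two extreme points
rng(0)
y = [randn(100, 1); 30; -25];

figure
h = boxchart(y);
set(h, 'MarkerStyle', 'none')            % Option 1: hide the outlier markers

% Outlier bounds: box edges +/- (1.5 * IQR)
q = [quantile(y, 0.25), quantile(y, 0.75)];
ybound = q + [-1.5 1.5] * (q(2) - q(1));
ylim(ybound)                             % the limits still have to be set explicitly

% Option 2: tell the reader what is off-chart
nOut = nnz(y < ybound(1) | y > ybound(2));
title(sprintf('%d outlier(s) outside the plotted range', nOut))
```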
Paul on 31 May 2023
> How can I automatically adjust the y-axis of those plots so that only the boxes and the whiskers, but not the outliers, are considered in determining the y-axis limits?
If you have the Statistics and Machine Learning Toolbox, boxplot offers a way to get the whisker values directly (instead of having to compute them), to control the plot limits, and to indicate extreme values outside those limits, as @Adam Danz suggests.
However, for some reason boxplot doesn't offer a way to modify certain aspects of the plot after it's created (at least I couldn't figure that out), so one has to create a boxplot, interrogate it, and then create a new plot.
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
'August','September','October','November','December'};
tbl.Month = categorical(tbl.Month,monthOrder);
% Add more outliers
rng(0)
r = unique(randi(565,1,20));
tbl.TemperatureF(r) = 2*tbl.TemperatureF(r);
w = unique(randi(565,1,20));
tbl.TemperatureF(w) = -1*tbl.TemperatureF(w);
% Create box chart
figure
h1 = boxchart(tbl.Month,tbl.TemperatureF,'GroupByColor',tbl.Year);
% Create a box plot
figure
boxplot(tbl.TemperatureF,{tbl.Month tbl.Year},'ColorGroup',tbl.Year,'BoxStyle','outline','Symbol','o');
% get the whisker limits
wupper = findobj(gca,'Tag','Upper Whisker');
wlower = findobj(gca,'Tag','Lower Whisker');
wmax = max([wupper.YData]);
wmin = min([wlower.YData]);
% Redo the box plot using those whisker limits; other name-value pairs
% might be preferable.
figure
boxplot(tbl.TemperatureF,{tbl.Month tbl.Year},'ColorGroup',tbl.Year,'BoxStyle','outline','DataLim',[wmin-10 wmax+10],'Symbol','o','ExtremeMode','compress');
Some more work would be needed to make those tick labels readable.
Also, I had to attach the CSV file because otherwise I got a "file not found" error, which is odd because that CSV file ships with MATLAB, and apparently Adam was able to read it directly.
Adam Danz on 1 Jun 2023
Interesting idea to use DataLim and ExtremeMode @Paul. I suppose we could also set the ylim according to the min and max caps.
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
'August','September','October','November','December'};
tbl.Month = categorical(tbl.Month,monthOrder);
% Add more outliers
rng(0)
r = unique(randi(565,1,20));
tbl.TemperatureF(r) = 2*tbl.TemperatureF(r);
w = unique(randi(565,1,20));
tbl.TemperatureF(w) = -1*tbl.TemperatureF(w);
bh = boxplot(tbl.TemperatureF,{tbl.Month tbl.Year},'ColorGroup',tbl.Year);
% Compute min and max whisker
upperCaps = findobj(bh,'type','line','tag','Upper Adjacent Value');
lowerCaps = findobj(bh,'type','line','tag','Lower Adjacent Value');
% whisker range + some buffer (hard coded for this demo)
whiskerRange = [min([lowerCaps.YData])-3, max([upperCaps.YData])+3];
ylim(whiskerRange)
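The buffer of 3 in the demo above is hard coded for this data. As a sketch, a buffer proportional to the span between the caps adapts to any data scale. The 5% factor, the synthetic data, and the two-group layout below are arbitrary choices for illustration, and boxplot requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data: two groups, each with one extreme value
rng(0)
y = [randn(60, 1); 20; -15];
g = repmat([1; 2], 31, 1);                 % group labels, same length as y

figure
bh = boxplot(y, g);

% Whisker caps are tagged 'Upper/Lower Adjacent Value' by boxplot
upperCaps = findobj(bh, 'type', 'line', 'tag', 'Upper Adjacent Value');
lowerCaps = findobj(bh, 'type', 'line', 'tag', 'Lower Adjacent Value');
capRange = [min([lowerCaps.YData]), max([upperCaps.YData])];

% Buffer proportional to the cap-to-cap span instead of a fixed value
buffer = 0.05 * diff(capRange);
ylim(capRange + [-buffer buffer])
```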
The reason you had to attach TemperatureData.csv (thanks, by the way) is that, in MATLAB R2023a, demo files are not included with MATLAB and must be downloaded manually (see article). I wrote and ran my answer before R2023a.
