Plot Mean over Box Charts using Positional and Color Grouping Variables

I'd like to plot the group means over a box chart that uses positional and color grouping variables.
Example:
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
'August','September','October','November''December'};
tbl.Month = categorical(tbl.Month,monthOrder);
boxchart(tbl.Month,tbl.TemperatureF,'GroupByColor',tbl.Year)
ylabel('Temperature (F)')
legend
hold on
meanTemperatureF = groupsummary(tbl.TemperatureF,{tbl.Month, tbl.Year}, 'mean');
plot(meanmeanTemperatureF,'o')
But the plotted means do not line up with the boxes.
How can I fix it?

3 Comments

We can't see anything w/o data -- but other than that
plot(meanmeanTemperatureF,'o')
doesn't use the same variable as the one calculated by
meanTemperatureF = groupsummary(tbl.TemperatureF,{tbl.Month, tbl.Year}, 'mean');
there's nothing that looks suspicious in the above code.
We would have to see the actual results to have any reason to suspect the computed means aren't the actual values given the input data.
Sorry, I mistyped.
Here's the right code.
plot(meanTemperatureF,'o')
The picture below is the output of this code.
Well, you've got twelve months but for some reason only 11 categories on the boxplot axis and there are only 10+7 actual boxes by the time you've done the grouping by year.
Then you just plotted the means for an indeterminate number of months versus their ordinal positions.
You need to find/create the correct categorical variable value for each of those means that is associated with the position of the appropriate box -- and then since you have used grouping, that may still not quite line up.
I've never done such an exercise before and suspect it's highly unlikely anybody on Answers has, either; attach your data as a .mat file so somebody can reproduce the result you have and easily explore how to make the necessary adjustments to axis values.
"Help us help you!"
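The point about associating each mean with a categorical value can be sketched as follows. This is only an illustration on made-up data: the column names Month, Year and TemperatureF are assumed to match the poster's file, and the numbers are invented. The table form of groupsummary returns the group labels alongside the means, so the means can be plotted against the categorical Month instead of against their row index.

```matlab
% Synthetic stand-in for the poster's data (assumption: the real file has
% Month, Year and TemperatureF columns; these values are invented).
monthOrder = {'January','February','March'};
Month = categorical(repelem(monthOrder', 4), monthOrder);
Year = repmat([2015; 2015; 2016; 2016], 3, 1);
TemperatureF = [30; 32; 34; 36; 35; 37; 39; 41; 45; 47; 43; 45];
tbl = table(Month, Year, TemperatureF);

% The table syntax keeps the Month and Year labels next to each mean.
G = groupsummary(tbl, {'Month','Year'}, 'mean', 'TemperatureF');

boxchart(tbl.Month, tbl.TemperatureF, 'GroupByColor', tbl.Year)
hold on
% Plot against the categorical Month, not against the row index.
plot(G.Month, G.mean_TemperatureF, 'o')
hold off
```

With this, every marker lands on the center of its month. The markers for different years still share that center rather than sitting on their own boxes, which is the remaining misalignment caused by the color grouping.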


Accepted Answer

Boxchart makes this really difficult! The location of the categories is well defined, but the offset (while easy to calculate) isn't included anywhere. Fortunately, you can plot the numeric equivalent of a categorical, and it's easy to convert.
Some bits of your code didn't quite line up for me (e.g. how you're calling groupsummary), so I used a dataset I happened to have around with very similar data. I plotted the means on each box; I wasn't sure if you wanted the mean for each month instead, which would be much easier!
Note that I used the more robust 'ruler2num' to convert month names to their numeric values, but in reality the locations are just the category number, so the month number.
tbl = readtable('natick weather 2003-2014.csv');
tbl.Year = tbl.DATE.Year;
tbl = tbl(ismember(tbl.Year,[2004,2008,2012]),:);   % keep three years only
% monthOrder = {'January','February','March','April','May','June','July', ...
%     'August','September','October','November','December'};
% alternate move:
monthOrder = month(datetime(2010,1:12,1),'name');
tbl.Month = categorical(month(tbl.DATE,'name'),monthOrder);
% mean of TMAX for every Month/Year combination
meantemp = groupsummary(tbl,{'Month' 'Year'},'mean','TMAX');
%%
bc = boxchart(tbl.Month,tbl.TMAX,'GroupByColor',tbl.DATE.Year);   % one object per year
ylabel('Temperature (F)')
legend
hold on
xax = get(gca,'XAxis');
% horizontal offset of each year's box relative to the category center
offset = (1:numel(bc))/numel(bc);
offset = offset - mean(offset);
for i = 1:numel(bc)
    % rows of the summary table that belong to this year's boxes
    ind = string(meantemp.Year) == string(bc(i).DisplayName);
    x = ruler2num(meantemp.Month(ind),xax) + offset(i);
    y = meantemp.mean_TMAX(ind);
    plot(x,y,'x','LineWidth',2,'DisplayName',"mean(" + bc(i).DisplayName + ")",'SeriesIndex',i)
end
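To make the arithmetic in the loop above concrete, here is a small worked sketch with three months and three color groups (both counts are just example values). It shows that converting a categorical to double gives the category number, and what the centered offsets come out to. That these offsets match the positions boxchart uses internally is the premise of the answer, not something this snippet verifies.

```matlab
% Category positions: double() of a categorical returns the category index.
monthOrder = {'January','February','March'};
m = categorical({'March','January'}, monthOrder);
pos = double(m);                  % March is category 3, January is category 1

% Centered offsets for three color groups, as computed in the answer.
nGroups = 3;
offset = (1:nGroups)/nGroups;     % 1/3, 2/3, 1
offset = offset - mean(offset);   % -1/3, 0, 1/3

% x locations assumed for the three boxes drawn at March.
xMarch = pos(1) + offset;
disp(xMarch)
```

The offsets are symmetric about zero, so the middle group sits on the category center and the outer groups sit one third of a category width to either side.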

6 Comments

@Dave B -- "Boxchart makes this really difficult! The location of the categories is well defined, but the offset (while easy to calculate) isn't included anywhere."
That is SO frustrating that TMW has done this kind of thing on these specialized plots! Can you use your influence inside to make enhancement requests to expose the internals of this nature on boxplot and the others in the general categories of these (relatively) recently-introduced specialty plots?
Q? like this illustrate that there are other things that can be done with these plots than just the builtin features, and neutering them is a real disservice to their users, wasting their time in trying to figure out such things.
When still in the active consulting role, I found MATLAB invaluable for its integration of all the tools and features in one place that took years to accumulate/semi-integrate with Fortran even though much of the internals numerically were built, at least originally, on the same numerical libraries.
The kicker still was that I could still find that it took an inordinate amount of time futzing around with the plotting routines to be able to take the raw outputs of plot or one of the special graphs that were a decent starting point and make it serviceable to be used to provide the client in the end. That has improved some, but one finds that one can still spend hours or even days, trying to clean up issues.
Yes absolutely, I can do my best with what little influence I have!
In truth, there are at least three reasons I'm here on MATLAB Answers.
  • it's fun to help people, and I'm pretty sharp with MATLAB as I used it for many years as a scientist before joining MathWorks (though I don't even pretend to compete with some of the MVPs!).
  • I like to get the word out about new features - I worked pretty hard on tiledlayout/nexttile, so when I see people doing stuff with subplot I want to say "hey, try tiledlayout, it's awesome!"
  • I want to learn about where people are struggling/frustrated with MATLAB (esp. graphics as that's where I work)...or maybe to accumulate evidence about such things to share with the folks who don't make it on here.
It can be difficult to talk about feature work on ML Answers - I can't make promises about what's coming in future releases, and I have to walk a fine line as a developer (i.e. not tech support or marketing). I can certainly say that I'm always a voice internally for the power users, and I think that one of the reasons MATLAB graphics has always been awesome is that you can (traditionally) tweak every little detail. Watching my partner try to tweak a chart in excel makes me wince!
On the other hand, we've got a complicated ecosystem and trying to keep things compatible across releases is a huge constraint. If we just exposed more of boxchart's implementation, lots of ecosystem behavior might break (off the top of my head: datatips, saving and loading fig files, the property inspector). That doesn't mean it can't be done, just that it's not as easy as flipping a switch. On the other hand, I'm compelled by the need to get the locations of those boxes, and I'll certainly request that enhancement! IIRC we added XEndPoints and YEndPoints Properties to Bar in 2019b and this seems pretty similar.
I think the new boxchart is way better than the old (stats toolbox) boxplot. I remember trying to decode the matrix of handles that boxplot spit out to do simple things like change colors or linewidths. Honestly, there's something missing here which I don't think just applies to boxchart, which has to do with how an axes (or really an axis) can represent hierarchical categorical data like this. I can't really speak to when we'll get this, other than to say I've gotten a lot of support to work on it but it's a big project and there have been too many other (higher priority) big projects in the way.
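For comparison, this is roughly what the XEndPoints and YEndPoints properties mentioned above look like on a grouped bar chart. It is a minimal sketch with made-up data, and it needs R2019b or later. Each Bar object reports where its bars end, so markers or labels can be placed on individual bars without computing any offsets by hand.

```matlab
% Made-up data: three categories (rows), two series (columns).
y = [1 2; 3 4; 5 6];
b = bar(y);                 % returns one Bar object per series
hold on
for k = 1:numel(b)
    % XEndPoints/YEndPoints give the tip location of every bar in series k.
    plot(b(k).XEndPoints, b(k).YEndPoints, 'kx', 'LineWidth', 2)
end
hold off
```

An equivalent property on boxchart would remove the need for the manual offset calculation in the accepted answer.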
That's heartening to hear there are voices in the wilderness internally!
I started to mention my 20-yr+ crusade about bar, particularly about not revealing the midpoints of grouped bars. You can find me railing about it from the beginning of the Answers forum and if care to look on the newsgroup archives, it will predate Answers by many years! <VBG>
At least originally, there were always kludges until at one point, not only did TMW not yet expose the needed X coordinates but they also made the actual X locations a hidden property!!! That broke previous workarounds. Such decisions as that are truly mind-boggling and how they ever escape into the wild through internal TMW review just floors me when they happen on occasion.
bar is still a very weak entry into the field, bar3 even more so with its inability to do anything at all on the y axis. While I agree with you on the user interface inside Excel, truthfully the appearance and ability to make common plots much more presentable inside Excel now outstrips the dated appearance of handle graphics. In particular, the default color patterns and families are garish and it is hard to build pleasing replacements -- Excel does a much nicer job "out of the box" imo in some of those areas.
For engineering graphics, however, it is still impossible to match TecPlot and if you do any serious FEM work it's virtually imperative. https://www.tecplot.com/plot-gallery/
After 40(?) years, there still are no builtin hatching patterns in MATLAB -- we had them on Calcomp pen plotters on the CDC mainframe in the 1960s...
As you say, so much to do, so little time...
One last comment --
@Dave B wrote " I can certainly say that I'm always a voice internally for the power users, ..."
I think (and have commented before) that the practice of neutering or munging property names on the base axis object when creating these specialty plots actually causes more difficulties for the newer users they are intended for than for power/experienced users. The latter know enough about handle graphics to be able to find hidden properties or have the experience needed to understand how to mung on an existing plot to add desired features.
Instead, it's precisely the less experienced such as the OP here who try to make what looks like a simple enhancement and immediately find the limitations and are stumped. It would seem self-evident while creating the initial boxplot which uses grouping variables that it is similar enough to a grouped bar plot that there would inevitably be the need to use those plotting locations.
@dpb -
I definitely get the pain, and I'm certainly frustrated when I can't give people a way to (e.g.) change the fontsize on their heatmap title. I shouldn't have said 'power users', I couldn't think of a better term. Most folks who use MATLAB graphics won't do anything with the objects (the handle part of handle graphics) without getting advice from places like ML Answers...so that's sort of what I was thinking about.
I agree this is an area where we really need to improve. We definitely don't limit our functionality because we're trying to dumb it down or limit what you can do, that's the opposite of what (most) MW developers want. But we do need to find solutions to resolve issues with our big (and growing) ecosystem (and company) and cross-release compatibility, so that we can open up access without it being buggy. These are hard problems but I'm optimistic that we'll get there!
Thanks for the feedback...part of my purpose in Answers (besides being entertainment/stimulation after giving up the consulting gig) is that it gives the opportunity to raise these kinds of user pain points.
I know I tend to carp on a lot of details and may take a thread off on a side journey but I always try to make sure the OP's Q? is answered as best I can on the way. :)
But, I think having these related types of similar cases raised hopefully will continue to raise the consciousness of the development team -- I know it probably isn't true, but it seems to me as a longer-time user a trend towards releasing features that are not yet really ready and that there is far less consistency across the base product and toolboxes than before. It seems as though there isn't an overall corporate-wide oversight that really enforces syntax rules/documentation to try to maintain that cohesive nature but that the various toolboxes are almost totally separate products.
I understand the difficulties; the shift from purely procedural coding style of the original MATLAB to object-oriented/class-based methods is a major dichotomy and schism to breach. I don't have the answer (so to speak?), but believe there needs to be more effort into the area during the initial design of new functions/features/toolboxes to try to minimize these differences going forward.


Release: R2021a
Asked: KK on 11 Nov 2021
Commented: dpb on 19 Nov 2021