## Violin Plots for plotting multiple distributions (distributionPlot.m)

version 1.15.0.0 (30.2 KB) by Jonas

Function for plotting multiple histograms side-by-side in 2D - better than boxplot.

Updated 11 Feb 2017

Editor's Note: This file was selected as MATLAB Central Pick of the Week

The zip-file contains the following files for visualizing distributions:
- distributionPlot.m: main function that allows creating violin plots
- myHistogram.m: generate histograms with 'ideal' bin width given the number of data points and the spread (Freedman-Diaconis rule). Note that for integer-valued data, each integer gets its own bin.
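
The Freedman-Diaconis rule mentioned above sets the bin width to twice the interquartile range divided by the cube root of the number of data points. The following base-MATLAB sketch is not the code in myHistogram.m. It assumes the quartiles are the medians of the lower and upper halves of the data (Tukey's hinges; myHistogram.m may compute them differently), centers integer bins on the integers, and falls back to width 1 for a zero spread (also an assumption):

```matlab
% Hypothetical sketch of the Freedman-Diaconis rule; not taken from myHistogram.m.
x = [randn(200,1); randn(100,1) + 4];   % example data
x = sort(x(~isnan(x)));                 % drop NaNs, sort for the quartiles
n = numel(x);

% Quartiles as medians of the lower and upper halves (assumption)
q1 = median(x(1:floor(n/2)));
q3 = median(x(ceil(n/2)+1:end));

% Freedman-Diaconis: h = 2 * IQR / n^(1/3)
binWidth = 2 * (q3 - q1) / n^(1/3);
if binWidth <= 0
    binWidth = 1;                       % degenerate spread: assumed fallback
end

if all(x == round(x))
    binWidth = 1;                              % integer data: one bin per integer
    edges = (min(x) - 0.5):(max(x) + 0.5);     % bins centered on the integers
else
    edges = (floor(min(x)/binWidth):ceil(max(x)/binWidth)) * binWidth;
end
counts = histcounts(x, edges);          % histcounts requires R2014b or later
```

Note that the bin width shrinks as n grows, which is why the automatic binning in the examples below depends on the number of data points.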

In addition, the zip file contains four helper functions: countEntries, colorCode2rgb, isEven, myErrorbar

If you want to overlay individual data points, you need to download the separate submission plotSpread (http://www.mathworks.com/matlabcentral/fileexchange/37105).

DistributionPlot allows visualizing multiple distributions side by side. It is useful for skewed unimodal data and indispensable for multimodal data. DistributionPlot is especially useful for showing the time evolution of a distribution.

Some of the examples from the help:

```matlab
r = rand(1000,1);
rn = randn(1000,1)*0.38+0.5;
rn2 = [randn(500,1)*0.1+0.27;randn(500,1)*0.1+0.73];
rn2=min(rn2,1);rn2=max(rn2,0);
figure
ah(1)=subplot(2,4,1:2);
boxplot([r,rn,rn2])
ah(2)=subplot(2,4,3:4);
distributionPlot([r,rn,rn2],'histOpt',2); % histOpt=2 works better for uniform distributions than the default
set(ah,'ylim',[-1 2])
data = [randn(100,1);randn(50,1)+4;randn(25,1)+8];
subplot(2,4,5)
distributionPlot(data); % defaults
subplot(2,4,6)
distributionPlot(data,'colormap',copper,'showMM',5,'variableWidth',false) % show density via custom colormap only, show mean/std
subplot(2,4,7:8)
distributionPlot({data(1:5:end),repmat(data,2,1)},'addSpread',true,'showMM',false,'histOpt',2) % auto-binwidth depends on the number of data points; for small n, plotting the data is useful (addSpread requires plotSpread)
```

### Cite As

Jonas (2020). Violin Plots for plotting multiple distributions (distributionPlot.m) (https://www.mathworks.com/matlabcentral/fileexchange/23661-violin-plots-for-plotting-multiple-distributions-distributionplot-m), MATLAB Central File Exchange. Retrieved .

### Monique Shotande

Works well. The examples are very helpful. Plotting different distributions on the left and right of a single violin was unclear at first, but the example in the comments made it clear that the widthDiv parameter is necessary for this. A bit more description of this feature would help. My only minor issue with this tool is the limited control over plot aesthetics, such as edge color, face color, and transparency.

### Harshan Ravi

To add to my previous comment: I am looking for split violin plots.

### Harshan Ravi

Nice submission. I am new to violin plots. I have results from before and after a contrast agent, and I would like to show them in a single violin plot, i.e., pre-agent on the left side and post-agent on the right side. Does your script allow such plots?
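
A split violin like this can be sketched with two calls on the same axes, one per half. This assumes distributionPlot v1.15 is on the path and that its 'widthDiv', 'histOri' and 'color' parameters behave as described in the release notes and comments on this page (widthDiv = [number of divisions, current division]); check the function's help before relying on it. The data are placeholders:

```matlab
% Sketch of a split violin: pre-agent on the left half, post-agent on the right.
% Parameter names follow our reading of distributionPlot's help (assumption).
pre  = randn(200,1);          % placeholder data; replace with your measurements
post = randn(200,1)*0.7 + 1;

figure
if exist('distributionPlot', 'file') == 2   % only plot if distributionPlot.m is on the path
    % widthDiv [2 1] / [2 2]: first and second half of the stripe;
    % histOri 'left'/'right': which side each half-violin grows toward
    distributionPlot(pre, 'widthDiv', [2 1], 'histOri', 'left', 'color', 'b', 'showMM', 0)
    distributionPlot(gca, post, 'widthDiv', [2 2], 'histOri', 'right', 'color', 'r', 'showMM', 0)
end
```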

### Sansit Das

Thank you for the tool. I am new to this. Can anybody tell me where to give my input for x-values and y-values?
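
As far as the examples above show, the y-values are the data themselves: each column of a matrix, or each element of a cell array, is one distribution. The horizontal positions can be set with the 'xValues' option mentioned by jon erickson further down. A minimal sketch with placeholder data, assuming distributionPlot.m is on the path:

```matlab
% Each cell (or matrix column) is one distribution, plotted along y.
yA = randn(100,1);            % placeholder data for three conditions
yB = randn(60,1) + 2;         % cell arrays allow different sample sizes
yC = randn(80,1) - 1;
data = {yA, yB, yC};
xPositions = [1 2 4];         % one x value per distribution

figure
if exist('distributionPlot', 'file') == 2   % requires distributionPlot.m on the path
    distributionPlot(data, 'xValues', xPositions)
end
```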

### Lefteris Kosmidis

Great submission! Thanks. One question though: is there any way to control the bin width so that different distributions have equal widths?
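
One way to get equal bin widths is to bin every dataset yourself on shared edges and pass the histograms in. The [binCenter, height] layout below mirrors Jonas's ksdensity workaround further down this page; that distributionPlot accepts hand-made histograms in exactly this form is an assumption, and the bin width of 0.5 is arbitrary:

```matlab
% Shared bin edges so every distribution uses the same bin width.
d1 = randn(300,1);            % placeholder data
d2 = randn(50,1)*2 + 3;

binWidth = 0.5;                                   % chosen by hand
lo = floor(min([d1; d2]) / binWidth) * binWidth;  % edges cover both datasets
hi = ceil(max([d1; d2]) / binWidth) * binWidth;
edges = lo:binWidth:hi;
centers = edges(1:end-1) + binWidth/2;

% [binCenter, count] matrices, one per distribution (assumed input format)
h1 = [centers(:), histcounts(d1, edges)'];
h2 = [centers(:), histcounts(d2, edges)'];

figure
if exist('distributionPlot', 'file') == 2
    distributionPlot({h1, h2}, 'showMM', false)   % showMM off, as Jonas advises for user histograms
end
```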

### Andre Zeug

I just realised that the function 'histogram' was renamed for download (but not in the function tab above). So everything is fine.

### Andre Zeug

Thanks for sharing!
Have you thought about renaming your function 'histogram(varargin)'? It might shadow MATLAB's function 'histogram', introduced in R2014b, which requires different input. This might cause confusion.
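
To check whether an old copy of the helper is shadowing the built-in function, you can list every 'histogram' on the path; the first entry is the one MATLAB calls. Since version 1.15 the helper is named myHistogram, so this only matters for older downloads:

```matlab
% List all functions named 'histogram' on the path, in call precedence order.
% If the first entry is inside the distributionPlot folder, it shadows the built-in.
locations = which('histogram', '-all');
disp(locations)
```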

### T A

Whoops, I didn't mean to post that last comment...please ignore.

### T A

There appears to be a sizable bug when using strings (character matrices or cell arrays) as categories. The data ignore the order of the categories, leading to arbitrary data distributions. Here's an example. Results are bad when the categories are 'a' and 'b', but fine if the categories are 1 and 2.
```matlab
a=randn(1000,1)+(1:1000).';
b=[repmat('a',500,1);repmat('b',500,1)];
% b=[repmat(1,500,1);repmat(2,500,1)];
figure
figure
```
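
Since numeric categories reportedly work, one possible workaround (our assumption, not a confirmed fix) is to map the character labels to numeric codes in order of first appearance and group by the codes instead, keeping the names for the tick labels:

```matlab
% Map character categories to numeric codes before grouping (workaround sketch).
a = randn(1000,1) + (1:1000).';
b = [repmat('a',500,1); repmat('b',500,1)];

labels = cellstr(b);                                  % one label per row
[groupNames, ~, groupIdx] = unique(labels, 'stable'); % codes follow first appearance
% Use groupIdx in place of b as the grouping variable;
% groupNames can later label the x ticks.
```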

### jon erickson

Thanks for posting! This is a great tool.
One quick fix suggested: when plotting with the xValues option, line 905 should be modified to use unique() as follows:
```matlab
set(ah,'XTick',unique(sortedX));
```
Otherwise the function will throw an error when there are repeated x values:
```
Error using matlab.graphics.axis.Axes/set
Value must be a vector of type single or double whose values increase
```

### WJ

Hi there, as I'm new to MATLAB, can someone advise on how and where I should input my data? Thanks.
wj

### Gerard Llorach

Thanks for the code!
I found an error when trying to use legends with distributionPlot.m. The first output (the patch handles) contains numeric values instead of Patch objects. The fix is quite simple:
- line 44: `hh = {};` instead of `NaN(nData,1);`
- line 729: `hh{iData} =...`
- line 731: `hh{iData} =...`

I don't know if there is a place to suggest changes on MATLAB File Exchange. I hope the author or somebody else can make this fix.
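
An alternative to the cell array, available since R2013a, is to preallocate with gobjects, which keeps the handles typed as graphics objects. The sketch below only illustrates the pattern: nData and the patch coordinates are placeholders, not distributionPlot's internals:

```matlab
% Preallocate a graphics-object array instead of NaN(nData,1), so assigned
% patch handles keep their type (assumed pattern, not the shipped code).
nData = 2;
hh = gobjects(nData, 1);

f = figure('Visible', 'off');
ax = axes('Parent', f);
for iData = 1:nData
    hh(iData) = patch([0 1 1 0] + 2*iData, [0 0 1 1], 'b', 'Parent', ax);
end
legend(hh, {'first', 'second'})   % works because hh holds Patch objects
```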

### Alan Chauvin

Thanks for the submission.
How can I add a legend when using widthDiv to compare two series of distributions?
Using `legend('1','2')` gives me two blue boxes.
Thanks again!
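
A generic MATLAB workaround, independent of distributionPlot's outputs, is to build the legend from invisible dummy patches whose colors match the two series. The colors below are assumptions; use the ones you actually plotted:

```matlab
% Legend entries from dummy patches: NaN vertices mean nothing is drawn,
% but each patch still supplies a color swatch to the legend.
figure                            % omit to add the legend to your existing plot
hLeft  = patch(NaN, NaN, 'b');    % color of the first series (assumed blue)
hRight = patch(NaN, NaN, 'r');    % color of the second series (assumed red)
lgd = legend([hLeft, hRight], {'1', '2'});
```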

### Ruggero G. Bettinardi

Great, Great, Great! Congrats Jonas!

One little suggestion: it would be great to be able to constrain the density estimation to a given interval, so as not to obtain "undesired tails" that extend beyond the desired lower and upper bounds. For example, if you plot a violin from scores that can only range from 1 to 100, the tails of the violin should not extend below 1 or above 100.
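
Until such an option exists, a bounded density can be computed outside distributionPlot with the 'Support' option of ksdensity (Statistics and Machine Learning Toolbox), which requires the data to lie strictly inside the bounds. Passing the result as a [position, density] matrix follows Jonas's ksdensity workaround on this page; the data are placeholders:

```matlab
% Bounded kernel density for integer scores 1..100: support [0.5 100.5] keeps
% every observation strictly inside the bounds, as ksdensity requires.
scores = randi([1 100], 500, 1);              % placeholder data
[f, xi] = ksdensity(scores, 'Support', [0.5 100.5]);

dens = [xi(:), f(:)];                         % [position, density]
figure
if exist('distributionPlot', 'file') == 2
    distributionPlot({dens}, 'showMM', false)
end
```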

This is a great tool, thank you. Is there an option to make the distribution plot higher in resolution? It looks pixelated.

I get around this by editing the ksdensity function call at line 603. ksdensity accepts a vector of evaluation points (pts), so you can use an arbitrary number of points to get finer-resolution violin plots.
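
Outside distributionPlot, the effect of a denser evaluation grid looks like this (Statistics and Machine Learning Toolbox required). The grid limits and the 1000 points are arbitrary choices; recent releases also offer a 'NumPoints' name-value argument, whose default is 100:

```matlab
% Evaluate the kernel density on 1000 points instead of the default 100.
x = randn(1000,1);                              % placeholder data
pts = linspace(min(x) - 1, max(x) + 1, 1000);   % explicit evaluation grid
[f, xi] = ksdensity(x, pts);
% In recent releases, roughly equivalent: [f, xi] = ksdensity(x, 'NumPoints', 1000);
```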

### Andrea Rovinelli

Great piece of code, just what I was looking for. However, I have a question: is there any way to normalize histograms across comparisons (i.e., when using the option "widthDiv") so that both the left and right distributions have the same area?

### Brian Katz

Sorry, this was partly my mistake. If the data vector is a row rather than a column, grouping yields identical datasets. It would be good to add a check verifying that the data and the grouping variable have the same dimensions.
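
The check Brian suggests can also be done before calling distributionPlot: force both inputs into columns and verify their lengths match. Variable names and data are placeholders:

```matlab
% Make data and grouping variable matching column vectors before plotting.
data  = randn(1, 120);                  % a row vector, the case that failed
group = [ones(1,60), 2*ones(1,60)];

data  = data(:);                        % reshape to columns
group = group(:);
if numel(data) ~= numel(group)
    error('Data (%d values) and groups (%d values) differ in length.', ...
        numel(data), numel(group));
end
```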

### Brian Katz

Can anyone confirm that this works with a grouping variable (and under which MATLAB version)? I am having problems. Maybe an example would help confirm this. I get identical group data after grouping (R2017a).

### Jonas

@Wynn, Markus: I have updated distributionPlot and renamed histogram.m

### Wynn

I'd like to echo Markus Millinger's comment that the code shadows the MATLAB built-in 'histogram' function. Any chance of a patch with a renamed 'histogram.m'?

### Shilo

Great, thanks, very useful!
Is there an option to use the addSpread function and color the dots by a different variable, thus adding another dimension to the data?
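
This is not a plotSpread or addSpread feature, but plain scatter can approximate it: jitter the points horizontally around each distribution's x position and map a third variable to color. The jitter width and default colormap are arbitrary choices, and the data are placeholders:

```matlab
% Jittered points colored by a third variable, using scatter instead of addSpread.
y      = {randn(80,1), randn(80,1) + 1};   % placeholder distributions
extra  = {rand(80,1), rand(80,1)};         % third variable that sets the colors
jitter = 0.15;                             % half-width of the horizontal spread

figure
hold on
for k = 1:numel(y)
    xk = k + (rand(size(y{k})) - 0.5) * 2 * jitter;  % uniform in [k-jitter, k+jitter]
    scatter(xk, y{k}, 20, extra{k}, 'filled')         % color mapped through colormap
end
colorbar
hold off
```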

### Isobel

This is great, thanks. However, would you consider adding an option to cut plots off in the y-direction at the min and max of the dataset?
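
Without such an option, one approach is to evaluate the density only between the smallest and largest observation and supply the result as a [position, density] matrix, as in Jonas's ksdensity workaround on this page. Unlike a bounded 'Support', this simply cuts off the tails without renormalizing (Statistics and Machine Learning Toolbox required):

```matlab
% Kernel density evaluated only on [min(x), max(x)], so the violin stops there.
x   = randn(200,1);                    % placeholder data
pts = linspace(min(x), max(x), 200);
[f, xi] = ksdensity(x, pts);
dens = [xi(:), f(:)];                  % [position, density]
```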

### Markus Millinger

This is very nice! However, the function histogram clashes with the "new" MATLAB function of the same name.

### Amir

Neat and nice. Much better than the box plot for scientific work.

### Edgar Guevara

Displaying distributional differences provides more information about the samples and is very useful when distance from zero is meaningless.
Furthermore, the option to overlay the mean, SEM, SD, and percentiles helps us better interpret the statistical analyses.
Overall, an invaluable alternative to the classic bar plots and box plots.

### Holger Hoffmann

Excellent, just what I needed. It served me very well.

I added a modified version to the MATLAB File Exchange that uses a smooth kernel density (Violin Plot based on kernel density estimation).

### Jonas

@Warwick: this looks like a bug - globalNorm=2 should do the trick, but at the moment, it seems like it would require equally spaced bins. I'll look into it.

### Warwick

This is a great function. However, I want to discriminate between two quite different distributions. I have a problem getting the total area under the respective curves to be equal (to a nominal 1) for separate datasets, even with the same number of observations. E.g., say I want to plot U and V on the left and right, respectively, where
```matlab
U = normrnd(3.3,1.0,100,1);
V = normrnd(2.0,0.3,100,1);
```
No matter what I do, they don't look anywhere near equal. Any ideas, or have I missed something obvious?
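
Equal areas require densities rather than raw counts. The sketch below computes both histograms as probability densities on shared bins, so each integrates to 1; histcounts with 'Normalization','pdf' needs R2014b or later, and randn replaces normrnd so no toolbox is needed. Whether distributionPlot then preserves these relative heights depends on its normalization options (Jonas's reply on this page points to globalNorm=2):

```matlab
% Area-normalized histograms on shared bins (30 bins is an arbitrary choice).
U = 3.3 + 1.0 * randn(100,1);
V = 2.0 + 0.3 * randn(100,1);

edges   = linspace(min([U; V]), max([U; V]), 31);   % 30 shared bins
centers = (edges(1:end-1) + edges(2:end)) / 2;
pdfU = histcounts(U, edges, 'Normalization', 'pdf');  % count / (N * binWidth)
pdfV = histcounts(V, edges, 'Normalization', 'pdf');

areaU = sum(pdfU .* diff(edges));   % 1 up to rounding
areaV = sum(pdfV .* diff(edges));
```

Because V is much narrower, its density peaks roughly three times higher than U's; equal areas therefore look unequal in width.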

### Dan K

This is a great tool... It would be nice if some of the functionality could be achieved without requiring toolboxes (e.g. I've cobbled together the code to do the smoothed histograms without the spline toolbox, using files from FEX).

### Jonas

@all: thanks again for the suggestions, most of which are implemented now. Please note that plotSpread is now a submission on its own that needs to be downloaded separately.

### Andres

Very, very useful!

### Jonas

@Yuri Kotliarov: I suggest you call addSpread.m directly, rather than via distributionPlot.m

@all: thanks for the good suggestions. I hope I can implement them soon!

### Yuri K

@Jonas, I couldn't find a way to change the width of the dot spread (with addSpread set to 1). It doesn't seem to depend on distWidth. If I don't show the density (color is white), the distance between groups is quite large. Thanks.

### Kelly Kearney

Overall, this is a great function, and I use it quite often to analyze model ensemble output. A few enhancements that could be nice:

- Add the option to display in a horizontal orientation.

- Add the option to filter outliers when calculating bin widths and kernel densities. It could also be nice to display these as points, as in boxplot, rather than connecting them via long lines to the main histogram.

- This is an edge case, but the function will error under the addSpread option if a column/group contains only NaNs and/or Infs.
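
For the outlier suggestion, Tukey's fences (the rule behind boxplot's default 'Whisker' value of 1.5) flag points more than 1.5 IQR beyond the quartiles; the remaining points could feed the bin-width or density estimate. The quartiles here are medians of the lower and upper halves, an approximation that may differ slightly from boxplot's own:

```matlab
% Tukey's fences: split data into inliers and outliers before estimating density.
x  = [randn(200,1); 15; -12];            % placeholder data with two outliers
xs = sort(x);
n  = numel(xs);
q1 = median(xs(1:floor(n/2)));           % lower-half median (approximate Q1)
q3 = median(xs(ceil(n/2)+1:end));        % upper-half median (approximate Q3)
w  = 1.5 * (q3 - q1);                    % fence distance

isOutlier = x < q1 - w | x > q3 + w;
inliers   = x(~isOutlier);               % use for bin width / density
outliers  = x(isOutlier);                % could be drawn as separate points
```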

### Warwick

This is very good. I've just included some plots in a report. Thank you. Possibly you could add an extra feature among the 'showMM' options (say, 'showMM' = 6) that draws a horizontal line of line width 2 for the median and lines of line width 1 for the 25th and 75th percentiles.

### Jonas

@Yuri: I have implemented your suggestion (though I start the histograms from the very left or right side, respectively), and fixed the previous bug.

### Yuri K

@Jonas: Thanks for the answer. May I suggest a new feature? It would be nice to draw the histograms in a chosen direction. Currently they are always centered, but they could also be left- or right-directed. All you need to change is the xBase variable at line 401: 0.5 to 0 for left direction, -0.5 to 0 for right direction. For some people the distributions are easier to understand when they look like rotated histograms.

### Jonas

@Yuri Kotliarov: Currently, the only workaround is to call ksdensity outside of distributionPlot to ensure that the smoothing uses the same kernel width:
```matlab
x = zeros(10,1);
y = x+randn(10,1)*0.1;
[yy(:,2),yy(:,1)] = ksdensity(y,'width',0.01);
[xx(:,2),xx(:,1)] = ksdensity(x,'width',0.01);
distributionPlot({xx,yy},'showMM',false)
```
Unfortunately, the showMM option is currently bugged when you supply your own histograms, so you have to set that option to false.

### Yuri K

@Jonas: I have a problem with smoothing (histOpt=1) when all values in a group are the same. In this case the distribution plot is very wide compared to the same data with a little variance.
For example:
```matlab
x = zeros(10,1);
y = x+randn(10,1)*0.1;
```
The same happens with a few outliers in x. I understand it's probably how the ksdensity function works, but can you do anything to make the above cases comparable?
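
The effect can be illustrated with a normal-reference ("rule of thumb") bandwidth, h = 1.06 * sigma * n^(-1/5), where sigma is estimated robustly from the median absolute deviation. When that estimate is zero, some fallback is needed; the fallbacks below (range, then 1) are assumptions meant to show why a constant group can get a far wider kernel than a slightly noisy one, not a description of ksdensity's exact internals. Fixing the bandwidth, as in Jonas's 'width' workaround on this page, sidesteps the issue:

```matlab
% Rule-of-thumb bandwidth for a constant group and a slightly noisy group.
groups = {zeros(10,1), zeros(10,1) + randn(10,1)*0.1};
h = zeros(1, numel(groups));
for k = 1:numel(groups)
    v = groups{k};
    sigma = median(abs(v - median(v))) / 0.6745;   % robust spread estimate
    if sigma == 0
        sigma = max(v) - min(v);                   % assumed first fallback: range
    end
    if sigma == 0
        sigma = 1;                                 % assumed last-resort default
    end
    h(k) = 1.06 * sigma * numel(v)^(-1/5);         % Silverman-style rule
end
% h(1), for the all-zero group, is about ten times larger than h(2)
```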

### Jonas

@Yuri: The new version of distributionPlot supports grouped data.

### Yuri K

Great! Thanks.

### Jonas

@Yuri: No, it doesn't work with grouped data (yet). In the meantime, you can use a function like group2cell (http://www.mathworks.com/matlabcentral/fileexchange/11192-group2cell) to distribute your grouped data among cells to use with distributionPlot.
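
The splitting that group2cell performs can also be sketched in base MATLAB: one cell per group, in order of first appearance, which can then be passed to distributionPlot as a cell array. The data and grouping variable are placeholders:

```matlab
% Split a data vector into one cell per group (base-MATLAB alternative sketch).
data   = randn(90,1);
groups = repmat([1; 2; 3], 30, 1);

[groupIds, ~, idx] = unique(groups, 'stable');    % group codes in order of appearance
cells = arrayfun(@(k) data(idx == k), 1:numel(groupIds), 'UniformOutput', false);
% cells{k} holds the values of group groupIds(k)
```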

### Jonas

@Brian: Thanks for the suggestions, and for sending me your sample code. I have not had time yet to update my code, though, but I will look into it!

### Yuri K

Does it work with grouped data, like boxplot does?

### Brian Katz

This works quite well, giving a very interesting data presentation method. Possible improvements include the use of a colormap rather than a forced grayscale. An example in the help would also be a good addition.
I have started trying to make a combined plot that allows for both boxplot (using boxplotCsub) and distributionPlot. As both are symmetrical, they can both be collapsed to one side and then combined, giving two very interesting looks at the same datasets.

### Brian Katz

Very very cool.

##### Release Notes
- 11 Feb 2017, 1.15.0.0: Renamed histogram to myHistogram to avoid a clash with the new MATLAB function of the same name; added support for boxplot overlays.
- 14 Jun 2012, 1.14.0.0: Improved documentation (more examples, link to plotSpread); added quantiles (thanks to Warwick for the suggestion and testing). Also, belated thanks to Kelly for suggesting horizontal orientation.
- 11 Jun 2012, 1.13.0.0: Added horizontal plotting, plotting of half distributions, and bug fixes. Additionally, plotSpread is now a separate submission.
- 14 Dec 2011, 1.12.0.0: Added an option to align the bars at the left or the right (option "histOri"), as suggested by Yuri. Also, a bug fix.
- 2 Oct 2011, 1.9.0.0: Improved normalization options. Thanks to Jake for the suggestion.
- 21 Jun 2011, 1.7.0.0: Fixed a bug in the code and two mistakes in the example.
- 20 Jun 2011, 1.6.0.0: Made the colorbar more meaningful if there is only one colormap and the bins are normalized globally (i.e., globalNorm is set to 1). Thanks to Brian Katz for the suggestion.
- 20 Jun 2011, 1.4.0.0: Changed input from optional arguments to parameterName/parameterValue pairs (note that the old syntax still works!). Added several new features, such as support for grouped variables, overlay of data points, and user-defined colormaps.
- 20 Jan 2011, 1.3.0.0: Updated title to Violin Plot, because that's what (some of) these plots are called elsewhere.
- 25 Apr 2009, 1.2.0.0: Documented previously undocumented functionality; chose a better screenshot to demonstrate how distributionPlot is better than boxplot for comparing distributions.
- 16 Apr 2009, 1.1.0.0: Fixed a cryptic error if the data was all NaNs (thanks Christopher for pointing it out!). distributionPlot now also automatically converts arrays in cells to vectors and throws a warning.
##### MATLAB Release Compatibility
Created with R2008a
Compatible with any release
##### Platform Compatibility
Windows macOS Linux