Splitapply array to fit distributions

I have an Mx1 vector of data and an Mx1 column of categorical labels. Is there a way to use splitapply so that I get a fitted distribution for the data in each group? I want to do this as compactly as possible because in reality I have an MxN array, and I will iterate column-wise so that I get a probability distribution for each category's values in each column. Something like the code below:
%%Generate some data
X1 = 10 + 5 * randn(200, 1);
X2 = 20 + 8 * randn(250 ,1);
cat1=repmat("a",200,1);
cat2=repmat("b",250,1);
X = [X1; X2];
cats=[cat1;cat2];
%%Fit a distribution using a kernel smoother
myFit1 = fitdist(X1, 'kernel')
myFit2 = fitdist(X2, 'kernel')
I would like to confirm that each of these fits matches what I get from something like:
newfit=splitapply(@fitdist,X,G)
but I get the error that fitdist doesn't have enough input arguments. I'm new to anonymous functions, but I suspect I need to somehow pass 'kernel' to fitdist in splitapply. Can anyone help?

Accepted Answer

Cris LaPierre on 9 Dec 2020
Edited: Cris LaPierre on 9 Dec 2020
You are close. You need to use findgroups to create your grouping variable G. Then it's just a matter of setting up your function handle correctly: wrap fitdist in an anonymous function so that 'kernel' is passed as its second argument. Here's your code with a slight modification from me. If you compare the results, you'll see they are the same.
%%Generate some data
X1 = 10 + 5 * randn(200, 1);
X2 = 20 + 8 * randn(250 ,1);
cat1=repmat("a",200,1);
cat2=repmat("b",250,1);
X = [X1; X2];
cats=[cat1;cat2];
%%Fit a distribution using a kernel smoother
myFit1 = fitdist(X1, 'kernel')
myFit1 =
  KernelDistribution
    Kernel = normal
    Bandwidth = 1.81224
    Support = unbounded
myFit2 = fitdist(X2, 'kernel')
myFit2 =
  KernelDistribution
    Kernel = normal
    Bandwidth = 2.91665
    Support = unbounded
% Now use findgroups/splitapply
G=findgroups(cats);
newfit=splitapply(@(X)fitdist(X,'kernel'),X,G);
newfit(1)
ans =
  KernelDistribution
    Kernel = normal
    Bandwidth = 1.81224
    Support = unbounded
newfit(2)
ans =
  KernelDistribution
    Kernel = normal
    Bandwidth = 2.91665
    Support = unbounded
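For the MxN case mentioned in the question, one possible extension is to loop over the columns and call splitapply once per column. This is only a sketch under a few assumptions that are not in the original post: the column count N, the fixed random seed, and the names groupNames and fits are made up for illustration. The anonymous function returns its fit wrapped in a cell, so splitapply stacks one cell per group, and the results are collected in a cell array with one row per group and one column per data column.

```matlab
% Sketch: per-group kernel fits for every column of an MxN array.
% N, groupNames, and fits are illustrative names, not from the original post.
rng(0);                                   % fixed seed so the run is repeatable
N = 3;                                    % assumed number of data columns
X = [10 + 5 * randn(200, N); 20 + 8 * randn(250, N)];
cats = [repmat("a", 200, 1); repmat("b", 250, 1)];

% Group numbers and the label that belongs to each group number
[G, groupNames] = findgroups(cats);
nGroups = numel(groupNames);

% fits{g, j} holds the fit for group g in column j
fits = cell(nGroups, N);
for j = 1:N
    % Wrapping the fit in {} makes each call return a 1x1 cell,
    % which splitapply concatenates into an nGroups-by-1 cell array
    fits(:, j) = splitapply(@(x) {fitdist(x, 'kernel')}, X(:, j), G);
end

% Spot check: compare one splitapply fit against a direct fit on the same data
directFit = fitdist(X(G == 1, 2), 'kernel');
pdfDifference = abs(pdf(fits{1, 2}, 12) - pdf(directFit, 12));
```

Here fits{1, 2} is the fit for group "a" in the second column, and pdfDifference should be zero up to floating-point error if the two fits agree. A cell array is used so that the same loop would still work if a different fitting function returned objects of differing types.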

Release

R2019b