MATLAB Answers

fitgmdist for loop; nested for loop, combine array of different sizes

6 views (last 30 days)
Hi all,
Please help me. Please. This research student will thank you immensely.
Here's the scenario: I have two populations (main and sub). The sub population is 10% of the main population. The mean (mu) and sigma(std) of these two populations can change, but have certain predefined boundaries. Right now, I am only changing the sigma of the main population and keeping the sigma of the sub population, and the mean of both populations constant.
What I want: I want to use fitgmdist/GMModels/Gaussian distribution to be able to identify that there are indeed two components in that set of data. In other words, I am trying to see for what sigma_main values does the model ouput two components (numComponents = 2). Right now, I have a model that iterates through my sigmamain values (0.1:0.05:1) and goes through varying GMModels to find the one with the better AIC (least error) and outputs the numComponents for every sigma_main value, and then sigma_main vs numComponents is plotted. At least, that's what I think.
Here's my code:
%% define known constants
% rng(0,'twister')
n_main = 100000;
n_sub = 10000;
%sigma_main = 0.1;
sigma_sub = 0.1;
mu_main = 1;
mu_sub =3;
%% change sigma_main values and run model
sigmamain_val = 0.1:0.05:1;
outputData = zeros(length(sigmamain_val), 2); % create an output array for (length sigma_sub),numComponents
for i = 1:length(sigmamain_val)
sigmamain_pos = randi(length(sigmamain_val));
sigma_main = sigmamain_val(sigmamain_pos);
y = sigma_main.*randn(n_main,1) + mu_main; %10^6 SKBr3 cells
y2 = sigma_sub.*randn(n_sub,1) + mu_sub; %100,000 MDA MB 231
C = cat(1, y, y2);
AIC = zeros(1,4); % create an ouput array for AIC
GMModels = cell(1,4);
options = statset('MaxIter',00); % add optitons here to better/refine the model
for k = 1:4
GMModels{k} = fitgmdist(C,k);
AIC(k)= GMModels{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
outputData(i,1) = sigma_main;
outputData(i,2) = numComponents;
%BestModel = GMModels{numComponents}
BestModel = GMModels{outputData(i,2)};
end
%% plot sigma main vs numComponents
figure(3)
plot(outputData(:,1),outputData(:,2))
xlabel('sigma main')
ylabel('numComponents')
Btw: the idea for this model with AIC is based off: https://www.mathworks.com/help/stats/fitgmdist.html
Here's the problem:
For starters: The code for y & y2, which are for my main and sub population respectively, only take into account the very last sigma value.
A bigger problem is that sometimes, my code picks a sigma_main value twice (i.e 0.8) and the number of components is sometimes different even though they are the same sigma_main value.
Also, I would like to ouput a "BestModel" or 1x1 gmdistrinution for every sigma_main value instead of just the last one and I don't know how to incorporate that in my for loop.
What I think: I need to go through sigma_main values in order (i.e only write "sigma_main = i" inside the for loop) so that values don't get repeated, but then my y and y2 cannot b concatenated (C = cat(1, y,y2) because my sigma_main value will be 19x1 instead of just one number.
To outut a BestModel for every sigma_main, I need to create an empty/zeroes array before but I don't know ehat type to do that for.
Please, please help me. I have stayed up two nights doing this and I really don't know how to move forward. It might be really simple, but I might be overthinking it.

  0 Comments

Sign in to comment.

Accepted Answer

Jeff Miller
Jeff Miller on 29 May 2020
I don't really follow what you are trying to do, but maybe some of these comments will help a bit:
I. Use a cell array to output a BestModel for every sigma_main. So, before the for loop, you need
BestModel = cell(numel(sigma_main),1); % pre-allocate array to hold fitgmdist outputs
and then inside the for loop
BestModel{i} = GMModels{outputData(i,2)}; % save the best model for later
II. I don't understand this line:
sigmamain_pos = randi(length(sigmamain_val));
Why don't you just go through the different sigma values in numerical order checking each one once?
III. It's no surprise that "the number of components is sometimes different even though they are the same sigma_main value", because you are generating data randomly each time through. Some random data sets might be more consistent with 2 components, some with 3, even though they were generated from the same underlying conditions.
IV. "The code for y & y2, which are for my main and sub population respectively, only take into account the very last sigma value." This is because you overwrite the previous y & y2 each time you go through the for loop. I'm not sure why you want to have these random numbers for all iterations of the loop, but you could save them in another cell array if you wanted.

  6 Comments

Show 3 older comments
Jeff Miller
Jeff Miller on 30 May 2020
Sorry, my fault. How about:
sigmamain_pos = i;
sigma_main = sigmamain_val(sigmamain_pos);
Elizabeth Cardona
Elizabeth Cardona on 30 May 2020
Amazing! I can work with this! I will accept your answer. If I have a question in the future, should I ask a new question or simply comment here?
Jeff Miller
Jeff Miller on 30 May 2020
It's better to ask a new question. More people look at questions that have not yet been answered

Sign in to comment.

More Answers (0)