Hey everyone, I am trying to fit a normal distribution to better understand the random-choice dataset I am working with. The current Gaussian curve looks very wrong and I am not sure why. Basically, each of 10,000 individuals has a 1.3% chance of being placed in population A and a 98.7% chance of being placed in population B. I am mainly concerned with population A, as that is the population of interest. The random assignment is run 100 times, and the counts are stored in the initially empty arrays RunSumA1 and RunSumB1.

Here is the current code and graph:

options = ['A', 'B'];

%Empty vectors for value storage

RunSumA1=[];

RunSumB1=[];

ListOneProbability=0.013;

ListTwoProbability=0.987;

totalRuns = 100;

%Storage loop from function

for k=1:totalRuns

[RunSumA1,RunSumB1]= my_sorter(RunSumA1,RunSumB1,options,ListOneProbability,ListTwoProbability);

end

%Converting to Column Vector

RunSumA1 = RunSumA1';

%Average, STD, Normal PDF

mu = mean(RunSumA1);

sigma = std(RunSumA1);

y = normpdf(RunSumA1, mu, sigma);

%Plot

plot(RunSumA1, y)

%Random choice function

function [RunSumA1,RunSumB1]= my_sorter(RunSumA1,RunSumB1,options,ListOneProbability,ListTwoProbability)

tempA1 = 0;

tempB1 = 0;

for j=1:10000

newChoice = randsample(options, 1,true, [ListOneProbability,ListTwoProbability]);

if newChoice == 'A'

tempA1=tempA1+1;

else

tempB1=tempB1+1;

end

end

RunSumA1=[RunSumA1,tempA1];

RunSumB1=[RunSumB1,tempB1];

end

the cyclist
on 1 Jul 2021

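Can you be more specific about what you mean by "the current gaussian curve is very wrong"? Do you mean the lines connecting all over the place? That happens because plot joins the points in the order they appear in RunSumA1, and the counts are not sorted. If you plot just the points rather than lines, you see the nice smooth curve: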

options = ['A', 'B'];

%Empty vectors for value storage

RunSumA1=[];

RunSumB1=[];

ListOneProbability=0.013;

ListTwoProbability=0.987;

totalRuns = 100;

%Storage loop from function

for k=1:totalRuns

[RunSumA1,RunSumB1]= my_sorter(RunSumA1,RunSumB1,options,ListOneProbability,ListTwoProbability);

end

%Converting to Column Vector

RunSumA1 = RunSumA1';

%Average, STD, Normal PDF

mu = mean(RunSumA1);

sigma = std(RunSumA1);

y = normpdf(RunSumA1, mu, sigma);

%Plot

plot(RunSumA1, y, '.') % Changed this line to plot only the points

%Random choice function

function [RunSumA1,RunSumB1]= my_sorter(RunSumA1,RunSumB1,options,ListOneProbability,ListTwoProbability)

tempA1 = 0;

tempB1 = 0;

for j=1:5000 % I made this smaller so that it would not time out on the Answers forum

newChoice = randsample(options, 1,true, [ListOneProbability,ListTwoProbability]);

if newChoice == 'A'

tempA1=tempA1+1;

else

tempB1=tempB1+1;

end

end

RunSumA1=[RunSumA1,tempA1];

RunSumB1=[RunSumB1,tempB1];

end

Is that it? Or is there some other problem?
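
If a connected line is wanted rather than isolated points, one option is to sort the counts before plotting, so that plot joins neighbouring x values. The following is only a minimal sketch: the variable names counts, xs and ys are hypothetical, the data are generated with a plain rand threshold instead of my_sorter, and the density is written out by hand rather than calling normpdf.

```matlab
% Hypothetical stand-in data: number of group-A individuals in each of 100 runs
rng(0);                                        % fixed seed, only for repeatability
counts = sum(rand(10000, 100) <= 0.013, 1)';   % 100-by-1 column vector of counts

mu = mean(counts);
sigma = std(counts);

% Sort first so that plot() connects neighbouring x values
xs = sort(counts);

% Normal density evaluated at the sorted counts (same formula normpdf uses)
ys = exp(-0.5*((xs - mu)/sigma).^2) / (sigma*sqrt(2*pi));

plot(xs, ys, '-o')
xlabel('Number of individuals in group A')
ylabel('Probability density')
```

With only 100 runs the line is still coarse; evaluating the density on an evenly spaced grid such as linspace(min(counts), max(counts), 200) would give a smoother curve.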

Alan Stevens
on 1 Jul 2021

Edited: Alan Stevens
on 2 Jul 2021

If you have 10000 individuals, each with a probability of 0.013 of being in group A, you would expect your curve to peak close to 10000*0.013 = 130 individuals on average. Perhaps you want something like the following:

Npop = 10000;

probA = 0.013;

Ntrials = 100;

A = zeros(Ntrials, 1);

B = zeros(Ntrials, 1);

for trial = 1:Ntrials

r = rand(Npop,1);

ir = find(r<=probA);

A(trial) = numel(ir);

B(trial) = Npop - A(trial);

end

mu = mean(A);

sigma = std(A);

disp([mu, sigma])

lo = 80;

hi = 180;

x = linspace(lo,hi,100);

y = exp(-0.5*((x-mu)/sigma).^2)/(sigma*sqrt(2*pi));

plot(x,y)

xlabel('Numbers in group A')

ylabel('Probability density')

grid
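
As a cross-check on the simulated mu and sigma, the count in group A can be treated as a binomial variable, which has mean Npop*probA and standard deviation sqrt(Npop*probA*(1-probA)). The sketch below is an illustration under that assumption, not part of the answer's code: muTheory and sigmaTheory are hypothetical names, the trials are vectorized with the same rand-threshold idea as the loop above, and the fixed seed is there only for repeatability.

```matlab
Npop  = 10000;   % individuals per trial (from the question)
probA = 0.013;   % probability of landing in group A (from the question)

% Binomial theory: count in A ~ Binomial(Npop, probA)
muTheory    = Npop*probA;                    % expected count
sigmaTheory = sqrt(Npop*probA*(1 - probA));  % standard deviation of the count

% Simulated counts, one column of random draws per trial
rng(1);                                      % fixed seed (assumption, for repeatability)
Ntrials = 100;
A = sum(rand(Npop, Ntrials) <= probA, 1)';

fprintf('theory:    mu = %.2f, sigma = %.2f\n', muTheory, sigmaTheory);
fprintf('simulated: mu = %.2f, sigma = %.2f\n', mean(A), std(A));

% Overlay the normal approximation on a density-normalized histogram
histogram(A, 'Normalization', 'pdf')
hold on
x = linspace(80, 180, 200);
plot(x, exp(-0.5*((x - muTheory)/sigmaTheory).^2)/(sigmaTheory*sqrt(2*pi)), 'LineWidth', 1.5)
hold off
xlabel('Number of individuals in group A')
ylabel('Probability density')
```

Normalizing the histogram as a pdf puts the bars on the same vertical scale as the curve, so the two can be compared directly.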
