Generate random numbers given distribution/histogram

72 views (last 30 days)
MATLAB provides built-in functions to generate random numbers with an uniform or Gaussian (normal) distribution. My question is: if I have a discrete distribution or histogram, how can I can generate random numbers that have such a distribution (if the population (numbers I generate) is large enough)?
Please post here if anyone knows of a good method of doing this.
Thanks, David

Accepted Answer

Jonathan Epperl
Jonathan Epperl on 27 Oct 2012
Since nobody has any suggestions, here's one. If you have a discrete distribution, say it is a Nx2 matrix PD, first column the discrete values, second the probabilities of the corresponding value -- so sum(PD(:,2))==1.
Then map the probablities to the unit interval and use rand. What mean by that:
% Those are your values and the corr. probabilities:
PD =[
1.0000 0.1000
2.0000 0.3000
3.0000 0.4000
4.0000 0.2000];
% Then make it into a cumulative distribution
D = cumsum(PD(:,2));
% D = [0.1000 0.4000 0.8000 1.0000]'
Now for every r generated by rand, if it is between D(i) and D(i+1), then it corresponds to an outcome PD(1,i+1), with the obvious extension at i==0. Here's a way you could do that, even though I'm sure there are better ones:
R = rand(100,1); % Your trials
p = @(r) find(r<pd,1,'first'); % find the 1st index s.t. r<D(i);
% Now this are your results of the random trials
rR = arrayfun(p,R);
% Check whether the distribution looks right:
hist(rR,1:4)
% It does, roughly 10% are 1, 30% are 2 and so on
If you want more help you should post a minimal example of the form in which you have the discrete distribution.
  5 Comments
Aasheesh Dixit
Aasheesh Dixit on 8 Jun 2020
one change is required:
p = @(r) find(r<d,1,'first'); % find the 1st index s.t. r<D(i);
Himanshu Tanwar
Himanshu Tanwar on 25 May 2022
Overall code may be written (with some slight modifications) as:
dx = 0.001;
x = -100 : dx : 100; % limits depending on the random variable definition.
% For example, (dx : dx : 100) for Rayleigh (0 not recommended).
f = PDF(x);
F = cumsum(f) * dx; % CDF
F = F / F(end); % recommended when max(F) is close to 1. Otherwise increase x - points to achieve F(end) - max value - close to 1.
N = 10000;
U = rand(1, N);
pdf2rand = @(u) x(find(u <= F, 1, 'first'));
X = arrayfun(pdf2rand, U);
X % desired random variable points
function f = PDF(x)
% Probability Distribution Function
...
end

Sign in to comment.

More Answers (2)

Image Analyst
Image Analyst on 28 Oct 2012
I didn't notice your question or I would have answered, especially since it's asked so often. Try RANDRAW ( http://www.mathworks.com/matlabcentral/fileexchange/7309-randraw) for a list of common distributions. Or to refresh yourself on the theory of why using the CDF works, see Wikipedia: http://en.wikipedia.org/wiki/Inverse_transform_sampling

Theron FARRELL
Theron FARRELL on 30 Apr 2019
Edited: Theron FARRELL on 30 Apr 2019
Hi there,
I use this naive function to generate artificial outliers applied in machine learning. Hope that it will be a bit help in your case.
function [Out_Data, Out_PDF, CHist] = Complement_PDF(Hist, Data_Num, p)
% Generate a 1D vector of data with a PDF specified as the complementary PDF of input historgram. Note that the larger
% Data_Num is, the more Out_PDF will resemble to CHist
% Input
% Hist: PDF/Histogram of data
% Data_Num: Desired number of data to be generated
% p: Precision given by number of digits after 0
% Output
% Out_Data: Generated data as per the complementary PDF
% Out_PDF: The complementary PDF as per Out_Data
% CHist: The complementary PDF as per Hist
% Example
% Hist = [1, 6, 7, 100, 0, 0, 0, 2, 3, 5];
% Data_Number = 100000;
% p = 3
Hist = Hist/sum(Hist);
CHist = 1- Hist;
CHist = CHist/sum(CHist);
CDF_CHist = cumsum(CHist);
CDF_CHist = double(int32(CDF_CHist*10^p))/10^p;
Out_Data = zeros(1, Data_Num);
Out_PDF = zeros(1, length(CDF_CHist));
for i = 1:Data_Num
% Generate a uniformly distributed variable
x = double(int32(rand*10^p))/10^p;
% Inversely index CDF
Out_Data(i) = Inverse_CDF(x, CDF_CHist);
temp = floor(Out_Data(i) * length(CDF_CHist));
Out_PDF(temp) = Out_PDF(temp) + 1;
end
figure;
subplot 221, bar(Hist);
subplot 222, bar(CHist);
subplot 223, plot(CDF_CHist);
subplot 224, bar(Out_PDF);
end
function [y] = Inverse_CDF(x, CDF_CHist)
CDF_CHist_Ext = [0, CDF_CHist];
y = 1;
for ind = 1:length(CDF_CHist)
if (x >= CDF_CHist_Ext(ind)) && (x < CDF_CHist_Ext(ind+1))
y = ind/length(CDF_CHist);
break;
end
end
end

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!