Equally distributed multidimensional random values with boundaries - how to generate?

I have to generate a matrix with 100 columns. Every row represents a value that can vary within a defined range. For example, I can describe the ranges with the array:
A=[1 5; 3 7; 1 10]
where 1 to 5 is the range of the first row, 3 to 7 the range of the second, and 1 to 10 the range of the last. If I want to generate random values that cover the range for just one row, I can do it as follows:
data = lb + rand(1,100) .* ( ub - lb );
where ub and lb are the upper and lower bounds. I can repeat this for every row in a simple for loop:
for i=1:size(A,1)
lb=A(i,1);
ub=A(i,2);
data2(i,:) = lb + rand(1,100) .* ( ub - lb );
end
But in this case every row is evaluated separately, so I have no guarantee that the distribution will be equal in terms of combinations between rows, since every row changes independently. For example, I can end up with no combination where row 1 is close to 1 and row 2 is close to 7, just because of the RNG. Is there any way to solve this and ensure an equal multidimensional random distribution?

7 Comments

Maybe I do not understand what you want to do. You wrote:
"I don't have any guarantee that the distribution will be equal in the meaning of combinations between rows, as every row changes independently."
The solution you proposed will generate independent rows of uniformly distributed random numbers with the ranges you want. Each column of data2 is a uniformly distributed random point in a 3D rectangular prism with the bounds specified in A. Since data2 has 100 columns, there are 100 random points in the 3D rectangular prism. Is that not satisfactory?
It is only partially satisfactory. I will explain it another way. Please think of the rows as properties: property 1, property 2 and property 3, each within the range given in matrix A. If A listed all possible values rather than ranges, the number of combinations would be finite. I could have the columns [1 3 1], [1 3 10], [1 7 1], [1 7 10], [5 3 1], [5 3 10], [5 7 1], [5 7 10]. Now the complication: I can have any value within those ranges, so the number of solutions is infinite. I want to generate a random set of 100 columns that equally covers the space of allowed solutions. How can I do this?
A uniformly distributed independent random sample does not require that two successive values have any relation to each other. I'll use rand to make an example.
rand()
ans = 0.9470
So the first point I got lies above 0.5. Now, I'll sample a second point. There is NO reason to expect that the next point will lie at some value less than 1/2, even if the first sample I generated was greater than 1/2.
rand()
ans = 0.5811
Do you understand that? Likewise, flipping a fair coin twice in a row does not mean that if the first toss was a head, then the second toss MUST be a tail, or even that a tail is any more likely on the second coin toss.
But that is exactly what you are asking to have happen. You seem to think a uniformly distributed set of numbers has some sort of memory, so that future samples will in some way depend on the previous samples. I'm sorry, but that is not how independent random variables work. And each successive random sample (from rand) is independent from the previous ones, as much as is possible in the context of how a pseudo-random variable can be. And rand was designed to use a very good pseudo-random variable scheme, with very good statistical properties.
Yes, the laws of probability and statistics do apply. Over a long term, the sample mean of a distribution will tend to the population mean. So that eventually things will balance out.
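The long-run balancing mentioned above can be sketched numerically. This is only an illustration: the seed and the list of sample sizes are assumptions chosen for the example, not values from the thread.

```matlab
% Sketch: the sample mean of rand draws approaches the population mean 0.5
% as the sample size grows. Seed and sample sizes are arbitrary assumptions.
rng(1);                                   % fixed seed for repeatability
sampleSizes = [10 100 1000 10000 100000];
sampleMeans = zeros(size(sampleSizes));
for k = 1:numel(sampleSizes)
    sampleMeans(k) = mean(rand(sampleSizes(k), 1));
end
absErr = abs(sampleMeans - 0.5);          % typically shrinks roughly like 1/sqrt(n)
disp([sampleSizes(:), sampleMeans(:), absErr(:)]);
```

Individual draws stay independent throughout; only the average settles down.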
No, that is not what I'm asking for. You talked about tossing a coin twice. I'm talking about tossing a coin and then drawing a card, which is a huge difference. And yes, I need a way to create something more like an equal 3-dimensional distribution than just random independent rows. I'm not sure whether you have read my previous comment, as I sent it 1 minute before yours. If not, please go through it for more explanation.
Say A = [1 5; 3 7].
Now say you generate random columns as
for i = 1:size(A,1)
lb = A(i,1);
ub = A(i,2);
data(i,:) = lb + rand(1,100)* ( ub - lb );
end
Then the column vectors of the matrix [data(1,:);data(2,:)] are uniformly distributed on [1 5] x [3 7] although (or better: because) both are generated independently, namely row 1 as uniformly distributed over [1 5] and row 2 as uniformly distributed over [3 7].
Thus what you want, namely
"Now I want to generate a random set of 100 columns that will equally cover the area of allowed solutions."
is fulfilled by using this approach.
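One way to check this claim empirically is to split the rectangle into four quadrants and count how many points land in each. The following is a minimal sketch, assuming a deliberately large sample size and a fixed seed (both chosen here for illustration) and using implicit expansion (R2016b or later) instead of the loop.

```matlab
% Sketch: rows generated independently still fill [1 5] x [3 7] uniformly.
A = [1 5; 3 7];
N = 100000;                               % large on purpose; illustrative value
rng(2);                                   % fixed seed, an assumption
lb = A(:,1);
ub = A(:,2);
data = lb + rand(size(A,1), N) .* (ub - lb);   % each row scaled to its own range

mid = (lb + ub) / 2;                      % midpoint of each range
lowRow1 = data(1,:) < mid(1);
lowRow2 = data(2,:) < mid(2);

% Fraction of points in each quadrant; each should be close to 0.25.
quadFrac = [mean( lowRow1 &  lowRow2), mean( lowRow1 & ~lowRow2); ...
            mean(~lowRow1 &  lowRow2), mean(~lowRow1 & ~lowRow2)];
disp(quadFrac);
```

With only 100 columns the four fractions scatter more widely around 0.25, but no quadrant is systematically favored.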
I'm sorry, but I think you still misunderstand random numbers, what a uniform distribution means, and, apparently the entire point of my comment.
That you have columns with different ranges is completely irrelevant. Each column will be filled with sets of numbers that are uniformly distributed. And they are independent of other columns, or of previous samples.
For example:
n = 25000;
X = [rand(n,1),rand(n,1)*2 + 1];
So the first column of X (thus X(:,1)) is uniformly distributed on the open interval (0,1).
X(:,2) is uniformly distributed on the open interval (1,3).
These points, if taken as points in the two-dimensional box (0,1)x(1,3), will fill that space uniformly. Of course, if the sampling is dense enough, the box will be filled in very well.
plot(X(:,1),X(:,2),'.')
If I choose n a bit larger, the figure turns completely blue, with no white showing through at all. And if you count the number of points in any local region of the box, which is essentially a 2-dimensional histogram, you will find that the number of points in that region is proportional to the area of the region you looked at.
For example, histcounts2 produces that 2-d histogram.
[N,XEDGES,YEDGES] = histcounts2(X(:,1),X(:,2))
N = 10×10
   225   253   249   264   235   204   250   255   267   265
   250   287   247   221   251   244   277   244   228   256
   279   251   236   249   224   258   262   267   263   296
   257   243   227   230   266   228   273   232   223   250
   267   274   248   252   236   241   268   236   261   255
   257   263   264   242   258   255   248   270   220   242
   239   272   243   249   253   227   222   248   244   265
   240   262   277   253   270   252   259   239   239   262
   255   264   257   235   289   258   200   235   231   264
   238   245   232   252   251   234   233   254   254   261
XEDGES = 1×11
0 0.1000 0.2000 0.3000 0.4000 0.5000 0.6000 0.7000 0.8000 0.9000 1.0000
YEDGES = 1×11
1.0000 1.2000 1.4000 1.6000 1.8000 2.0000 2.2000 2.4000 2.6000 2.8000 3.0000
With a 10x10 grid of bins on that domain, we would expect to see on average 1% of the samples (here 250 of 25000) falling in each bin. Indeed, that is what happens. If the sample size were larger, the fraction of samples in each bin would approach 1% more closely. We expect to see some variability in those bin counts, of course, but as I have said, the relative variability decreases with sample size.
surf(N)
That the different sets of variables live in different intervals is completely irrelevant. (Sorry, I forgot to scale the x and y axes in the 2-d histogram plot.)
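The size of the expected bin-to-bin variability can be estimated by treating each bin count as binomial. The following is a rough sketch under that assumption; the seed, the explicit bin edges and the helper names (expectedCount, expectedStd, chi2stat) are illustrative choices, not part of the comment above.

```matlab
% Sketch: compare observed 2-d bin counts with the binomial expectation.
n  = 25000;
nb = 10;                                  % bins per axis
rng(3);                                   % fixed seed, an assumption
X = [rand(n,1), rand(n,1)*2 + 1];

% Explicit edges so the grid is exactly 10x10 on (0,1) x (1,3).
N = histcounts2(X(:,1), X(:,2), linspace(0,1,nb+1), linspace(1,3,nb+1));

p = 1 / nb^2;                             % probability of one bin: 1%
expectedCount = n * p;                    % 250 samples per bin on average
expectedStd   = sqrt(n * p * (1 - p));    % about 15.7
observedStd   = std(N(:));

% Pearson chi-square statistic; for uniform data it should be of the
% same order as the number of degrees of freedom, nb^2 - 1 = 99.
chi2stat = sum((N(:) - expectedCount).^2 / expectedCount);
fprintf('expected %g +/- %.1f, observed std %.1f, chi2 = %.1f\n', ...
        expectedCount, expectedStd, observedStd, chi2stat);
```

Counts between roughly 200 and 300, like those in the table above, are within about three standard deviations of 250.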
So do I understand correctly that, as long as the sample size is large enough, generating every row independently will not lead to any inequality in the distribution? I mean a case where, for example, there is a statistically significant surplus of columns in which the value in the first row is close to lb while the value in the second row is close to ub. It is pure RNG, so I expected that, without further constraints, such a case is at least possible.
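To get a feeling for how much such a corner can fluctuate with only 100 columns, one can repeat the experiment many times and look at the spread of the corner counts. This is a sketch only; the corner definition (lowest quarter of row 1, highest quarter of row 2), the number of trials and the seed are assumptions made for the illustration.

```matlab
% Sketch: how many of 100 columns fall in one corner of [1 5] x [3 7]?
A = [1 5; 3 7];
N = 100;                                  % columns per experiment
trials = 5000;                            % number of repetitions (assumption)
rng(4);                                   % fixed seed, an assumption
lb = A(:,1);
ub = A(:,2);
w  = ub - lb;                             % width of each range

cornerCount = zeros(trials, 1);
for t = 1:trials
    d = lb + rand(2, N) .* w;
    % Row 1 in its lowest quarter AND row 2 in its highest quarter.
    inCorner = d(1,:) < lb(1) + w(1)/4 & d(2,:) > ub(2) - w(2)/4;
    cornerCount(t) = sum(inCorner);
end

pCorner  = (1/4) * (1/4);                 % independent rows: 1/16
theoMean = N * pCorner;                   % 6.25 columns on average
theoStd  = sqrt(N * pCorner * (1 - pCorner));
fprintf('mean %.2f (theory %.2f), std %.2f (theory %.2f)\n', ...
        mean(cornerCount), theoMean, std(cornerCount), theoStd);
```

A single run of 100 columns can contain noticeably more or fewer corner points than 6.25, but averaged over many runs there is no systematic surplus or deficit.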

 Accepted Answer

@Karol, my new understanding is that you want to find a uniformly distributed random point in a 3D rectangle. The bounds of the rectangle are chosen at random from a discrete set of possibilities. A is 3x2. Column 1 of A has the 3 allowed lower bounds for the edges. Column 2 has the 3 allowed upper values. Am I understanding you correctly? If so you will need two discrete random choices (one each for lower and upper bounds) followed by a 3d uniform random choice.

8 Comments

Yes, you interpreted it correctly. Can you elaborate on your answer, maybe with a code snippet?
A=[1 5; 3 7; 1 10]; %possible bounds for x,y,z
N=100; %number of (x,y,z) triples
data2=zeros(3,N);
for j=1:N
lb=[A(randi(3),1);A(randi(3),1);A(randi(3),1)]; %lower bound for [x;y;z]
ub=[A(randi(3),2);A(randi(3),2);A(randi(3),2)]; %upper bound for [x;y;z]
data2(:,j)=lb+rand(3,1).*(ub-lb);
end
I think that does what you said you wanted. x, y, and z are all treated equally in the code above, so the distributions of the x, y and z values should be similar.
plot3(data2(1,:),data2(2,:),data2(3,:),'*r'); %plot the points
Thank you, but your code picks the bounds at random. As a test, set A=[1 2; 3 4; 5 6]: you will see that your code can generate rows with values anywhere from 1 to 6. But your idea guided me toward another solution. First I generate a matrix of uniformly distributed random numbers between 0 and 1, then I apply the bounds:
A=[1 5; 3 7; 1 10]
data2=rand(size(A,1),100);
for i=1:size(A,1)
lb=A(i,1);
ub=A(i,2);
data2(i,:)=lb + data2(i,:) .* ( ub - lb );
end
Which I think is what I needed.
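The same scaling can also be written without the loop. This is a minimal sketch assuming implicit expansion (available since R2016b); nCols is a name introduced here for illustration.

```matlab
% Sketch: scale every row of a uniform (0,1) matrix to its own range at once.
A = [1 5; 3 7; 1 10];                     % one [lower upper] pair per row
nCols = 100;                              % number of columns (illustrative name)
lb = A(:,1);                              % column vector of lower bounds
ub = A(:,2);                              % column vector of upper bounds

% lb and (ub - lb) are 3x1 and expand across the 100 columns.
data2 = lb + rand(size(A,1), nCols) .* (ub - lb);
```

Each row of data2 then stays within the bounds given in the matching row of A.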
And what's the difference to
for i=1:size(A,1)
lb=A(i,1);
ub=A(i,2);
data2(i,:) = lb + rand(1,100) .* ( ub - lb );
end
?
None.
In my case, a huge one. In the loop above, every row is generated independently (rand is executed 3 times), so there is no guarantee that the distribution across rows will be equal. In the new solution I proposed in the comment, I generate the results for all rows in a single call to rand, so the distribution must be equal. Applying the bounds afterwards is purely mechanical and does not change the distribution. In other words:
X=rand(3,7)
and
Y=[rand(1,7);rand(1,7);rand(1,7)]
are, if I understand correctly, not equal in terms of the distribution across rows.
Concerning distribution types or whatever you aim at, X and Y are 100 % equivalent.
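The equivalence can be made concrete by resetting the generator before each variant. A small sketch, assuming the default generator and an arbitrary seed of 0: both calls consume the same 21 numbers from the stream and only arrange them differently (rand(3,7) fills column by column, the stacked version fills row by row).

```matlab
% Sketch: one call to rand versus three calls draw from the same stream.
rng(0);                                   % arbitrary seed, an assumption
X = rand(3,7);                            % filled in column-major order

rng(0);                                   % rewind to the same state
Y = [rand(1,7); rand(1,7); rand(1,7)];    % three consecutive row draws

% Reading X column by column and Y row by row gives the same sequence.
sameStream = isequal(X(:), reshape(Y.', [], 1));
disp(sameStream);
```

Because consecutive outputs of rand are independent either way, the arrangement has no effect on the joint distribution of the rows.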
OK, thank you once again. I think we can consider the question answered.

More Answers (0)

Release

R2022b
