The most commonly suggested tool (in fact the only one I could find) for generating random numbers (each < 1) that sum to 1 is Random Vectors with Fixed Sum by Roger Stafford. However, I noticed that the data it generates is not well dispersed. For example:

P = randfixedsum(10,10000,1,0.05,0.9); % a 10-by-10000 matrix where each column of P sums to 1 and each element is between 0.05 and 0.9

find(any(P>0.5))

ans =

1×0 empty double row vector

So far, every single time I have tried it, the result is an empty vector: the values always stay below 0.5. Is there a way to generate more dispersed data that covers the whole range between 0.05 and 0.9 (for the above example)?

Thanks in advance for your kind help.

FYI: I have tried this (adapted from one of the MATLAB Answers):

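function P = rand_fixed_sum_2(p,n) % p = number of columns, n = number of rows; each column sums to 1

n1 = 10^(n-1); % size of the integer grid the cut points are drawn from

m = 1:n1;

P = zeros(n,p); % preallocate the output

for j = 1:p

    a = m(sort(randperm(n1,n))); % n sorted, distinct cut points in 1..n1

    b = diff(a); % the n-1 gaps between consecutive cut points

    b(end+1) = n1 - sum(b); % last entry takes the remainder, so the total is n1

    P(:,j) = (b/sum(b))'; % normalize so the column sums to 1

end

end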

But obviously n1 = 10^(n-1) grows too quickly to be feasible in higher dimensions (n>5). For lower dimensions, however, I could get much more dispersed data by tweaking n1.

John D'Errico on 28 Jun 2020

Edited: John D'Errico on 28 Jun 2020

I think you do not understand what you are asking.

randfixedsum does indeed produce results that are uniformly distributed within the subset in question. That is, any point in the 10-dimensional space that satisfies the fixed-sum and bound requirements is equally likely to arise.

However, that does not mean it is at all probable that you would find something that satisfies your goal of "dispersion".

For example, suppose one element is greater than 0.5. Then the probability that the other 9 elements are ALL small enough for the sum to be 1 is pretty low. In the 9-dimensional space that remains, that event is actually very uncommon.

Thus, you want to generate 10 numbers, all of which lie between 0.05 and 0.9, such that the sum is 1.

Suppose, just suppose, that one of the numbers was, say, 0.6. Now what are the odds that you can find 9 other numbers that make the total sum exactly 1, when none of them may be less than 0.05? SURPRISE! It can never be done.

In fact, if any single element were larger than 0.55 in this example, your goal could never be met. If one element is as large as even 0.55+eps, the other 9 must sum to 0.45-eps, but 9 numbers that are each at least 0.05 sum to at least 0.45. So it is mathematically impossible to find 9 numbers, all between 0.05 and 0.9, that complete the sum.
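The bound of 0.55 follows from simple arithmetic, which can be checked in a couple of lines. This is only a sketch: the names maxFeasible and isFeasible are made up for illustration and are not part of randfixedsum.

```matlab
% Largest value a single element can take when n values in [a,b] must sum to s.
% It is reached when the other n-1 elements all sit at the lower bound a.
n = 10; s = 1; a = 0.05; b = 0.9;
maxFeasible = min(b, s - (n-1)*a);   % 1 - 9*0.05 = 0.55
fprintf('Largest feasible element: %.4f\n', maxFeasible);

% A candidate value x is attainable only if the remaining sum s - x can be
% split into n-1 parts that each lie in [a,b].
isFeasible = @(x) (x >= a) & (x <= b) & (s - x >= (n-1)*a) & (s - x <= (n-1)*b);
disp(isFeasible([0.5 0.6]))   % 0.5 is attainable, 0.6 is not
```

With a lower bound of 0 instead of 0.05, the same formula gives min(0.9, 1) = 0.9, so the full upper bound becomes reachable.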

Next, suppose one element was as large as 0.5. Just one element that large.

Now the other 9 elements must sum to 0.5, so they must all be very close to 0.05. What is the probability of that event? Not surprisingly, it is pretty darn small. I can compute the actual probability of such an event if you need it. Being too lazy to think at this time of day, I will simulate instead:

X = randfixedsum(10,10000000,1,0,0.9);

sum(max(X) >= 0.5)

ans =

195844

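So 1.96e5 such events in 1e7, a little under 2% of the time. Note that this run used a lower bound of 0, not 0.05; with the 0.05 lower bound the event is far rarer still. As expected, a rare event, and that is EXACTLY as it should be.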
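For what it is worth, the probability can also be written in closed form, assuming the samples really are uniform on the constrained set as stated above. For a point uniform on the simplex in n dimensions, a given element is at least t with probability (1-t)^(n-1), and for t >= 0.5 at most one element can be that large, so the n cases simply add. The helper name pMaxAtLeast is made up for this sketch.

```matlab
% P(max >= t) for a point uniform on {x >= 0, sum(x) = 1} in n dimensions.
% Valid only for t >= 0.5, where at most one element can reach t, so the n
% single-element events are disjoint and each has probability (1-t)^(n-1).
pMaxAtLeast = @(n,t) n .* (1 - t).^(n-1);

n = 10;
% Lower bound 0 (the 0.9 upper bound is ignored here; it removes only
% about 1e-8 of the region).
p0 = pMaxAtLeast(n, 0.5);                 % 10/512, about 0.0195
% Lower bound 0.05: write x = 0.05 + 0.5*y with y uniform on the simplex,
% so x >= 0.5 is the same event as y >= 0.9.
p05 = pMaxAtLeast(n, (0.5 - 0.05)/0.5);   % 10*0.1^9 = 1e-8
fprintf('lower bound 0:    P(max >= 0.5) = %.6f\n', p0);
fprintf('lower bound 0.05: P(max >= 0.5) = %.2e\n', p05);
fprintf('expected hits in 10000 columns (lower bound 0.05): %.1e\n', 1e4*p05);
```

The first value agrees with the simulated 195844 out of 1e7. The second suggests that with a 0.05 lower bound, a sample of 10000 columns should essentially never contain an element of 0.5 or more, which matches the empty result in the question.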

You ask for dispersion. But you don't seem to understand what dispersion means or what it implies in this context.

If I look at the distribution of the maximum of all 10 elements, I get something that is actually pretty reasonable.

X = randfixedsum(10,10000,1,0.05,0.9);

Min 0.1207

1.0% 0.1342

5.0% 0.1445

10.0% 0.1524

25.0% 0.1674

50.0% 0.1884

75.0% 0.2167

90.0% 0.2503

95.0% 0.2738

99.0% 0.3143

Max 0.4039

Most of the time, we get a maximum value that is pretty small in context. And that is because the sample truly is uniformly distributed over the constraint space. One point in that space is as likely to arise as any other point. But that does NOT mean the maximum is ever likely to come close to 0.55, and a maximum larger than 0.55 is an impossible event.

Suppose instead that we change the way things are generated. Now, instead of requiring that the min be 0.05, just make it 0. How do the statistics change?

X = randfixedsum(10,10000,1,0,0.9);

Min 0.1395

1.0% 0.1681

5.0% 0.1902

10.0% 0.205

25.0% 0.2353

50.0% 0.2784

75.0% 0.3359

90.0% 0.401

95.0% 0.4492

99.0% 0.5479

Max 0.8123

As you now see, the maximum element is considerably larger. In a sample of the same size, the largest value I got was 0.8123. There is now much more room for those "dispersed" events to arise.
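The effect of the lower bound can be reproduced without randfixedsum. The sketch below swaps in a different, standard technique (normalized exponential variates, which are uniform on the simplex) and then shifts and rescales to impose the 0.05 lower bound. It is not how randfixedsum works internally, and the variable names are made up for illustration. The shortcut relies on the 0.9 upper bound never being active once the lower bound is 0.05 (no element can exceed 0.55).

```matlab
% Sketch: uniform samples on the simplex WITHOUT randfixedsum (swapped-in technique).
% Normalized exponential variates are uniform on {y >= 0, sum(y) = 1}.
n = 10; N = 10000;
Y = -log(rand(n, N));
Y = bsxfun(@rdivide, Y, sum(Y, 1));   % each column now sums to 1

X0  = Y(:, all(Y <= 0.9, 1));   % lower bound 0: enforce the 0.9 cap by rejection
X05 = 0.05 + 0.5 * Y;           % lower bound 0.05: shift and rescale, sums stay at 1

fprintf('median of max, lower bound 0:    %.4f\n', median(max(X0)));
fprintf('median of max, lower bound 0.05: %.4f\n', median(max(X05)));
fprintf('largest element, lower bound 0.05: %.4f (cap is 0.55)\n', max(X05(:)));
```

The two medians should come out close to the 50% rows of the two tables above. Raising the lower bound to 0.05 uses up half of the total sum before anything random happens, which is why the maximum is squeezed into such a narrow range.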
