Looking for something like a matrix version of randsample... [vectorization!]

Question

0 votes

If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like

idx = randsample(n,1,true,w)

and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.

Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do

idx = zeros(N,1);
for i = 1:M
idx(i) = randsample(N,1,true,W(:,i));
end

...but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.

I know that an equivalent way to think of this is to take my W matrix, normalize the columns so that they sum to one, do a cumsum on the columns, select a vector of uniform random numbers using rand(1,M), and find the first row indices where they are greater than the cumsum values, but I don't know how to do that without using a loop and find():

W_normalized = bsxfun(@rdivide,W,sum(W,1));
W_cdf = cumsum(W,1);
x = rand(1,M);
C = bsxfun(@lt,x,W_cdf);

and then the first row of each column of C with a 1 in it is my random number, but I haven't had any luck doing that in an efficient, vectorized way (I've seen this thread, but I think their conclusion to use a for-loop doesn't really seem to hold for larger matrices).

Any suggestions?

Thanks, Dan

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Kirby Fears on 25 Nov 2015

Edited: Kirby Fears on 30 Nov 2015

Open in MATLAB Online

1 vote

Here's a pure matrix version of your bsxfun calls. It should be internally parallelized if your matrices are large.

W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;

You still need to find the first "true" value in C for each column to form an index.

Below I make a logical array where only the first "true" is present in each column:

idx = [C(1,:); xor(C(2:end,:),C(1:end-1,:))];

Hope this helps.

4 Comments
Show 2 older comments Hide 2 older comments

Dan Gianotti on 8 Mar 2017

I just realized that I never "accepted" this answer. Thanks very much for the huge assistance! You presumably saved me a hundred years of CPU time.

Kirby Fears on 9 Mar 2017

No problem. I'm glad it worked out.

Sign in to comment.

Looking for something like a matrix version of randsample... [vectorization!]

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

4 Comments
Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

Looking for something like a matrix version of randsample... [vectorization!]

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

4 Comments Show 2 older comments Hide 2 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

4 Comments
Show 2 older comments Hide 2 older comments