Looking for something like a matrix version of randsample... [vectorization!]

If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like
idx = randsample(n,1,true,w)
and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.
Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do
idx = zeros(N,1);
for i = 1:M
idx(i) = randsample(N,1,true,W(:,i));
end
...but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.
I know that an equivalent way to think of this is to take my W matrix, normalize the columns so that they sum to one, do a cumsum on the columns, select a vector of uniform random numbers using rand(1,M), and find the first row indices where they are greater than the cumsum values, but I don't know how to do that without using a loop and find():
W_normalized = bsxfun(@rdivide,W,sum(W,1));
W_cdf = cumsum(W,1);
x = rand(1,M);
C = bsxfun(@lt,x,W_cdf);
and then the first row of each column of C with a 1 in it is my random number, but I haven't had any luck doing that in an efficient, vectorized way (I've seen this thread, but I think their conclusion to use a for-loop doesn't really seem to hold for larger matrices).
Any suggestions?
Thanks, Dan

 Accepted Answer

Here's a pure matrix version of your bsxfun calls. It should be internally parallelized if your matrices are large.
W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;
You still need to find the first "true" value in C for each column to form an index.
Below I make a logical array where only the first "true" is present in each column:
idx = [C(1,:); xor(C(2:end,:),C(1:end-1,:))];
Hope this helps.

4 Comments

Thanks for the response, Kirby. I hadn't thought of diff as a solution to this -- I like it.
I'm interested in one step further -- how do I then extract the row indices with 1s in them from each column? With find, perhaps, in row/column form? I'm looking for a vector of integers (like a vector of outputs from randsample).
Thanks, Dan
You can take the x location from find() output. I'm actually switching out diff() for xor() below to avoid the double array output from diff().
W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;
idx = [C(1,:) ; xor(C(2:end,:),C(1:end-1))];
[idxLocation,~] = find(idx);
I just realized that I never "accepted" this answer. Thanks very much for the huge assistance! You presumably saved me a hundred years of CPU time.

Sign in to comment.

More Answers (0)

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!