Parallel computing Monte Carlo

I'm running 1M Monte Carlo simulations. I'd like to improve the computation speed and was thinking about parallel computing, but I wanted to get an idea of what to expect in terms of improvement.
I'd like to understand a little more about the modifications I would have to make to my code.
My Monte Carlo simulation is already almost fully vectorized, so I don't run any for i=1:10^6 loop.
How should I modify my code so that parallel computing is efficient? I guess that, since MATLAB's strength is vectorized code, I must not introduce a "parfor i=1:10^6" loop. I was thinking about splitting the computation, i.e. vectorizing batches of 10^4 simulations (instead of a single batch of 10^6) and then running a parfor loop over the 10^2 batches. Would this approach be OK, or would it lead to poorer performance? (I know it's hard to guess without trying, which I haven't done yet, but I need an estimate to know whether I'm heading the wrong way.) What would be an efficient solution?
FYI, I'm running the simulations on a test computer with 4 cores; each core is at around 75% utilization while the code runs.
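A minimal sketch of the batching idea, assuming a toy payoff (a European call under geometric Brownian motion) as a stand-in for the real simulation; the model and every parameter value are hypothetical. Each parfor iteration runs one fully vectorized batch of 10^4 paths:

```matlab
% Sketch: 1e6 simulations split into 100 vectorized batches of 1e4.
% The GBM call-option model and all parameter values are hypothetical.
nSims     = 1e6;
batchSize = 1e4;
nBatches  = nSims / batchSize;                 % 100 batches

S0 = 100; K = 100; r = 0.01; sigma = 0.2; T = 1;   % hypothetical inputs

batchMeans = zeros(nBatches, 1);
parfor b = 1:nBatches
    % One iteration = one fully vectorized batch of batchSize paths.
    Z  = randn(batchSize, 1);
    ST = S0 * exp((r - 0.5*sigma^2)*T + sigma*sqrt(T)*Z);
    batchMeans(b) = mean(max(ST - K, 0));      % sliced output
end

% All batches have the same size, so the overall mean is the mean of batch means.
price = exp(-r*T) * mean(batchMeans);
fprintf('MC price: %.4f\n', price);
```

With 100 batches on 4 workers, each worker processes roughly 25 vectorized batches, so the per-iteration overhead stays small compared with a parfor over 10^6 scalar iterations. If Parallel Computing Toolbox is not installed, parfor runs the iterations serially like an ordinary for loop. As written, the draws are not reproducible from run to run.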

Do you have a GPU card?
No, at least not on my test machine. I don't know yet if I'll have one. Does it change a lot?
Yes, it will, if you have a decent one. Read about gpuDevice, gpuArray, etc.
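A rough sketch of what using a GPU looks like for vectorized draws, assuming Parallel Computing Toolbox and a supported GPU may or may not be present: the code checks for a device with gpuDeviceCount, generates the draws directly on the GPU with the 'gpuArray' option of randn, and brings the result back with gather. The toy statistic (a sample mean) is hypothetical:

```matlab
% Sketch: run the same vectorized draw on a GPU when one is available,
% otherwise on the CPU. gpuDeviceCount, gpuDevice and gpuArray need
% Parallel Computing Toolbox and a supported GPU.
n = 1e6;
useGPU = false;
try
    useGPU = gpuDeviceCount > 0;
catch
    % Toolbox or driver not available: stay on the CPU.
end

if useGPU
    dev = gpuDevice;                    % current GPU
    fprintf('Using GPU: %s\n', dev.Name);
    Z = randn(n, 1, 'gpuArray');        % draws generated on the device
    m = gather(mean(Z));                % copy the scalar result back to the host
else
    Z = randn(n, 1);                    % draws generated on the CPU
    m = mean(Z);
end
```

The rest of a vectorized simulation can usually stay unchanged, because most elementwise functions accept gpuArray inputs.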

 Accepted Answer

Joss Knight on 30 Sep 2018
You seem to have the right idea: for highly vectorized code, parallelization should be done in batches. However, whether you gain anything from a parallel pool depends on what you are doing. Many matrix operations in MATLAB are already heavily multithreaded, so use your system's monitoring tools to check whether any of your CPU cores are idle and therefore available.
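One way to see how much of the speed already comes from MATLAB's built-in multithreading is to time the same vectorized kernel with one computational thread and with the default number. This is a sketch; the kernel is a hypothetical stand-in for the real simulation step:

```matlab
% Sketch: compare a vectorized kernel with 1 thread vs. the default thread count.
% The kernel below is a hypothetical stand-in for the real simulation.
n = 1e6;
kernel = @() sum(exp(0.2 * randn(n, 1)));

tMulti = timeit(kernel);                 % default number of threads

oldN = maxNumCompThreads(1);             % restrict to one thread, keep old value
tSingle = timeit(kernel);
maxNumCompThreads(oldN);                 % restore the previous setting

fprintf('threads = %d, speedup from multithreading: %.2fx\n', ...
        oldN, tSingle / tMulti);
```

If the speedup is already close to the number of cores, a parallel pool has little idle capacity left to use; if it is near 1, batching with parfor is more likely to pay off.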

I contacted MATLAB support and showed them the CPU usage. They told me there was some room for improvement, but not much. My tests are done on a standard machine; later I'll have a powerful dedicated server machine with GPUs. I'll have to adapt my code to the Parallel Computing Toolbox in the near future anyway.
One last question.
At the moment, on the CPU, I generate a set of, say, 1M random numbers in one go after calling rng(0) once.
In the context of GPU computing, how will my simulations differ if each simulation generates only one random number (so 1M random numbers in total) and the seed is set to 0? It seems to me this will always give the same result (the random number used will always be the first value of, say, rand(1)), whereas the vectorized version gives a set of 1M different, reproducible values.
What's the trick ?
Hmm, I'm not sure. I would probably seed each batch with a different seed, perhaps by defining a list of seeds, one per batch, up front. It depends on whether you're using parfor, parfeval, spmd or some other construct.
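A minimal sketch of the seed-per-batch idea with parfor, assuming a hypothetical seed list 1:nBatches and a toy per-batch computation. Each batch builds its own RandStream from its own seed, so its draws do not depend on which worker runs it, and the global stream is left untouched:

```matlab
% Sketch: one seed per batch, fixed up front. The seed list and the
% per-batch computation (a sample mean) are hypothetical.
nBatches  = 100;
batchSize = 1e4;
seeds = 1:nBatches;                      % one seed per batch

batchMeans = zeros(nBatches, 1);
parfor b = 1:nBatches
    % Private stream for this batch, seeded from the precomputed list.
    s = RandStream('mt19937ar', 'Seed', seeds(b));
    Z = randn(s, batchSize, 1);          % vectorized draws within the batch
    batchMeans(b) = mean(Z);
end
```

One caveat: as far as I know, distinct Mersenne twister seeds give distinct sequences but are not guaranteed to be statistically independent; generators with substream support, such as 'mrg32k3a', are designed for that purpose.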
cedric W on 8 Oct 2018
Edited: cedric W on 8 Oct 2018
The batch will consist of 1M independent simulations. Setting a seed for each simulation is not convenient, I think.
Is it impossible to send a list/array of random numbers up front to be used on the GPU? I'm planning to use parfor at the GPU level (to dispatch the work to several GPUs) and arrayfun to dispatch the simulations across the cores of each GPU.
Edit: for information, I used the same syntax as described here: https://blogs.mathworks.com/loren/2013/06/24/running-monte-carlo-simulations-on-multiple-gpus/
If you use parfor, then you're fine to generate random numbers inside the loop with a single seed set outside the loop; they'll always be different, since that's the way parfor is designed.
I wasn't saying generate one seed per simulation; I was saying generate one seed per batch of simulations, then use it to generate your 1M starting values.
I'm not sure I understand correctly.
My code looks like this:
...
parfor ix = 1:nGPUs
    S(ix,:,:,:) = runSimulationOnOneGPU(...);
end
...
function S = runSimulationOnOneGPU(...)
    rng(0);
    x = log(S0).*gpuArray.ones(NbAssets,1);
    MCPaths = arrayfun(@Heston_One_Simulation, x, ...);
    S = gather(MCPaths);
end
function [S] = Heston_One_Simulation(x,...)
    U_V = rand(NbAssets,1);
    DrawNormalLaw = randn(NbAssets,1);
    ...
end
The "1" in rand and randn means that each call draws the numbers for a single simulation, not in the vectorized way. My goal is to have something like this:
Say NbAssets=1. In the vectorized version, the code would have been DrawNormalLaw=randn(1,M).
For example (using rand for illustration), if M=5 and the seed is 0, then rand(1,5) = [0.8147 0.9058 0.1270 0.9134 0.6324], so each simulation would use a different value.
What I mean is that if I set the seed to 0 in the function runSimulationOnOneGPU, then DrawNormalLaw in Heston_One_Simulation will always be equal to 0.8147 in every simulation, leading to 1M identical simulations.
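A small CPU-only sketch of the concern described above: a single seed followed by one vectorized draw gives five different values, whereas resetting the seed before every scalar draw returns the first value of the sequence each time. It illustrates CPU behaviour only and does not by itself show how rand behaves inside arrayfun on a GPU:

```matlab
% Sketch (CPU only): one seed + vectorized draw vs. reseeding before each draw.
rng(0, 'twister');
v = rand(1, 5);            % five different values: 0.8147 0.9058 0.1270 0.9134 0.6324

w = zeros(1, 5);
for k = 1:5
    rng(0, 'twister');     % reseeding before each draw...
    w(k) = rand;           % ...returns the first value of the sequence every time
end
disp(w)                    % all entries equal v(1)
```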
So I don't understand your point. Could you please advise? My goal is to be able to reproduce MC runs when restarting the algorithm. The idea, to me, would be to generate 1M (rand, randn) couples up front and then have each GPU run take a different couple for each simulation.
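A minimal sketch of the "generate the couples up front" idea, assuming a hypothetical batch count (one batch per worker or GPU) and a placeholder formula in place of the Heston step. All draws come from one seeded stream on the client, and batch b always receives column b, so the result does not depend on scheduling. On a GPU, each slice could be moved with gpuArray and passed to arrayfun as extra inputs, at the cost of storing and transferring the draws (about 16 MB for 1M double couples):

```matlab
% Sketch: pre-generate all (rand, randn) couples on the client from one seed,
% then give each batch its own slice. nBatches and the per-simulation
% formula are hypothetical placeholders.
nSims    = 1e6;
nBatches = 4;                              % e.g. one batch per GPU or worker
rng(0, 'twister');
U = rand(nSims, 1);                        % one uniform draw per simulation
Z = randn(nSims, 1);                       % one normal draw per simulation

% Column b holds the draws for batch b (sliced by parfor).
Ub = reshape(U, [], nBatches);
Zb = reshape(Z, [], nBatches);

results = zeros(nSims / nBatches, nBatches);
parfor b = 1:nBatches
    % Each simulation consumes a different, reproducible couple (U, Z).
    results(:, b) = Ub(:, b) + 0.1 * Zb(:, b);   % stand-in for the real model
end
```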
Call rng(0) before you start the parfor loop. Then you'll be fine.
OK, thank you. But I keep rand/randn inside the function Heston_One_Simulation, right?
Ah, no, that's not true, I was wrong. The seed you set on the client has no effect on the seeds of the random number streams used on each worker. This example shows how to get the desired behaviour from parfor.
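A sketch of the substream approach for reproducible parfor results, assuming the 'mrg32k3a' generator (one of the RandStream generators that support substreams) and hypothetical batch sizes. Every iteration creates a stream with the same seed but selects substream b, so batch b always gets the same numbers regardless of which worker runs it:

```matlab
% Sketch: same seed everywhere, one substream per loop iteration.
% Batch sizes and the per-batch computation are hypothetical.
nBatches  = 100;
batchSize = 1e4;

batchMeans = zeros(nBatches, 1);
parfor b = 1:nBatches
    s = RandStream('mrg32k3a', 'Seed', 0);   % identical seed in every iteration
    s.Substream = b;                          % but a distinct substream per batch
    Z = randn(s, batchSize, 1);               % vectorized draws for batch b
    batchMeans(b) = mean(Z);
end
```

Rerunning the loop reproduces batchMeans, even when parfor schedules the iterations on different workers, because the numbers are tied to the iteration index rather than to the worker.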
cedric W on 11 Oct 2018
Edited: cedric W on 11 Oct 2018
From what I understood, this setup is going to generate the same simulations on each worker, which is not the goal.
The aim is to have a run on one worker with, for example, the 'twister' algorithm, and a parallel run with a different algorithm.
Do you agree that this does not solve the issue? Also, am I limited to the 3 algorithms available on the GPU, as stated here? https://fr.mathworks.com/help/distcomp/control-random-number-streams.html
No, it will generate the same set of numbers for a given iteration each time you run your code. That iteration may run on a different worker each time because of the way parfor scheduling works, but this appears to be what you want.

Category: Parallel Computing Toolbox

Asked: 25 Sep 2018
Last comment: 11 Oct 2018