MATLAB Answers


How to lock a variable in the main stack for parallel programming?

Asked by Paolo Bocchini on 10 Mar 2011
Hi, first I'll give an introduction to explain what I'm doing. If you are not interested, skip ahead to the question.
I am running code in parallel that uses GA (but I guess it would be the same with a simple parfor). Inside the loop, the code has to solve a problem several times with certain input variables. Since the input variables may turn out to be the same as in a previous run, and the solution is computationally expensive, I build a lookup table during execution. The code assigns a unique numeric identifier to each combination of input variables for which it has to solve the problem (I have about 1e40 combinations, so the identifier is made of several doubles), then checks the lookup table to see whether the problem has already been solved for that combination of input variables. If so, it just retrieves the solution from the lookup table; otherwise it solves the problem and appends the new numeric identifier and the associated solution to the end of the lookup table.
To make sure that GA doesn't mess up the table, the table is stored in the main stack (the base workspace) and retrieved by the fitness function every time. This works smoothly as long as I don't run in parallel. When I run in parallel, I have this issue: what happens if two CPUs of the pool try to update the lookup table in the main stack simultaneously?
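The lookup-or-solve logic described above is memoization with a hash map keyed by the identifier. A minimal sketch, written in Python only to illustrate the pattern (it is not MATLAB code); `LookupTable` and `expensive_solve` are hypothetical names, and the identifier is assumed to be a tuple of doubles. Because a hash map only stores the keys actually encountered, there is no need to preallocate space for every possible combination.

```python
from typing import Callable, Dict, Tuple

# Assumption: the identifier of one input combination is a tuple of doubles.
Key = Tuple[float, ...]


class LookupTable:
    """Hypothetical cache: solve each input combination at most once."""

    def __init__(self, solve: Callable[[Key], float]) -> None:
        self._solve = solve                 # the expensive solver
        self._table: Dict[Key, float] = {}  # grows only with keys actually seen
        self.solver_calls = 0

    def get(self, key: Key) -> float:
        # Return the stored solution, or solve and add a new entry.
        if key not in self._table:
            self.solver_calls += 1
            self._table[key] = self._solve(key)
        return self._table[key]

    def __len__(self) -> int:
        return len(self._table)


def expensive_solve(key: Key) -> float:
    # Stand-in for the real, costly computation (hypothetical).
    return sum(x * x for x in key)


table = LookupTable(expensive_solve)
print(table.get((1.0, 2.0)))  # 5.0, solved
print(table.get((1.0, 2.0)))  # 5.0, retrieved from the table
print(table.solver_calls)     # 1
```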
Can I have a CPU of the pool "lock" a variable in the main stack, update it, and then unlock it, to make sure that there are no conflicts with other CPUs that might try to update it simultaneously?
PS: Please don't say "preallocate the space for all your combinations": I can't. I have 1e40 combinations; I will actually use only a few thousand of them, but I cannot know which ones in advance.
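The lock/update/unlock discipline asked about here can be sketched with a mutex around the shared table. This is a Python sketch using threads that share memory; MATLAB parallel workers are separate processes that do not share the client's workspace, so it only illustrates the locking idea, not something you can do directly from inside parfor. `SharedLookupTable` is a hypothetical name. The check and the update each happen under the lock, while the expensive solve runs outside it so other workers are not blocked; two workers may occasionally solve the same key, but the table is never corrupted.

```python
import threading
from typing import Callable, Dict, Tuple

Key = Tuple[float, ...]


class SharedLookupTable:
    """Hypothetical lookup table safe to use from several threads."""

    def __init__(self, solve: Callable[[Key], float]) -> None:
        self._solve = solve
        self._table: Dict[Key, float] = {}
        self._lock = threading.Lock()

    def get(self, key: Key) -> float:
        with self._lock:               # lock, read, unlock
            if key in self._table:
                return self._table[key]
        value = self._solve(key)       # expensive work done without holding the lock
        with self._lock:               # lock, update, unlock
            # Another thread may have stored this key meanwhile; keep the first value.
            return self._table.setdefault(key, value)

    def __len__(self) -> int:
        with self._lock:
            return len(self._table)


def expensive_solve(key: Key) -> float:
    # Stand-in for the real computation (hypothetical).
    return sum(x * x for x in key)


table = SharedLookupTable(expensive_solve)
keys = [(float(i % 10), 1.0) for i in range(200)]  # only 10 distinct keys


def worker(chunk):
    for k in chunk:
        table.get(k)


threads = [threading.Thread(target=worker, args=(keys[i::4],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(table))  # 10
```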

  3 Comments

Are you calling GA with the UseParallel option to achieve parallelism?
Actually, before calling GA, I run a sort of preliminary optimization that uses the same objective functions to generate the initial population. This is done through a parfor loop, and I just realized that it doesn't work because, inside the parfor loop, there is no 'base' workspace in the dbstack. I don't know how to work around this.



2 Answers

Answer by Jason Ross on 10 Mar 2011

Hi Paolo,
Have you considered building your lookup table in an external database? Many aspects of your problem seem to point in that direction:
  • Dealing effectively with concurrent writes. Databases have been optimized for doing this.
  • Querying your existing result set to see if you've already done it. Databases are good at this, too.
  • Moving part of the storage requirements of your problem onto another program / filesystem. I am assuming that as you scale, you will likely encounter memory utilization issues as your lookup table grows.
  • Assigning unique IDs to records (i.e., keys).
Your logic seems to be already built; you would just replace the search and update logic with database calls.
This kind of scheme would also make it easier to expand to more processes in the future. A side benefit is that, with your results stored externally, re-runs of the analysis would go faster if your processing were interrupted for one reason or another, since you would pick up the stored results until you reached the ones you had not yet run.
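A database-backed version of the lookup table might look like the following sketch. It uses SQLite through Python's standard `sqlite3` module purely as an illustration (the answer does not name a specific database, and this is not the Database Toolbox API). The schema is hypothetical and assumes each identifier is two doubles; the composite primary key makes the database reject duplicate identifiers, and `INSERT OR IGNORE` means that if two workers solve the same combination at the same time, only the first result is kept.

```python
import sqlite3
from typing import Callable, Tuple

Key = Tuple[float, float]  # assumption: identifier made of two doubles


def open_table(path: str) -> sqlite3.Connection:
    # timeout: wait up to 30 s if another connection holds the write lock.
    conn = sqlite3.connect(path, timeout=30.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solutions ("
        " id1 REAL NOT NULL, id2 REAL NOT NULL, value REAL NOT NULL,"
        " PRIMARY KEY (id1, id2))"
    )
    conn.commit()
    return conn


def lookup_or_solve(conn: sqlite3.Connection, key: Key,
                    solve: Callable[[Key], float]) -> float:
    query = "SELECT value FROM solutions WHERE id1 = ? AND id2 = ?"
    row = conn.execute(query, key).fetchone()
    if row is not None:
        return row[0]                  # already solved: reuse the stored value
    value = solve(key)
    with conn:                         # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO solutions (id1, id2, value) VALUES (?, ?, ?)",
            (*key, value),
        )
    # Re-read so every caller returns the value that was actually stored.
    return conn.execute(query, key).fetchone()[0]


calls = []


def solve(key: Key) -> float:
    # Stand-in for the expensive computation (hypothetical).
    calls.append(key)
    return key[0] * key[1]


conn = open_table(":memory:")
print(lookup_or_solve(conn, (2.0, 3.0), solve))  # 6.0, solved
print(lookup_or_solve(conn, (2.0, 3.0), solve))  # 6.0, read from the table
print(len(calls))                                # 1
```

`":memory:"` keeps the example self-contained; for several worker processes, each would open its own connection to the same database file path.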

  1 Comment

It seems really interesting. Unfortunately, I don't have the Database Toolbox.



Answer by Walter Roberson on 10 Mar 2011

I don't know how GA handles its parallelism.
If you were using pmode or SPMD (but not PARFOR), then the official method would be to use labSend to send the new value to a single lab that does all the updating. It might also be necessary to call labBarrier before doing a retrieval.
The lab communication routines are not available in PARFOR.
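Sending every new value to one lab that does all the updating is a single-writer pattern: only one owner ever modifies the table, so no lock on the table itself is needed. The sketch below is a Python analogy, not the MATLAB API: threads stand in for labs and a `queue.Queue` stands in for labSend. The `owner` and `worker` functions and the `STOP` sentinel are hypothetical names. Joining the workers and the owner before reading plays a role similar to labBarrier, since it guarantees that all updates have been applied.

```python
import queue
import threading

STOP = None  # sentinel telling the owner there are no more messages


def owner(inbox: queue.Queue, table: dict) -> None:
    # The only code that ever writes to `table`.
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        key, value = msg
        table.setdefault(key, value)  # first result received for a key wins


def worker(inbox: queue.Queue, keys) -> None:
    for key in keys:
        value = sum(x * x for x in key)  # stand-in for the expensive solve
        inbox.put((key, value))          # send the result instead of writing it


table = {}
inbox = queue.Queue()
owner_thread = threading.Thread(target=owner, args=(inbox, table))
owner_thread.start()

workers = [
    threading.Thread(target=worker,
                     args=(inbox, [(float(i), float(w)) for i in range(5)]))
    for w in range(3)
]
for t in workers:
    t.start()
for t in workers:
    t.join()
inbox.put(STOP)       # every result is queued before the sentinel
owner_thread.join()   # after this, all updates have been applied
print(len(table))     # 15
```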

  2 Comments

Thank you, I will look into it further!
Paolo
GA uses parfor for its parallelism.
