Gather tall array error - No workers are available for queue execution

Hi everyone,
I hope you are all doing well!
I need help fixing an issue I am facing with the gather function, or at least a workaround. Let me describe it further:
  1. I am working on a sound classification problem and decided to start with the MathWorks example "Acoustic Scene Recognition Using Late Fusion".
  2. I have not made any modifications yet; I am just trying to reproduce the same results. However, I get the following error: "Error using parallel.internal.bigdata.ParallelPoolBroadcastMap>iRunOnAll Internal problem while evaluating tall expression. The problem was One or more futures resulted in an error." More details about the error can be found in the attached file.
  3. I did extensive debugging to understand why gathering the tall array resulted in this error. My conclusion is that it probably has something to do with running out of memory, but I am still not sure, as I am quite new to tall arrays and the functions that operate on them, such as gather.
  4. I set speedupExample to true and was able to run the example without any issue. However, the validation and test accuracy is poor because of the small amount of data used for training:
speedupExample = true;
if speedupExample
    adsTrain = splitEachLabel(adsTrain,20);
    adsTest = splitEachLabel(adsTest,10);
end
Your help is highly appreciated.
Many thanks,
Abderrahim

Accepted Answer

Walter Roberson on 21 Oct 2023
In a discussion not long ago, some people, including some Mathworkers, were talking about what happens when an error is detected on the workers. In at least some cases, Parallel Computing Toolbox removes the erroring worker from the set able to execute jobs, reducing the number of workers. (Tasks not yet done that were assigned to that worker get requeued.)
When multiple workers together request more memory than can be supplied, killing off workers reduces the total amount of memory in use simultaneously. If any single worker needs only an acceptable amount of memory but multiple workers together need too much, this acts to prune the number of workers until the total memory used by those that remain fits into what can be handled.
But if every worker errors (for example, if each one individually needs more memory than the system can supply), then you might be left with no workers to process the queue at all.
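If memory per worker is the suspected cause, one option is to trade speed for memory and evaluate tall arrays without a pool at all. The following is only a minimal sketch, not taken from the example: it uses a small stand-in tall array (hypothetical data) and creates it after switching the execution environment with mapreducer(0).

```matlab
% Sketch: evaluate tall arrays serially in the client session, so that
% only one process holds data in memory at a time.
delete(gcp('nocreate'));   % close any running parallel pool
mapreducer(0);             % tall arrays now evaluate in the local MATLAB session

t = tall((1:1000)');       % stand-in data; replace with your own tall array
total = gather(sum(t));    % gather runs in the client, no workers involved
disp(total)
```

This is slower than a pool, but it can help confirm whether the failure is tied to the combined memory demand of the workers.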
  5 Comments
Abderrahim. B on 21 Oct 2023
So I was able to fix the worker issue, but then ran into an out-of-memory issue.
How I worked around the first issue:
I started the parallel pool programmatically, as below:
parpool("Processes", 14, "SpmdEnabled",false)
Any tips and tricks for working around the out-of-memory issue?
Thanks
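One rough way to approach the out-of-memory question is to size the pool from a memory budget instead of using a fixed worker count. The sketch below is only an illustration: the memory figures are hypothetical placeholders, not measurements from this example, and the 14-worker upper bound simply mirrors the pool call above.

```matlab
% Sketch: pick a worker count from a memory budget.
% All numbers below are hypothetical placeholders; measure your own.
totalMemGB     = 32;   % assumed RAM you are willing to give the pool
perWorkerMemGB = 6;    % assumed peak memory of a single worker
maxWorkers     = 14;   % upper bound, matching the pool size used above

% Use as many workers as fit in the budget, but at least one.
numWorkers = max(1, min(maxWorkers, floor(totalMemGB / perWorkerMemGB)));
fprintf("Starting pool with %d workers\n", numWorkers);

% Uncomment to actually restart the pool with the computed size:
% delete(gcp('nocreate'));
% parpool("Processes", numWorkers, "SpmdEnabled", false);
```

With these placeholder values the computed size is 5 workers; the per-worker figure would need to come from observing an actual run.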
Abderrahim. B on 26 Oct 2023
Just want to share that I have solved the memory issue as well. Here is how I did it:
  • Increased MATLAB Workspace Memory
  • Increased Java Heap Memory
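
To check whether a change to the Java heap setting took effect, the current limit can be read from within MATLAB. This is a small sketch that assumes MATLAB is running with the JVM enabled; the memory function is called only on Windows here, since that is the platform where it is known to be available in this release.

```matlab
% Sketch: inspect current memory limits from the MATLAB command line.
rt = java.lang.Runtime.getRuntime();
maxHeapMB = double(rt.maxMemory()) / 2^20;   % Java heap limit in MB
fprintf("Java max heap: %.0f MB\n", maxHeapMB);

if ispc
    % memory reports MATLAB and system memory usage (Windows).
    [userView, sysView] = memory;
    fprintf("Memory used by MATLAB: %.0f MB\n", userView.MemUsedMATLAB / 2^20);
    fprintf("Physical memory available: %.0f MB\n", ...
        sysView.PhysicalMemory.Available / 2^20);
end
```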

Release

R2023b
