Can I tell Matlab not to use contiguous memory?

Matlab eats enormous amounts of memory and rarely or never releases it. As far as I can tell from past questions about this, it's because Matlab stores variables in contiguous physical memory blocks. As the heap becomes fragmented, Matlab asks for new memory when it wants to allocate a variable larger than the largest available contiguous fragment. MathWorks' recommended solution is "exit Matlab and restart". The only defragmentation option at present is "pack", which saves everything to disk, releases everything, and reloads variables from disk. This (a) takes a while and (b) only saves variables that are 2 GB or smaller.
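A toy model can make the fragmentation argument concrete. The sketch below is hypothetical (it is not MATLAB's allocator): a heap is a list of segments, a 60-unit request fits in no hole even though 100 units are free in total, and a pack-style save/release/reload coalesces the free space into a single hole.

```python
# HYPOTHETICAL toy heap: a list of (name, size) segments in address order.
# A name of None marks a free hole. Not MATLAB's actual allocator.

def largest_free(heap):
    """Size of the largest contiguous free hole."""
    return max((size for name, size in heap if name is None), default=0)

def total_free(heap):
    """Total free space, regardless of how it is split up."""
    return sum(size for name, size in heap if name is None)

def fits(heap, size):
    """A request fits only if a single hole is large enough."""
    return largest_free(heap) >= size

def pack(heap):
    """Mimic save-everything / release / reload: live blocks are laid out
    back to back and all free space coalesces into one hole at the end."""
    live = [(name, size) for name, size in heap if name is not None]
    free = total_free(heap)
    return live + ([(None, free)] if free else [])

heap = [("a", 10), (None, 30), ("b", 5), (None, 30), ("c", 5), (None, 40)]
```

Here `total_free(heap)` is 100 but `fits(heap, 60)` is false, so a real allocator would have to request more memory from the OS; after `pack(heap)` the same request fits.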
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
The only reasons I can think of for asking for contiguous physical memory would be to reduce the number of TLB misses (for code running on CPUs) or to allow hardware acceleration from peripherals that use DMA (such as GPUs) and don't support mapping fragments. Like most of the other people who were complaining about this issue, I'd rather accept TLB misses and give up GPU acceleration for my tasks than run out of memory. I understand that for large problems on large machines these features are important, but it's very strange that there's no way to turn it off.
(Even for our big machines, RAM is more scarce than CPU power or time, so "throw more RAM in the machine" is not viable past the first quarter-terabyte or so.)
Edit: Since there is apparently confusion about what I'm asking:
  • Matlab certainly stores variables in contiguous virtual memory. As far as user-mode code is concerned, an array is stored as an unbroken block.
  • Normal user-mode memory allocation does not guarantee contiguous physical memory. Pages that are mapped to contiguous virtual addresses may be scattered anywhere in RAM (or on disk).
  • Previous forum threads about Matlab's memory usage got responses stating that Matlab does ask for contiguous physical memory, requiring variables to be stored as unbroken blocks in physical RAM (pages that are adjacent in virtual address space also being adjacent in physical address space).
  • The claim in those past threads was that Matlab's requirement for contiguous physical memory was responsible for its enormous memory use under certain conditions.
  • If that is indeed the case, I wanted to know if there was a way to turn that allocation requirement off.
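The distinction in these bullets can be sketched with a toy page table. Everything here is an assumption for illustration: 4 KiB pages and made-up frame numbers. Five pages that are consecutive in virtual address space map to scattered physical frames, yet address translation makes the block look unbroken to user-mode code.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages (the common x86-64 default)

# Hypothetical page table: virtual page number -> physical frame number.
# Five virtually contiguous pages land in scattered physical frames.
page_table = {0: 913, 1: 12, 2: 4077, 3: 12850, 4: 77}

def translate(vaddr, table=page_table, page_size=PAGE_SIZE):
    """Translate a virtual address to a physical one via the page table."""
    vpn, offset = divmod(vaddr, page_size)
    return table[vpn] * page_size + offset

def physically_contiguous(table):
    """True if consecutive virtual pages map to consecutive physical frames."""
    vpns = sorted(table)
    return all(table[b] == table[a] + 1 for a, b in zip(vpns, vpns[1:]))
```

An array spanning virtual addresses 0 through 5 * 4096 - 1 is one unbroken block as far as user-mode code can tell; only the OS (and devices doing DMA that cannot use the page table) sees that its frames are scattered.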
As of 02 April 2021, I've gotten conflicting responses about whether Matlab does this at all, and have been told that if it does do it there's no way to turn it off and/or that turning it off would do horrible things to performance. I am no longer sure that these responses were to my actual question; hence the clarification.
Edit: As of 06 April 2021, consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.
Bruno Luong on 4 Apr 2021
So after much elaboration, the question now becomes "Can I tell MATLAB to use contiguous (physical) memory?"
Christopher Thomas on 5 Apr 2021
If Matlab is not presently asking for contiguous physical memory, I'm not particularly interested in getting it to do so; that would be a topic best moved to its own forum post.
I took a look at Linux's kernel functions for memory management last week (as the last time I had to manage physical memory was quite a few years ago). Long story short, it could be done but would be ugly. The situation is probably similar for other OSs. Further details are beyond the scope of this thread.


Accepted Answer

Christopher Thomas on 6 Apr 2021
Consensus appears to be that Matlab does not ask for contiguous physical memory, making this question moot.

More Answers (4)

Walter Roberson on 27 Mar 2021
You are mistaken.
>> clearvars
>> pack
>> foo = ones(1,2^35,'uint8');
>> clearvars
I allocated 32 gigabytes of memory on my 32 gigabyte Mac, it took up physical memory and virtual memory, and when I cleared the variable, MATLAB returned the memory to the operating system.
There has been proof posted in the past (but it might be difficult to locate in the mass of postings) that MATLAB returns physical memory to the operating system on MS Windows.
I do not have information about Linux memory use at the moment.
MATLAB has two memory pools: the small object pool and the large object pool. I do not recall the upper limit on the small object pool at the moment; I think it is 512 bytes. Scalars get recycled a lot in MATLAB.
Historically, it was at least documented (possibly in blogs) that MATLAB did keep hold of all memory it allocated, and so could end up with fragmented memory. But I have never seen any evidence that that was a physical memory effect: you get exactly the same problem if you ask the operating system for memory and it pulls together a bunch of different physical banks and provides you with the physical memory bundled up as consecutive virtual addresses.
At some point, evidence started accumulating that at least on Windows, at least for larger objects, MATLAB was using per-object virtual memory, and returning the entire object when it was done with it, instead of keeping it in a pool. I have not seen anything from Mathworks describing the circumstances under which objects are returned directly to the operating system instead of being kept for the memory pools.
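The two-pool behavior described above can be modeled roughly as follows. This is a hypothetical sketch, not MathWorks' implementation: the 512-byte threshold is only Walter's recollection, and the rule that large objects go straight back to the OS reflects the Windows observations mentioned above, not documented behavior.

```python
SMALL_LIMIT = 512  # Walter's recollection; the real threshold is unconfirmed

class ToyAllocator:
    """Hypothetical two-pool allocator: small blocks are recycled from a
    free list, large blocks are requested from and returned to the 'OS'."""

    def __init__(self):
        self.small_free = {}   # size -> count of pooled small blocks
        self.os_bytes = 0      # bytes currently held from the OS

    def alloc(self, size):
        if size <= SMALL_LIMIT:
            if self.small_free.get(size, 0):
                self.small_free[size] -= 1   # reuse a pooled block
            else:
                self.os_bytes += size        # grow the small-object pool
            return ("small", size)
        self.os_bytes += size                # dedicated allocation per object
        return ("large", size)

    def free(self, block):
        kind, size = block
        if kind == "small":
            # Kept in the pool for reuse; never returned to the OS.
            self.small_free[size] = self.small_free.get(size, 0) + 1
        else:
            self.os_bytes -= size            # handed straight back to the OS
```

Under this model, scalars churn through the small pool without growing the process, while a freed large array shrinks `os_bytes` immediately, which matches the 32 GB experiment above.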
Side note: I have proven that allocation of zeros is treated differently in MATLAB. I have been able to allocate large arrays, and then when I change a single element of the array, been told that the array is too large.
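One way to reproduce the observation in the side note is a deferred-allocation model. Operating systems commonly back freshly requested zero-filled memory lazily, one page at a time; the sketch below is instead a hypothetical model in which creating the array costs nothing and the first write materializes the whole buffer. That is one explanation consistent with the error Walter describes; MATLAB's actual mechanism is not stated in this thread.

```python
class LazyZeros:
    """HYPOTHETICAL model of deferred zero allocation: creating the array
    costs nothing, and the first write materializes the full buffer, which
    can fail even though the creation call succeeded."""

    def __init__(self, n, budget):
        self.n = n             # number of one-byte elements
        self.budget = budget   # bytes the process may still obtain
        self.data = None       # nothing allocated yet

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        # Reads of a never-written array need no storage at all.
        return 0 if self.data is None else self.data[i]

    def __setitem__(self, i, value):
        if self.data is None:
            if self.n > self.budget:
                raise MemoryError("array too large to materialize")
            self.data = bytearray(self.n)   # zero-filled, one byte per element
        self.data[i] = value
```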
Bruno Luong on 5 Apr 2021
Edited: Bruno Luong on 5 Apr 2021
"Not inplace change, no: there are places now where if you index an array, then instead of making a copy of the desired section, that MATLAB instead creates a new header that points "inside" the existing data."
I think we are talking about the same thing: I call this "inplace" in the sense that the data (RHS) stays in place.
I submitted such a package to the FEX in the past. It worked with some older versions, but it has been broken since MATLAB started prohibiting users from doing that, and they have kept changing their data management ever since. I stopped trying to follow them, and I must admit that I can't follow them anyway, for lack of the published information I would need to make such a package work reliably.
But now that it is integrated into the engine, such a package is no longer relevant.
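Python's memoryview offers a loose analogue of the "new header pointing inside existing data" discussed above: slicing a memoryview creates a new view onto the same buffer instead of copying it. This only illustrates the concept; it says nothing about how MATLAB implements it.

```python
data = bytearray(range(10))      # the "existing data"
view = memoryview(data)[2:6]     # a new header pointing inside it, no copy

data[3] = 99                     # change the underlying buffer...
assert view[1] == 99             # ...and the view sees it (shared storage)

snapshot = data[2:6]             # slicing the bytearray itself copies
data[3] = 42                     # the view follows; the snapshot does not
```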
Christopher Thomas on 5 Apr 2021
Regarding the original problem being encountered, the issue was that several of the people at our lab were finding that Matlab would ask for far more memory than needed for the variables we thought we were storing, and that this was causing problems with our compute server (it's not handling hitting swap as gracefully as it should; that's its own problem).
I spent a bit of time forum-searching to see why this sort of thing happened and what to do about it, and forum consensus seemed to be what I presented in my original post (that Matlab was asking for contiguous physical memory and having serious problems if the heap became fragmented as a result). If that was the case, there seemed to be a straightforward solution, which I asked about.
If those original forum posts were in error, then I'm back to square one figuring out the conditions under which this occurs and how to prevent it.



Joss Knight on 27 Mar 2021
You might want to ask yourself why you need so many variables in your workspace at once and whether you couldn't make better use of the file system and of functions and classes to tidy up temporaries. If you actively need your variables, it's probably because you're using them, which means they're going to need to be stored in contiguous address space for any algorithm to operate on them efficiently. If it's data you're processing sequentially, consider storing as files and using one of MATLAB's file iterators (datastore) or a tall array. If it's results you're accumulating, consider writing to a file or using datastore or tall's write capabilities.
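The sequential-processing advice follows a general pattern: read a chunk, fold it into a running result, discard it. In MATLAB that role is played by datastore or tall arrays; the Python sketch below only illustrates the pattern, assuming a hypothetical raw file of little-endian float64 values, so that peak memory is bounded by the chunk size rather than the file size.

```python
import os
import struct
import tempfile

def write_values(path, values):
    """Write float64 values to a raw little-endian binary file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))

def chunked_sum(path, chunk=4096):
    """Sum a raw float64 file while holding at most `chunk` values in memory."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            buf = f.read(8 * chunk)           # 8 bytes per float64
            if not buf:
                break
            total += sum(struct.unpack(f"<{len(buf) // 8}d", buf))
    return total

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "results.bin")
        write_values(path, [0.5] * 100_000)
        print(chunked_sum(path, chunk=1000))  # running total, never the whole file
```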
Memory storage efficiency is ultimately the job of the operating system, not applications. If you want to store large arrays as a single variable but in a variety of physical memory locations, talk to your OS provider about that. They in turn are bound by the physics and geography of the hardware.
Bruno Luong on 6 Apr 2021
I have worked with MATLAB for more than 20 years, on small and large simulations, from simple but long simulations (lasting weeks to months) to complex tasks (simulating fully autonomous twin robots working through a day).
AFAIR I have never seen memory grow forever due to memory fragmentation.
In my various experiences, once the simulation is running, the memory state quickly stabilizes.
If it doesn't stabilize, then your simulation must be doing something that keeps changing over time in the computer's memory.
Anyway, it's up to you to stick with your assumption.
Christopher Thomas on 6 Apr 2021
I've been using Matlab since 2003 and have been coding much longer than that. I acknowledge your expertise, but it's pretty obvious that our tasks have different memory access patterns.
For my tasks and one other user's tasks, intermediate results are frequently allocated and de-allocated. These intermediate results are not necessarily the same size (it depends on the data). If there's a situation where subsequent buffers are allocated that are larger than their predecessors, that's exactly the type of scenario where heap fragmentation might occur (buffer of size N is allocated, smaller objects are allocated that are placed after it, buffer of size N is released freeing up a slot of size N, buffer of size N+K is allocated, won't fit into that slot, and is placed at the end of the heap instead).
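The fragmentation scenario above can be replayed with a toy first-fit heap. This is a hypothetical allocator for illustration (MATLAB's is not documented in this thread, and real allocators coalesce neighboring free blocks, which this one does not): the freed 100-unit slot cannot hold the 110-unit buffer, so the heap grows even though only 118 units are live.

```python
class FirstFitHeap:
    """Toy first-fit heap (HYPOTHETICAL). Blocks are (start, size, in_use),
    kept in address order; the heap grows only at its end."""

    def __init__(self):
        self.blocks = []   # sorted by start address
        self.top = 0       # current end of the heap

    def alloc(self, size):
        for i, (start, bsize, used) in enumerate(self.blocks):
            if not used and bsize >= size:          # first hole that fits
                self.blocks[i] = (start, size, True)
                if bsize > size:                    # keep the remainder free
                    self.blocks.insert(i + 1, (start + size, bsize - size, False))
                return start
        start = self.top                            # no hole fits: grow
        self.blocks.append((start, size, True))
        self.top += size
        return start

    def free(self, addr):
        for i, (start, bsize, used) in enumerate(self.blocks):
            if start == addr and used:
                self.blocks[i] = (start, bsize, False)   # no coalescing
                return
        raise ValueError("address not allocated")
```

With N = 100 and K = 10, the sequence `alloc(100)`, `alloc(8)`, `free(0)`, `alloc(110)` returns addresses 0, 100 and 108, leaving the heap top at 218.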
I haven't looked at the third user's code, so I can't comment on whether this specific case is driving their memory usage or not.
Yes, the code could be rewritten to change those allocation patterns. There is a different, simpler rewrite that addresses the problem in a different way that I've already suggested to the other user instead. My priority at this point is finding the solutions that involve the smallest programmer time investment, as that is the most scarce resource in our lab at the moment (we're not flush with money either, but time is still harder to come by).



Jan on 24 Mar 2021
Edited: Jan on 24 Mar 2021
Is there any way to tell Matlab (via startup switch or other option) to allow variables to be stored in fragmented physical memory?
No, there is absolutely no chance to implement this. All underlying library functions expect the data to be represented as contiguous blocks.
If you need a distributed representation of memory, e.g. storing matrices as list of vectors, you have to develop the operations from scratch. This has severe drawbacks, e.g. processing a row vector is rather expensive (when the matrix is stored as column vectors). But of course it works. You only have to relinquish e.g. BLAS and LAPACK routines.
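The trade-off described above can be seen in a minimal column-list layout. This sketch is an illustration under loose assumptions (Python lists are not a faithful model of memory layout): fetching a column returns one stored object, fetching a row gathers one element from every column, and even a matrix-vector product has to be hand-written instead of calling BLAS.

```python
class ColumnListMatrix:
    """Matrix stored as a list of separate column vectors. Columns are
    cheap to fetch; a row must touch every column."""

    def __init__(self, columns):
        self.columns = [list(c) for c in columns]

    def column(self, j):
        return self.columns[j]          # one stored object

    def row(self, i):
        # Gather one element from each separately stored column.
        return [col[i] for col in self.columns]

    def matvec(self, x):
        # y = A x computed column by column: a hand-written replacement
        # for what BLAS would do on a single contiguous array.
        n_rows = len(self.columns[0])
        y = [0.0] * n_rows
        for xj, col in zip(x, self.columns):
            for i in range(n_rows):
                y[i] += col[i] * xj
        return y
```

For example, `ColumnListMatrix([[1, 3], [2, 4]])` represents the matrix [[1, 2], [3, 4]].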
I wrote my first ODE integrator with 1 kB of RAM, so I know how lean the first 250 GB of RAM are. But the rule remains the same: large problems need large machines.
Christopher Thomas on 30 Mar 2021
Per my post accidentally attached to another user's response, Matlab certainly does use swap. Feel free to run the sample program I attached to test this (if you have access to a Linux system; Matlab doesn't seem to have an OS-independent way to check memory use).
The same test program will demonstrate copy-on-write behavior. Set the "modify the input argument" flag to "true" or "false" to observe this.
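On Linux, one way to check a process's resident and swapped memory without MATLAB support is to read /proc/<pid>/status. The helper below is a hypothetical sketch (the attached test program is not reproduced in this thread); VmRSS and VmSwap are the kernel's field names, reported in kB.

```python
def parse_status(text):
    """Return the Vm* memory fields (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields

def memory_kb(pid="self"):
    """Linux only: (VmRSS, VmSwap) in kB for a process, None if absent."""
    with open(f"/proc/{pid}/status") as f:
        fields = parse_status(f.read())
    return fields.get("VmRSS"), fields.get("VmSwap")
```

Passing a running MATLAB process's PID to `memory_kb` would show how much of it is resident versus swapped out.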
Jan on 2 Apr 2021
I've clarified my mistake in my former comment: Matlab does use the pagefile under Windows.
Your test code shows that Matlab uses a copy-on-write method. This is documented. Your assumption that this is done automatically using exceptions for write-protected virtual pages does not match the implementation in Matlab.
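The documented copy-on-write behavior referred to above can be implemented with reference sharing rather than write-protected pages. The toy class below is a hypothetical sketch of that idea, not MathWorks' implementation: assignment shares a buffer, and a write copies the data only when the buffer is shared.

```python
class CowArray:
    """Toy copy-on-write array: copies share one buffer plus a reference
    count, and a write duplicates the data only if the buffer is shared.
    Simplification: discarded copies never decrement the count, so the toy
    may copy when a real implementation would not."""

    class _Buffer:
        def __init__(self, values):
            self.values = list(values)
            self.refs = 1

    def __init__(self, values=None, _buf=None):
        self._buf = _buf if _buf is not None else CowArray._Buffer(values)

    def copy(self):
        # "b = a": no data copied, just another reference to the buffer.
        self._buf.refs += 1
        return CowArray(_buf=self._buf)

    def __getitem__(self, i):
        return self._buf.values[i]

    def __setitem__(self, i, value):
        if self._buf.refs > 1:                 # shared: detach before writing
            self._buf.refs -= 1
            self._buf = CowArray._Buffer(self._buf.values)
        self._buf.values[i] = value

    def shares_with(self, other):
        return self._buf is other._buf
```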



Steven Lord on 25 Mar 2021
You know, I really want to read the newest Brandon Sanderson novel. But I don't have room on that shelf in my bookshelf for the volume. Let me store pages 1 through 20 on this bookshelf upstairs. Pages 21 through 40 would fit nicely in that little table downstairs. Pages 41 through 50 can squeeze into the drawer on my nightstand upstairs. Pages 51 through 80 could get stacked on top of the cereal in the kitchen cupboard downstairs. Pages ...
That's going to make it a lot more time consuming to read the new book. And since Sanderson's newest book is over 1200 pages long, I'm going to wear a path in the carpet on the stairs before I'm finished.
So no, there is no setting to tell MATLAB not to use contiguous memory.
The bits may be physically located in different locations on the chip, but to MATLAB and to the libraries we use they have to appear contiguous. Since in actuality I'm more likely to read that Sanderson novel on my tablet, the pages could be stored as a file split across different locations in the physical hardware of the SD card in my tablet but the reader software handles it so I see the pages in order and I would be annoyed if I had to manually switch to reading a different book every chapter to match the physical location of the data.
John D'Errico on 27 Mar 2021
Remember that knowing those elements are stored contiguously in memory is a hugely important feature, and storing them in consecutive memory locations improves the way the BLAS works. That is a big feature in terms of speed. So while a few people MIGHT want a feature that would slow down MATLAB for everybody else, I doubt most users would be happy to learn that, because one person thinks it important, a matrix multiply is suddenly significantly slower.
It won't happen, nor would I and a lot of other people be happy if it did.
Christopher Thomas on 29 Mar 2021
Regarding which rationales were demonstrably mistaken:
  • The claim that Matlab's computation routines must use single contiguous physical memory segments to store arrays (as opposed to putting pages wherever there's space). The only thing that actually needs this is DMA from devices that don't have access to the page table.
  • The claim that Matlab uses physical addressing rather than virtual addressing. Matlab uses copy-on-write and swapping, and runs as user-space code. Direct access to physical memory requires privileged instructions because it bypasses memory protection (which is implemented by the page table).
  • The claim that the OS's heap manager is in any way to blame for this (from previous forum threads). It's doing exactly what you're asking it to.
  • The claim that I need a machine with more memory. If my dataset is very much smaller than physical memory and Matlab is asking for very much more space than I have physical memory, the machine isn't the problem.
Regarding "elements being stored contiguously in memory being a hugely important feature", all that's needed is that they be contiguous in virtual memory, which makes them contiguous within pages. Remember that page size (4 kB by default on x86-64; 2 MB with huge pages) is very much larger than cache line size (typically anywhere from 32 bytes to 256 bytes). Virtually all of your cache hits and misses will happen the same way whether pages are contiguous with each other or fragmented. Moving to a different page requires a new page table lookup whether that page is contiguous to the old one or not, so it's not obvious to me that making pages contiguous with each other helps cache performance at all.
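This argument can be checked with a small model. Assuming 64-byte cache lines and 4 KiB pages (typical x86-64 values, not measurements), a sequential scan touches the same number of cache lines and crosses the same number of page boundaries whether the physical frames are contiguous or scattered. The model ignores hardware prefetching and TLB caching details, so it supports the argument only at this level of abstraction.

```python
LINE = 64      # assumed cache-line size in bytes
PAGE = 4096    # assumed base page size in bytes

def scan_stats(n_bytes, frames, line=LINE, page=PAGE):
    """Walk n_bytes sequentially from virtual address 0, one access per
    cache line. Return (distinct physical cache lines touched, number of
    page-to-page transitions, each needing a fresh translation).
    `frames` maps virtual page number -> physical frame number."""
    lines, transitions, prev_vpn = set(), 0, None
    for vaddr in range(0, n_bytes, line):
        vpn, off = divmod(vaddr, page)
        paddr = frames[vpn] * page + off
        lines.add(paddr // line)
        if vpn != prev_vpn:
            transitions += 1
            prev_vpn = vpn
    return len(lines), transitions
```

Scanning 8 pages gives (512, 8) for a contiguous mapping such as `{v: 100 + v for v in range(8)}` and for any scattered mapping with 8 distinct frames alike, since each 4 KiB page holds 64 cache lines.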
Per my original post - all I'm looking for is that there be the option to store pages in fragmented memory rather than contiguously with each other. The computing library implementation is unchanged (it literally can't tell the difference); the only changes you'd need to make are to switch to different memory allocation calls when the "allow fragmented memory" flag is set, and to lock out hardware accelerators like GPUs when the flag is set.
This in no way "slows down MATLAB for everyone else". The entire point of a flag is that it's something you have to set. By default memory allocation would still be physically contiguous.


Release

R2020a
