preallocate array without initializing

Is there any way to preallocate a matrix without initializing it to NaNs, zeros or ones?
I'm working with large image data, typically 1-4 billion pixels, and need to preallocate the array so I can read the data in chunks. I don't need the array initialized, because I will either fill the entire array if reading the data succeeds, or throw an exception if it fails.
Preallocating the array with zeros or NaNs takes MATLAB several seconds.
Allocating a large array like this in C++ returns immediately, because neither new(...) nor malloc(...) initializes the memory.
Is it possible to allocate the array either in a C++ MEX file or using coder.ceval(...) to avoid the initialization time?
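The allocate-then-fill plan described in the question can be sketched in plain C, outside of MATLAB: malloc hands back storage without clearing it, the buffer is filled chunk by chunk, and if any read comes up short the buffer is freed so uninitialized memory never reaches the caller. This is a minimal sketch assuming the data is a raw stream of int16 samples; read_int16_chunked is a hypothetical helper, not part of the MEX API, and wrapping the result in a MATLAB array is not shown.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read `count` int16 samples from `fp`, `chunk`
 * elements at a time, into a buffer that is never pre-initialized.
 * Returns the filled buffer, or NULL if allocation or any read fails. */
int16_t *read_int16_chunked(FILE *fp, size_t count, size_t chunk)
{
    /* Reject empty requests and sizes whose byte count would overflow. */
    if (count == 0 || chunk == 0 || count > SIZE_MAX / sizeof(int16_t))
        return NULL;

    /* malloc does not clear the memory, so no time is spent initializing. */
    int16_t *buf = malloc(count * sizeof *buf);
    if (buf == NULL)
        return NULL;

    size_t done = 0;
    while (done < count) {
        size_t want = (count - done < chunk) ? count - done : chunk;
        if (fread(buf + done, sizeof *buf, want, fp) != want) {
            /* Short read or I/O error: discard the partly filled buffer so
             * uninitialized elements never reach the caller. */
            free(buf);
            return NULL;
        }
        done += want;
    }
    return buf;
}
```

The chunk size only controls how much is requested per fread call; it is a tuning choice, not something the question specifies.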

 Accepted Answer

If you want a matrix of class int16 filled with 0 elements, don't wrap a call to zeros in a call to int16. Instead, use the typename input listed on the documentation page for the zeros function.
z = zeros(1e4, 1e5, 'int16');
Are you trying to read in a sequence of images that may take up more memory than you have available (or a significant fraction of it)? If so, consider creating a tall array from an ImageDatastore and performing your analysis on the tall array. See this documentation page for more information on tall and datastore arrays.


This seems the best solution. I didn't know that zeros(...) had the additional parameter; I should have read the doc page for zeros(...). Doh.
>> clear z; tic(); z=zeros(10000,100000,'int16'); toc()
Elapsed time is 0.000278 seconds.
It does initialize memory to all zeros, but does so as quickly as the Uninit() function mentioned by BL.
Thanks for the suggestion on tall arrays and the link.
FWIW, I am actually working with global elevation maps of Earth and Mars, both of which are many GB in size. They do fit in memory on our workstations, but it is good to know about alternatives for when the resolution doubles...
James Tursa on 1 Dec 2018
Edited: James Tursa on 2 Dec 2018
I would add a word of caution on this approach. It appears (to me anyway) that MATLAB keeps a store of 0-filled memory in the background for use in cases such as this. In fact there is an unofficial mex routine mxFastZeros that seems to tap into this. I haven't done extensive testing with this, but I have noted in past limited testing that this 0-filled memory can seem to be exhausted in certain circumstances and then subsequent calls to functions that used to be very fast suddenly become much slower due to the 0-filling that is now incurred by the function.
Also, if you use the coder with the zeros( ) function and look at the resulting C-code, my guess is you will see a for-loop that manually fills in the 0's. So your C-code will not benefit from the MATLAB optimization of using pre 0-filled memory (or whatever that optimization really is in the background).
Ian on 1 Dec 2018
Edited: Ian on 1 Dec 2018
Hmmm, curious. Thanks for your comment.
I can see that being an effective strategy: preallocate an area of zeros that can be handed off to the user, and create a new block of zeros in the background with some low-level "rep stos..." instructions, ready for the next call to zeros(...). IIRC, most compilers implement C++ memset(...) calls, or a loop initializing memory, with such a single repetitive pipelined instruction for speed. This could be adaptive, preparing blocks similar in size to a user's recent calls to zeros(...), which would occasionally be a bottleneck if the size changed suddenly.
A little googling for benchmarks suggests that modern Intel processors can clear anywhere from 2 to 8 GB/sec using "rep stos...", but that doesn't match the times I got in my tests above: at those rates, zeroing the 2 GB int16 array would take roughly 0.25 to 1 second, not 0.0003 seconds.
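For comparison, here is what the two kinds of explicit zero-fill mentioned above look like in C: an element-by-element loop (roughly what James expects MATLAB Coder to emit, though this is not actual Coder output) and a memset call. Both write every byte, so their cost grows with array size, in line with the 2 to 8 GB/sec figures. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative explicit zero fill, one element at a time.
 * NOT copied from real MATLAB Coder output. */
void zero_fill_loop(double *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        z[i] = 0.0;
}

/* Same result via memset: on IEEE 754 platforms all-bits-zero is +0.0.
 * Optimizing compilers often turn the loop above into a memset call. */
void zero_fill_memset(double *z, size_t n)
{
    memset(z, 0, n * sizeof *z);
}
```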
Some systems take advantage of hardware to zero newly allocated memory. It is not uncommon for there to be a hardware level "allocate and zero page" that uses chip lines to zero an entire block of memory at a time.
Ah, indeed. I wondered if that was a possibility these days. Hardware has undoubtedly improved since my college days...
"Demand Zero Paging" (sometimes called "zero-on-demand") has existed since VAX/VMS days, if not earlier.
Ian on 2 Dec 2018
Edited: Ian on 2 Dec 2018
Which only shows how gray my beard is! Thanks for your comment. This was not a hardware feature of either the memory or the processors in the Z-80- and Intel 8080-based systems I cut my teeth on.
For anyone like me both clueless about and interested in what "Demand-Zero-Paging" is and why it is important, here's a good brief description, in answer to a question about the concept:
If I understand it correctly, a request for a large block of zeroed memory can be mapped to a single shared page of zeros in physical memory, with a private physical page allocated (and zeroed) only when each page is first written to.
That would explain why a call to zeros(...) for a large array is much faster than a call to nan(...) or ones(...): those functions must write every element, so every page has to be backed by real memory immediately.
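A minimal C sketch of why a zero-filled array can be cheap while a NaN-filled one cannot, assuming a typical virtual-memory OS: calloc must return zeroed memory, but for large requests many allocators (glibc on Linux, for example) obtain fresh pages from the kernel, which are already demand-zero, so no clearing pass is needed. A NaN fill has no such shortcut and must write every element. This illustrates the OS mechanism only; it makes no claim about how MATLAB implements zeros, and alloc_zeros and alloc_nans are hypothetical names.

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Zero-filled array. For large n, many allocators hand back fresh
 * demand-zero pages, so nothing is written until the pages are touched. */
double *alloc_zeros(size_t n)
{
    return calloc(n, sizeof(double));
}

/* NaN-filled array: every element, and therefore every page, is written
 * up front, so the cost grows with n. */
double *alloc_nans(size_t n)
{
    if (n > SIZE_MAX / sizeof(double))
        return NULL;
    double *p = malloc(n * sizeof *p);
    if (p != NULL)
        for (size_t i = 0; i < n; i++)
            p[i] = NAN;
    return p;
}
```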
Some history: I have not found anything definitive as to when Demand Zero was introduced. Demand paging dates to 1961.
Demand Zero requires an MMU, which was not present in the Intel 80x86 series until the 80286.
https://sigops.org/sosp/sosp15/history/04-satya-slides.pdf
https://homes.cs.washington.edu/~arvind/cs422/lectureNotes/l16-2.pdf


More Answers (4)

Bruno Luong on 30 Nov 2018
Edited: Bruno Luong on 30 Nov 2018
MEX can do that; use, for example, this utility (the uninit function on the File Exchange).
per isakson on 30 Nov 2018
Edited: per isakson on 30 Nov 2018
Try this
>> A(100,100) = 0;
>> whos A
  Name      Size            Bytes  Class     Attributes
  A         100x100         80000  double
I believe this is described on Undocumented Matlab, but I can't find it now. However, see the first two comments on the blog post I linked to.
Maybe more relevant:
>> B(100,100)=uint8(0);
>> whos B
  Name      Size            Bytes  Class     Attributes
  B         100x100         10000  uint8
A little experimenting has determined the following:
z = zeros(m,n);
is apparently quite fast, but only helpful if one needs a matrix of doubles.
clear z; tic(); z=zeros(10000,100000); toc()
Elapsed time is 0.000367 seconds.
>> whos z
  Name      Size                 Bytes  Class     Attributes
  z         10000x100000    8000000000  double
>> clear z; tic(); z=int16(zeros(10000,100000)); toc()
Elapsed time is 5.164922 seconds.
>> whos z
  Name      Size                 Bytes  Class     Attributes
  z         10000x100000    2000000000  int16
Unfortunately, converting the result to int16 is very slow and presumably needs about 5x the final memory in the process (the 8 GB double intermediate plus the 2 GB int16 result, 10 GB in total), which will cause problems if the array is large.
per isakson's approach probably avoids creating the matrix as doubles first, but it is also slow (though faster than the above):
>> clear z; tic(); z(10000,100000) = int16(0); toc()
Elapsed time is 1.023650 seconds.
>> clear z; tic(); z(10000,100000) = 0; toc()
Elapsed time is 5.692720 seconds.
Ian on 30 Nov 2018
Edited: Ian on 30 Nov 2018
BL is apparently correct that it can be done in a MEX file, and I have located a functional solution on the MATLAB File Exchange: 31362-uninit-create-an-uninitialized-variable-like-zeros-but-faster
It is a self-compiling MEX file which allows creation of matrices of any data type without initializing them.
clear z; tic(); z=uninit(23040,46080,'int16'); toc()
Elapsed time is 0.000231 seconds.
This solution is old (last updated in 2011), but it works in R2018a on macOS and in R2017b on Linux.
It does leave the resulting matrix uninitialized.

Asked: Ian on 30 Nov 2018
Commented: on 2 Dec 2018
