Saving large variables with save -v7 instead of -v7.3

Question: Is there a better way to save large variables?
Saving large variables is an increasingly common problem. In MATLAB, the save function has a -v7.3 option, which uses an HDF5-based format to allow storing variables larger than 2 GB. However, saving and loading take much longer with this option than with -v7 (the default). I know I am not the only one with this problem: I found several other very similar questions (here and here, and more that I couldn't find again quickly).
I have a poor-practice solution, and I would like to know if there is a more elegant one. My code is below, and here is my description. I want to save a cell array of large matrices. Each matrix is below the 2 GB limit, but the cell array is not. I want to make a cell array of handles instead, which would be very small but still permit indexing into the entire data set. I have done that by making a cell array of variable names and using the infamous eval function to treat the character vectors as handles.
My experiment produces data named "img01", "img02", ... Each image is about 300 MB. I would like a cell array whose dimensions correspond to temperature, pressure, and other variables, where each cell is a handle to the "img" file taken under those conditions. It would also be nice to load the workspace in a reasonable amount of time, i.e. without using -v7.3.
var1 = rand(1e4, 1e4); % 0.8 GB
var2 = rand(1e4, 1e4);
var3 = rand(2e4, 1e4); % 1.6 GB
var4 = rand(1e4, 2e4);
var5 = rand(1e4, 2e4);
data_all = {var1, var2, var3, var4, var5}; % 6.4 GB
data_handles = {'var1', 'var2', 'var3', 'var4', 'var5'};
%%% Check variable size
% whos
slice_data_all = data_all{2}(:,3);
var_data_handles = eval(data_handles{2});
slice_data_handles = var_data_handles(:,3);
%%% Check that the slices are the same
% sum(slice_data_all == slice_data_handles, 'all')
tic
save('SingleLargeVariable.mat', 'data_all', '-v7.3');
toc
tic
save('ManyVariables.mat', 'data_handles', 'var1', 'var2', 'var3', 'var4', 'var5', '-v7');
toc
Elapsed time is 144.252925 seconds.
Elapsed time is 105.970389 seconds.
In this example, saving with -v7 didn't actually save that much time, but I believe it saves much more when the variables are more compressible. I couldn't figure out how to test this easily. My actual data has many repeated values.
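One way to check the compressibility effect is to time the same save calls on random data and on data with many repeated values. The following is only a rough sketch: the matrix size, the file names, and the use of tempdir are assumptions chosen to keep the test quick, and the timings will depend on the machine and disk.

```matlab
% Sketch: compare save time and file size for incompressible vs. repetitive data.
n        = 2000;            % small stand-in size (about 32 MB per matrix)
noisy    = rand(n);         % random doubles, essentially incompressible
repeated = randi(4, n);     % only four distinct values, compresses well

names   = {'noisy', 'repeated'};
formats = {'-v7', '-v7.3'};
outDir  = tempdir;          % assumption: scratch location for the test files

for i = 1:numel(names)
    for j = 1:numel(formats)
        fname = fullfile(outDir, sprintf('cmp_%s_%d.mat', names{i}, j));
        tic
        save(fname, names{i}, formats{j});
        t = toc;
        info = dir(fname);
        fprintf('%-8s %-5s  %6.2f s  %8.1f MB\n', ...
            names{i}, formats{j}, t, info.bytes / 2^20);
    end
end
```

Scaling n up toward the real image size should show whether the gap between -v7 and -v7.3 widens for repetitive data.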
Another option would be to save the data in a binary file instead of a MAT file. Is this a good direction to go? I don't know much about this.
Another option is to save the individual variables, as in my example, but instead of the cell array of character vectors, save a string with the command to package all the variables back into a cell array. However, this seems likely to take more time than it saves by avoiding v7.3.
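The variable-name lookup can also be done without eval, by keeping the matrices as fields of a struct, saving with the '-struct' option so each field becomes its own variable, and reading one variable back with load and a dynamic field name. This is a minimal sketch, not a tested solution for the full data set: the file name, the small stand-in matrices, and the temperature and pressure values are all made up for illustration.

```matlab
% Sketch: save struct fields as separate variables and load them by name.
S.img01 = rand(500);        % small stand-ins for the ~300 MB images
S.img02 = rand(500);
S.img03 = rand(500);

fname = fullfile(tempdir, 'ManyVariables_struct.mat');   % hypothetical file name
save(fname, '-struct', 'S', '-v7');   % each field is stored as its own variable

% Small index relating conditions to variable names (values are illustrative).
index = table([300; 350; 400], [1.0; 1.5; 2.0], {'img01'; 'img02'; 'img03'}, ...
    'VariableNames', {'temperature', 'pressure', 'name'});

% Load only the image recorded at temperature 350.
name   = index.name{index.temperature == 350};
loaded = load(fname, name);           % struct containing just that variable
img    = loaded.(name);               % dynamic field name instead of eval
slice  = img(:, 3);
```

Because each field is written as a separate variable, each one only has to stay under the 2 GB limit of -v7, and only the requested image is read into memory.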
I am using MATLAB R2019b.
  2 Comments
David on 20 Mar 2020
Thanks Rik, that looks interesting. It looks like imageDatastore and datastore were designed to help with RAM problems, but they might help with data storage as well.
It looks like for my purposes, the fastest solution is to run the code that generated my variables every time I want to load that data. It is faster to read the raw data and redo the analysis than it is to open the MAT file with the data analyzed.

Answers (2)

Moe_2015 on 20 Mar 2020
Hi David,
In R2019b you can actually save with -v7.3 without any compression. If storage space is not an issue, you can add the -nocompression flag to your save call, and that should make loading the file faster than with the default compressed -v7.3.
save('SingleLargeVariable.mat', 'data_all', '-v7.3','-nocompression');
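To see whether the flag helps for a given data set, the load time and file size can be compared directly. The sketch below assumes small, highly repetitive stand-in data and hypothetical file names in tempdir. It requires a release that supports -nocompression with -v7.3.

```matlab
% Sketch: compare -v7.3 with and without compression on stand-in data.
data_small = {randi(4, 2000), randi(4, 2000)};   % repetitive, compressible values

fC = fullfile(tempdir, 'v73_compressed.mat');      % hypothetical file names
fN = fullfile(tempdir, 'v73_nocompression.mat');

save(fC, 'data_small', '-v7.3');
save(fN, 'data_small', '-v7.3', '-nocompression');

tic; a = load(fC); tC = toc;
tic; b = load(fN); tN = toc;

dC = dir(fC);
dN = dir(fN);
fprintf('compressed:    %6.2f s load, %8.1f MB\n', tC, dC.bytes / 2^20);
fprintf('nocompression: %6.2f s load, %8.1f MB\n', tN, dN.bytes / 2^20);
```

The uncompressed file trades disk space for less work at load time, so the benefit depends on how fast the disk is relative to decompression.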
  4 Comments
Moe_2015 on 27 Mar 2020
Edited: Moe_2015 on 27 Mar 2020
Actually for the -v7.3 option, -nocompression was introduced in R2017a according to the release notes. But for -v7, yes it was R2019b. Either way though, it won't help out Bob, sorry.
Rik on 27 Mar 2020
Good catch, thanks for the correction. I guess I need to read more carefully next time.

Walter Roberson on 20 Mar 2020
Sometimes it can be worth using a serialization tool and then writing the result to a binary file with fwrite(). For long-term storage, the resulting file could be compressed afterwards.
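For a plain numeric matrix, the simplest form of this idea needs no separate serialization tool: write a small header with the array size, then the raw values. The following sketch assumes a double matrix and a hypothetical file name and header layout. It does not handle cell arrays, structs, or other classes.

```matlab
% Sketch: write a double matrix to a raw binary file and read it back.
A     = rand(300, 200);                       % stand-in for one image
fname = fullfile(tempdir, 'img01.bin');       % hypothetical file name

% Write: number of dimensions, the size vector, then the data (column-major).
fid = fopen(fname, 'w');
assert(fid ~= -1, 'Could not open %s for writing.', fname);
fwrite(fid, ndims(A), 'uint64');
fwrite(fid, size(A), 'uint64');
fwrite(fid, A, 'double');
fclose(fid);

% Read: recover the size from the header, then reshape the data.
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s for reading.', fname);
nd = fread(fid, 1, 'uint64');
sz = fread(fid, nd, 'uint64');
sz = sz(:).';                                 % size as a row vector for reshape
B  = fread(fid, prod(sz), 'double');
B  = reshape(B, sz);
fclose(fid);
```

The file is uncompressed, so it takes the full 8 bytes per element on disk, but reading it is a single sequential pass.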

Release

R2019b
