Saving large variables with save -v7 instead of -v7.3
Question: Is there a better way to save large variables?
Saving large variables is an increasingly common problem. In MATLAB, the save function has the option of saving as v7.3, which uses HDF5 to allow the storage of variables larger than 2 GB. However, saving and loading take much longer with this option than with v7 (the default). I know I am not the only one with this problem - I found several other very similar questions (here and here, and more that I couldn't find again quickly).
I have a poor-practice solution, and I would like to know if there is a more elegant solution out there. My code is below, and here is my description. I want to save a cell array of large matrices. Each matrix is below the 2 GB limit, but the cell array is not. I want to make a cell array of handles instead, which would be very small but still permit indexing of the entire set of data. I have done that by making a cell array of variable names and using the infamous eval function to turn the character vectors into the variables they name.
My experiment produces data named "img01", "img02", ... Each image is about 300 MB. I would like to have a cell array with temperature, pressure, other variables for the dimensions, and each cell is a handle to the "img" file taken under those conditions. It would also be nice to load the workspace in a reasonable amount of time, i.e. not use -v7.3.
var1 = rand(1e4, 1e4); % 0.8 GB
var2 = rand(1e4, 1e4);
var3 = rand(2e4, 1e4); % 1.6 GB
var4 = rand(1e4, 2e4);
var5 = rand(1e4, 2e4);
data_all = {var1, var2, var3, var4, var5}; % 6.4 GB
data_handles = {'var1', 'var2', 'var3', 'var4', 'var5'};
%%% Check variable size
% whos
slice_data_all = data_all{2}(:,3);
var_data_handles = eval(data_handles{2});
slice_data_handles = var_data_handles(:,3);
%%% Check that the slices are the same
% sum(slice_data_all == slice_data_handles, 'all')
tic
save('SingleLargeVariable.mat', 'data_all', '-v7.3');
toc
tic
save('ManyVariables.mat', 'data_handles', 'var1', 'var2', 'var3', 'var4', 'var5', '-v7');
toc
Elapsed time is 144.252925 seconds.
Elapsed time is 105.970389 seconds.
In this example, saving with -v7 didn't actually save that much time, but I believe it saves much more when the variables are more compressible. I couldn't figure out how to test this easily. My actual data has many repeated values.
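One way the effect of compressibility could be checked is to save a random matrix and a matrix with many repeated values in both formats and compare the time and file size. The sketch below is only an illustration: the matrix size, the variable names noisy and repetitive, and the temporary file name are assumptions chosen to keep the test small, and timings will vary by machine and disk.

```matlab
n = 2000;                          % 2000 x 2000 doubles, about 32 MB each
noisy      = rand(n, n);           % essentially incompressible values
repetitive = randi(4, n, n);       % doubles with only four distinct values

f = [tempname '.mat'];             % hypothetical temporary file name
vars = {'noisy', 'repetitive'};
fmts = {'-v7', '-v7.3'};
bytes   = zeros(numel(vars), numel(fmts));
seconds = zeros(numel(vars), numel(fmts));

for i = 1:numel(vars)
    for j = 1:numel(fmts)
        tic
        save(f, vars{i}, fmts{j}); % save one variable in one format
        seconds(i, j) = toc;
        info = dir(f);
        bytes(i, j) = info.bytes;  % size of the file on disk
        fprintf('%-10s %-5s %6.2f s %8.1f MB\n', ...
            vars{i}, fmts{j}, seconds(i, j), bytes(i, j) / 1e6);
        delete(f);
    end
end
```

Scaling n up toward the real data size would give a more representative comparison, at the cost of a longer run.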
Another option would be to save the data in a binary file instead of a MAT file. Is this a good direction to go? I don't know much about this.
Another option is to save the individual variables, as in my example, but instead of the cell array of character vectors, save a string with the command to package all the variables back into a cell array. However, this seems likely to take more time than it saves by avoiding v7.3.
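For the name-based lookup used in the example above, eval may not be needed at all. A minimal sketch, assuming the variables were saved individually as in ManyVariables.mat: load accepts a variable name as a character vector and returns a struct, which can then be indexed with a dynamic field name. The small matrices and the temporary file name here are stand-ins for illustration only.

```matlab
% Small stand-ins for the large matrices (sizes reduced for illustration)
var1 = rand(100, 100);
var2 = rand(100, 100);
data_handles = {'var1', 'var2'};

fname = [tempname '.mat'];         % hypothetical temporary file name
save(fname, 'data_handles', 'var1', 'var2', '-v7');

% Load the list of names first, then only the variable that is needed
S = load(fname, 'data_handles');
name = S.data_handles{2};
T = load(fname, name);             % struct containing only that variable

% A dynamic field name takes the place of eval
slice_from_file = T.(name)(:, 3);

same = isequal(slice_from_file, var2(:, 3));
delete(fname);
```

With this pattern the whole workspace never has to be loaded at once; only the small cell array of names and the one matrix being inspected are read into memory.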
I am using MATLAB 2019b.
Answers (2)
Moe_2015
on 20 Mar 2020
Hi David,
In 2019b you can actually use v7.3 without any compression. If storage is not an issue, you can add the -nocompression flag to your call to save, and that should make the file load faster than v7.3 with compression.
save('SingleLargeVariable.mat', 'data_all', '-v7.3','-nocompression');
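Whether this helps for a given data set can be checked with a small timing comparison. The following is a rough sketch, not a benchmark: the reduced matrix sizes and temporary file names are assumptions, and the difference may only become clear at realistic data sizes.

```matlab
data_all = {rand(1000), rand(1000)};   % small stand-in for the real cell array

fC = [tempname '.mat'];                % hypothetical file, compressed
fN = [tempname '.mat'];                % hypothetical file, uncompressed
save(fC, 'data_all', '-v7.3');
save(fN, 'data_all', '-v7.3', '-nocompression');

tic; A = load(fC); tC = toc;           % load time with compression
tic; B = load(fN); tN = toc;           % load time without compression
fprintf('compressed: %.2f s, uncompressed: %.2f s\n', tC, tN);

identical = isequal(A.data_all, B.data_all);   % both files hold the same data
delete(fC);
delete(fN);
```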
4 Comments
Rik
on 27 Mar 2020
Good catch, thanks for the correction. I guess I need to read more carefully next time.
Walter Roberson
on 20 Mar 2020
Sometimes it can be worth using a serialization tool and fwrite() the results to a binary file. For long term storage, the resulting file could potentially be compressed afterwards.
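For a single numeric matrix, the raw-binary route can be sketched without any serialization tool by writing a small header with the array size followed by the values. This is only an illustration under stated assumptions: the header layout (number of dimensions, then the dimensions, stored as uint64) is made up for this example, the file name is hypothetical, and the approach handles only full double arrays. Cell arrays or mixed types would need a serialization step first, as suggested above.

```matlab
img = rand(300, 200);                  % stand-in for one image matrix
fname = [tempname '.bin'];             % hypothetical file name

% Write: number of dimensions, the dimensions, then the raw values
fid = fopen(fname, 'w');
fwrite(fid, ndims(img), 'uint64');
fwrite(fid, size(img), 'uint64');
fwrite(fid, img, 'double');            % values are written in column-major order
fclose(fid);

% Read back in the same order
fid = fopen(fname, 'r');
nd  = fread(fid, 1, 'uint64');
sz  = fread(fid, [1 nd], 'uint64');
img2 = reshape(fread(fid, prod(sz), 'double'), sz);
fclose(fid);

roundtrip_ok = isequal(img, img2);     % the values survive the round trip exactly
delete(fname);
```

Because the values are stored uncompressed, the file is about as large as the array in memory; compressing it afterwards for long-term storage is a separate step.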