How to efficiently integrate big data without using memory / (How to create big data)
- In a study, I will produce large arrays.
- Each array will be at least 500 MB in size.
- Each array will have the same number of rows.
- The total size of the dataset will be approximately 20 GB or more.
- Somehow I have to create a single variable/array of about 20 GB that includes all the data.
matfile seems to be a good solution. However, as the file grows, it gets slower. How can I handle this problem?
9 Comments
Walter Roberson
on 18 Aug 2015
I wonder if compression is leading to slowdowns? I do not know whether -v7.3 with matfile uses compression; see discussion http://www.mathworks.com/matlabcentral/answers/15521-matlab-function-save-and-v7-3 and http://www.mathworks.com/matlabcentral/answers/137592-compress-only-selected-variables-when-saving-to-mat
Accepted Answer
JMP Phillips
on 19 Aug 2015
Edited: Walter Roberson
on 19 Aug 2015
Here are some things you could try:
Use the matfile function, which allows you to access and change variables directly in MAT-files, without loading into memory: http://au.mathworks.com/help/matlab/large-mat-files.html http://au.mathworks.com/help/matlab/ref/matfile.html
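As a minimal sketch of the matfile approach (the file name, the variable name `data`, and the small block sizes are illustrative assumptions, not taken from the question), the arrays can be written side by side into one on-disk variable. Writing the last element first preallocates the variable, which may help with the slowdown that can occur when the variable grows on every write:

```matlab
% Sketch: assemble one large matrix on disk, one block of columns at a time.
% File name, variable name, and sizes are assumptions for illustration.
fname = fullfile(tempdir, 'bigdata_demo.mat');
if exist(fname, 'file'), delete(fname); end

nRows   = 1000;   % every block has the same number of rows
nCols   = 50;     % columns per block (tiny here; ~500 MB per block in practice)
nBlocks = 4;

m = matfile(fname, 'Writable', true);   % a new file is created in -v7.3 format

% Preallocate the full variable on disk by assigning its last element,
% so it does not have to be enlarged with every block.
m.data(nRows, nCols * nBlocks) = 0;

for k = 1:nBlocks
    block = rand(nRows, nCols);          % stand-in for one produced array
    cols  = (k-1)*nCols + (1:nCols);     % contiguous column range for this block
    m.data(:, cols) = block;             % only this block is held in memory
end

% Read back part of the data without loading the whole variable.
firstBlock = m.data(:, 1:nCols);
sz = size(m, 'data');                    % size query does not load the data
```

Because every array has the same number of rows, concatenating along the column dimension keeps each write a contiguous range of columns.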
Structure your data differently: if you are representing the data as doubles, maybe you can afford less precision, e.g. use int32. For example, with a scale factor of 1e4, a double value such as 100.3425 is stored as the integer 1003425.
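The scaling idea can be sketched as follows. The sample values are made up for illustration, and the scale of 1e4 is the one from the example above; whether four decimal places are enough depends on the data:

```matlab
% Sketch: store doubles as scaled int32 to halve the storage (8 -> 4 bytes).
scale = 1e4;                           % keeps 4 decimal places
x     = [100.3425, -12.5, 0.0001];     % illustrative double values

xi = int32(round(x * scale));          % fixed-point representation
xr = double(xi) / scale;               % convert back before computing

maxErr = max(abs(xr - x));             % rounding error, at most 0.5/scale

% Largest magnitude that fits with this scale; int32 saturates beyond it
% instead of wrapping around.
limit = double(intmax('int32')) / scale;
```

With a scale of 1e4, values larger in magnitude than about 214748 no longer fit in int32, so the range of the data should be checked first.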
With MATLAB:
- use a 64-bit version of MATLAB
- try disabling compression when saving the files, with the -v6 option
Optimize your PC for your task:
- In Task Manager, close any unnecessary processes running at the same time, including taskbar junk (Adobe update, Java update, etc.).
- Disable your antivirus, which might be trying to scan the file and slowing it down.
- In Task Manager, give higher priority to the MATLAB process (see http://www.sevenforums.com/tutorials/83361-priority-level-set-applications-processes.html).
- Increase your virtual memory or page file size: http://windows.microsoft.com/en-au/windows/change-virtual-memory-size#1TC=windows-7
- Defragment your hard drive.
- Run MATLAB from your local hard drive, not from a network drive or external hard drive.
- Save the .mat file to a local hard drive with plenty of free space, not to a network drive or external hard drive.
- For faster disk access, use a solid-state drive (SSD).
2 Comments
Walter Roberson
on 19 Aug 2015
The -v6 option is incompatible with matfile and with objects over 2 GB.
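Since -v6 cannot be combined with matfile, one possible alternative in newer releases is the `-nocompression` flag of `save`, which is used together with `-v7.3` (this flag was added in R2017a, so it is not available in the releases current when this thread was written). The sketch below uses an assumed file name and a small test matrix, and whether it actually removes the slowdown is untested here:

```matlab
% Sketch: write an uncompressed -v7.3 file, then access it through matfile.
% Requires R2017a or later; file and variable names are assumptions.
fname = fullfile(tempdir, 'nocompress_demo.mat');
data  = rand(200, 50);

save(fname, 'data', '-v7.3', '-nocompression');

m    = matfile(fname);          % read-only access to the saved file
part = m.data(:, 1:10);         % partial read without loading everything
```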