Code for saving files as .mat is very very slow
Hi,
I am loading multiple .csv files into MATLAB and re-saving them as .mat files. The code takes far too long (hours) to save each file as a .mat file. The code is below:
myfiles = dir('*.txt');
p = length(myfiles);
for i = 1:p
    thisfile = myfiles(i).name;
    y = importdata(thisfile);
    matfile = strcat(thisfile,'.mat')
    save(matfile,'y');
    clearvars y
end
Any hint or help?
Thank you
10 Comments
Rik
on 30 Jun 2020
Are these many tiny files on a hard drive? For small files the file system overhead is relatively large per file, and hard drives are fairly slow compared to SSDs.
In other words: how did you determine the save function is to blame? And why are you clearing a variable that you overwrite anyway?
Curious Mind
on 30 Jun 2020
Rik
on 30 Jun 2020
Did you use the profiler? Did you check Task Manager (Activity Monitor on macOS, System Monitor on Ubuntu) to see whether the drive is working hard?
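A rough way to check which step dominates, short of a full profiler run, is to time the import and the save separately. This is only a sketch built on the loop from the question; the names tImport and tSave are made up for illustration.

```matlab
% Sketch: time importdata and save separately for each file.
% tImport and tSave are illustrative names, not from the original code.
myfiles = dir('*.txt');
p = numel(myfiles);
tImport = zeros(p, 1);   % seconds spent in importdata, per file
tSave   = zeros(p, 1);   % seconds spent in save, per file
for i = 1:p
    thisfile = myfiles(i).name;

    t0 = tic;
    y = importdata(thisfile);
    tImport(i) = toc(t0);

    matfile = strcat(thisfile, '.mat');
    t0 = tic;
    save(matfile, 'y');
    tSave(i) = toc(t0);
end
fprintf('import: %.2f s total, save: %.2f s total\n', sum(tImport), sum(tSave));
```

For a line-by-line breakdown, wrapping the original loop in profile on and profile viewer gives the same information in more detail.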
Curious Mind
on 30 Jun 2020
dpb
on 30 Jun 2020
What is "it"?
What is a typical value for p?
As another said, eliminate the clearvars line; it does nothing.
How large are the .csv files?
What is the content of the .csv files? The bottleneck just might be in importdata instead.
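If the import does turn out to be the slow part, one thing to try is a reader that does less format guessing than importdata. The sketch below assumes the files are purely numeric and that the release is R2019a or later (where readmatrix and writematrix exist); it writes a small temporary file so that it runs on its own, and the file name is hypothetical.

```matlab
% Sketch, assuming purely numeric, comma-separated content (R2019a or later).
% The temporary file stands in for one of the real data files.
csvfile = fullfile(tempdir, 'example_numeric.csv');
writematrix(magic(4), csvfile);   % create a small test file

y = readmatrix(csvfile);          % numeric matrix, no struct wrapper
disp(size(y));
```

Whether this is actually faster for the real files would need to be measured; files with headers or mixed text may need readtable or import options instead.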
Curious Mind
on 30 Jun 2020
Walter Roberson
on 30 Jun 2020
If they all have the same structure, then if I recall correctly, importdata() can be used to generate code, and the generated code should be faster than importdata by itself.
If you have a header, then is importdata() returning a struct with one field per column? Have you tried save -struct? Especially if you are defaulting to -v7.3, saving compound variables such as structs is slower than saving numeric variables.
Curious Mind
on 30 Jun 2020
dpb
on 30 Jun 2020
"it takes a few seconds to save each file as .mat file. If the number of csv files increases to say 1000, its takes much much longer"
Yes, overall time, but have you profiled to prove it's actually the save operation that's the culprit?
Which OS? Back in the FAT32 days, large numbers of files in a single directory would really bog things down. NTFS is better in that regard, but I don't know whether the problem really goes away or is mostly just not observed because people rarely run workloads that expose it.
I wonder if there's any chance it has to do with acquiring and releasing system resources such as file handles, so that the slowness comes from waits inside the system calls in the I/O routines?
All just conjecture...
Walter Roberson
on 1 Jul 2020
save(matfile, '-struct', 'y') ;
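To illustrate what the '-struct' form does: it stores each field of a scalar struct as its own variable in the MAT-file instead of one compound variable. The sketch below uses a made-up struct as a stand-in for what importdata might return for a file with a header (the field contents and the file name are assumptions), and passes '-v7' explicitly so the result does not depend on the default format set in preferences.

```matlab
% Sketch: save the fields of a scalar struct as separate variables.
% y is a hypothetical stand-in for an importdata result with a header.
y = struct('data', magic(4), 'textdata', {{'a', 'b'}});

matfile = fullfile(tempdir, 'example_struct.mat');
if isstruct(y) && isscalar(y)
    save(matfile, '-struct', 'y', '-v7');   % one variable per field
else
    save(matfile, 'y', '-v7');              % plain numeric result
end

vars = whos('-file', matfile);
disp({vars.name});   % names of the variables stored in the file
```

The isstruct check matters because importdata returns a plain numeric matrix for purely numeric files, and '-struct' only accepts a scalar struct. Note that '-v7' cannot store variables larger than 2 GB.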
Answers (0)