Code for saving files as .mat is very very slow

Hi,
I am loading multiple .csv files into MATLAB and re-saving them as .mat files. The code takes far too long (hours) to save the files as .mat files. Code is below:
myfiles = dir('*.txt') ;
p = length(myfiles) ;
for i = 1:p
    thisfile = myfiles(i).name ;
    y = importdata(thisfile) ;
    matfile = strcat(thisfile,'.mat')
    save(matfile,'y') ;
    clearvars y
end
Any hint or help?
Thank you

10 Comments

Are these many tiny files on a hard drive? For small files the file system overhead is relatively large per file, and hard drives are fairly slow compared to SSDs.
In other words: how did you determine the save function is to blame? And why are you clearing a variable that you overwrite anyway?
For a small number of files (say 100), it takes a few seconds to save each file as a .mat file. If the number of csv files increases to, say, 1000, it takes much, much longer.
Did you use the profiler? Did you check Task Manager (Activity Monitor on macOS, System Monitor on Ubuntu) to see if the drive is working hard?
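As a rough alternative to the profiler, the two steps of the loop can be timed separately with tic/toc. This is only a sketch: it assumes the same *.txt files as in the question are in the current folder, and the variable names tImport and tSave are made up for illustration.

```matlab
% Sketch: measure how long importdata and save each take per file.
% Assumes the *.txt files from the question are in the current folder.
myfiles = dir('*.txt');
p = numel(myfiles);
tImport = zeros(p, 1);   % seconds spent in importdata, per file
tSave   = zeros(p, 1);   % seconds spent in save, per file
for i = 1:p
    thisfile = myfiles(i).name;

    t0 = tic;
    y = importdata(thisfile);
    tImport(i) = toc(t0);

    matfile = strcat(thisfile, '.mat');
    t0 = tic;
    save(matfile, 'y');
    tSave(i) = toc(t0);
end
fprintf('import: %.2f s total, save: %.2f s total\n', sum(tImport), sum(tSave));
```

Plotting tImport and tSave against the file index would also show whether the per-file cost grows as more files are processed, or whether only the total grows.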
It takes lots of memory.
What is "it"?
What is typical value for p?
As another commenter said, eliminate the clearvars line; it is redundant because y is overwritten on the next iteration anyway.
How large are the .csv files?
What is the content of the .csv files? The bottleneck just might be in importdata instead.
Thanks for your response. Each file is about 40 KB and contains 3 headers and numeric data.
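If "3 headers" means three header lines followed by purely numeric data (an assumption, the thread does not say), one possible alternative to importdata is readmatrix with the header lines skipped explicitly. The sketch below builds a hypothetical sample file so it can be run on its own; the comma delimiter and the file contents are invented for illustration, and nothing here has been measured against the real files.

```matlab
% Hypothetical sample file: 3 header lines, then comma-separated numbers.
tmp = [tempname '.txt'];
fid = fopen(tmp, 'w');
fprintf(fid, 'header 1\nheader 2\nheader 3\n');
fprintf(fid, '1,2,3\n4,5,6\n');
fclose(fid);

% Skip the header lines explicitly instead of letting importdata detect them.
% readmatrix is available from R2019a onwards.
data = readmatrix(tmp, 'NumHeaderLines', 3, 'Delimiter', ',');
delete(tmp);   % remove the temporary sample file
```

The result is a plain numeric matrix rather than a struct, which also sidesteps the question of saving a compound variable.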
If they all have the same structure, then, if I recall correctly, the Import Tool (uiimport) can be used to generate import code for that layout, and the generated code should be faster than importdata by itself.
If you have a header, then is importdata() returning a struct with one field per column? Have you tried save -struct? Especially if you are defaulting to -v7.3, saving compound variables such as structs is slower than saving numeric variables.
Yes, I get a struct like you described. How would I incorporate save -struct into the code? Will this make it faster?
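A minimal sketch of how save -struct could be slotted into the original loop, assuming importdata returns a scalar struct for these files. The '-v7' flag is an extra assumption added here to avoid a possible -v7.3 default; whether either change actually speeds things up has not been measured in this thread.

```matlab
% Sketch: store the fields of y as separate variables in the MAT-file.
% Assumes the *.txt files from the question are in the current folder.
myfiles = dir('*.txt');
for i = 1:numel(myfiles)
    thisfile = myfiles(i).name;
    y = importdata(thisfile);
    matfile = strcat(thisfile, '.mat');
    if isstruct(y)
        % Each field of y becomes its own variable in the file.
        save(matfile, '-struct', 'y', '-v7');
    else
        % importdata returned a plain array; save it as before.
        save(matfile, 'y', '-v7');
    end
end
```

Note that a later load(matfile) then returns the fields of y as individual variables rather than a single variable named y, so any code that reads these files would need to be adjusted.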
"it takes a few seconds to save each file as a .mat file. If the number of csv files increases to, say, 1000, it takes much, much longer"
Yes, overall time, but have you profiled to prove it's actually the save operation that's the culprit?
Which OS? In the old FAT32 days, large numbers of files in a single directory would really bog things down; NTFS is better in that regard, but I don't know whether the problem has really gone away or is mostly just not observed because people rarely run workloads that expose it.
I wonder if there's any chance it could have anything to do with accessing/releasing system resources like file handles, etc., so that the slowness comes from waits inside the system calls in the I/O routines?
All just conjecture...


Answers (0)

Release: R2019b

Asked: 30 Jun 2020

Commented: 1 Jul 2020
