MATLAB Answers

Memory Usage and Speed

Jeff on 10 Mar 2014
Commented: Jeff on 17 Mar 2014
Sorry for the double question, but I think the two are related. I am working on an application that loops through a set of files in a single folder. There are typically 16,000 to upwards of 55,000 files in the folder, and they are always DICOM files. The loop retrieves a small bit of information from each file and saves it to a structure, which is eventually saved to a .mat file.
The app is running fairly slowly, in my opinion. It takes approximately 35 seconds per 100 files, which for 55,000 files comes to about 320 minutes, or more than 5 hours, to run through just this section of the program. I think this is a bit too long, but maybe not, and I'd like your opinion on this speed. Does this sound right? I'm working on a Windows 7 64-bit machine with 16 GB of RAM.
Secondly, I looked at the Task Manager and noticed that MATLAB is using 300,000 K of memory (roughly 300 MB) during the looping process. Is this normal?

  3 Comments

dpb on 10 Mar 2014
There's no way to answer the question as posed; much undoubtedly depends on how you've coded it. In general, if you're on Windows, having huge numbers of files in subdirectories is a recipe for slowdown.
How are you processing the files? Did you collect the directory listing first and then iterate through it, or are you doing it some other way?
Are you accessing just the metadata, reading only the minimal amount of data needed, instead of the full file?
I've never actually used DICOM, so I don't know much about it specifically.
The memory usage sounds like maybe you're loading a full image or have something else loaded. There's a lot of overhead in structures, too, so you might want to look at the memory usage of your data storage scheme after processing just a few files to see what's getting consumed there.
Also, of course, run the profiler and see where the run time is actually being spent instead of guessing.
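A minimal sketch of that profiling step, assuming the files sit in the current folder with a .dcm extension and that a 200-file sample is representative (all three are assumptions, since no code was posted). It lists the directory once with dir, then profiles a short run of dicominfo calls:

```matlab
% Sketch only: folder, extension, and sample size are assumptions.
folder  = pwd;                               % assumed location of the DICOM files
listing = dir(fullfile(folder, '*.dcm'));    % list the folder once, outside the loop
nSample = min(200, numel(listing));          % profile a small sample first

profile on
for k = 1:nSample
    % dicominfo returns the header metadata as a struct
    info = dicominfo(fullfile(folder, listing(k).name));
end
profile off
profile viewer                               % report of time spent per function
```

If most of the time lands inside dicominfo itself, the file access is the likely cost; if it lands elsewhere, the surrounding loop code is worth a closer look.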
Jeff on 10 Mar 2014
Thanks, dpb.
I realize my code will also factor into the equation. However, it doesn't seem like it should affect it this much. To answer your first question, the DICOM files are already located in the folder. The loop starts by extracting a few items (MR number, file name, study/series number, and type of file, such as CT) from the DICOM header. The data is stored in a 5-field structure during the loop. After all data has been parsed, the structure is saved into temp.mat.
Most of the files are typically 512x512 CT images with header data.
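For readers following along, the loop described here might look roughly like the sketch below. This is not Jeff's code, which was never posted. The folder, the .dcm extension, the struct field names, and the choice of header attributes (PatientID standing in for the MR number, plus StudyID, SeriesNumber, and Modality) are all assumptions.

```matlab
% Sketch only: not the poster's code. Folder, extension, and fields are assumptions.
folder  = pwd;                                % assumed location of the DICOM files
listing = dir(fullfile(folder, '*.dcm'));     % read the directory once, up front
n       = numel(listing);

% Preallocate an n-by-1 struct array with five fields.
tempData = repmat(struct('mrn', '', 'file', '', 'study', '', ...
                         'series', [], 'modality', ''), n, 1);

for k = 1:n
    info = dicominfo(fullfile(folder, listing(k).name));   % header metadata
    tempData(k).file = listing(k).name;
    % Attributes can be absent from a header, so guard each access.
    if isfield(info, 'PatientID'),    tempData(k).mrn      = info.PatientID;    end
    if isfield(info, 'StudyID'),      tempData(k).study    = info.StudyID;      end
    if isfield(info, 'SeriesNumber'), tempData(k).series   = info.SeriesNumber; end
    if isfield(info, 'Modality'),     tempData(k).modality = info.Modality;     end
end

save('temp.mat', 'tempData');                 % single write after the loop
```

Saving once after the loop, rather than on every iteration, keeps disk writes out of the per-file cost.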
dpb on 10 Mar 2014
Well, given Sean's timings, it seems likely that there's something to look at. Even an outline of the basic loop would let folks see if there's anything obvious being done that could be the killer.
Again, did you get the list from dir first and have you run the profiler yet? The question about local/network is a very good one I hadn't thought about...

Answers (5)

Sean de Wolski on 10 Mar 2014
Are you actually reading in the files using dicomread()? If you are, perhaps the quicker dicominfo() could provide what you're looking for (the metadata) without having to load all of the data.
Otherwise, use the profiler to identify the bottlenecks and post the results here.
Without knowing what you're doing, it's a shot in the dark whether the timings are reasonable.

Jeff on 10 Mar 2014
Sean,
I am using dicominfo() to extract the header data.

  2 Comments

Sean de Wolski on 10 Mar 2014
I just ran dicominfo over 60 files and it took 1.3 s on a wimpy Windows 7 x64 laptop.
Sean de Wolski on 10 Mar 2014
Are you preallocating the structure that you're writing to?

Jeff on 10 Mar 2014
Hi, Sean,
Yes, I'm preallocating using the following code:
tempData = cell(numFiles,5);
where numFiles is the number of DICOM files in the folder. Is speed linear? In other words, does 60/1.3 = 10000/x?
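Since the question mentions a 5-field structure while the preallocation shown is a cell array, here is a small sketch of two ways to reconcile them. The field names and the placeholder row are made up for illustration, and numFiles is set to 3 only to keep the example self-contained.

```matlab
% Sketch: field names and the sample row are assumptions for illustration.
numFiles = 3;                                   % small stand-in for the real count
fields   = {'mrn', 'file', 'study', 'series', 'modality'};

% Option 1: fill the preallocated cell array, then convert once at the end.
tempData = cell(numFiles, 5);
tempData(1, :) = {'0000000', 'IM00001.dcm', '1', 1, 'CT'};   % placeholder row
s1 = cell2struct(tempData, fields, 2);          % numFiles-by-1 struct array

% Option 2: preallocate the struct array directly and assign fields in the loop.
s2 = repmat(cell2struct(cell(5, 1), fields, 1), numFiles, 1);
s2(1).modality = 'CT';
```

Either way the container has its final size before the loop starts, so MATLAB does not have to grow it on each iteration.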

  2 Comments

Sean de Wolski on 10 Mar 2014
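d = dir('*.dcm');
nfiles = numel(d); % 60 files in this test folder
niter = 100;       % repeat the pass over the folder 100 times
c = cell(niter*nfiles,1); % preallocate one cell per dicominfo call
tic;
for kk = 1:niter
    for ii = 1:nfiles
        % write each header to its own slot instead of overwriting the first 60
        c{(kk-1)*nfiles + ii} = dicominfo(d(ii).name);
    end
end
toc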
6000 iterations in:
Elapsed time is 56.090968 seconds.
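To put the numbers in this thread side by side, the arithmetic below converts each timing to a per-file cost and scales it to 55,000 files, assuming the cost per file stays constant (which is what "linear" would mean here). The comparison is only rough: the benchmark re-reads the same 60 files on a different machine, so file caching may flatter it.

```matlab
% Per-file cost implied by the timings quoted in this thread.
tSean  = 56.090968 / 6000;   % s per file, from the 6000 dicominfo calls above
tJeff  = 35 / 100;           % s per file, from the original question
nFiles = 55000;              % largest folder size mentioned

fprintf('Benchmark: %.1f ms per file, %.1f min for %d files\n', ...
        1000*tSean, nFiles*tSean/60, nFiles);
fprintf('Reported:  %.1f ms per file, %.1f min for %d files\n', ...
        1000*tJeff, nFiles*tJeff/60, nFiles);
fprintf('Ratio: about %.0fx slower than the benchmark\n', tJeff/tSean);
```

That works out to roughly 9 ms per file for the benchmark against 350 ms per file as reported, a gap large enough to suggest something beyond dicominfo itself is involved.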
Sean de Wolski on 10 Mar 2014
Are your files local or on a network?

Jeff on 11 Mar 2014
Thanks, everyone, for your help. What I have seen is that once the folder gets above a certain number of files (I'm not really sure what the magic number is), the loop slows considerably. So it makes me wonder whether something happens with a large number of files that causes the slowdown.

  2 Comments

Sean de Wolski on 11 Mar 2014
Do you run out of RAM?
How big is the array that you're storing this information in?
Cells and structs have some overhead so perhaps you're hitting a memory limitation that causes MATLAB to require using swap space, which will be much slower.
Run your loop for a smaller number of iterations and report back with the output from:
whos
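One way to act on this, sketched with made-up placeholder values rather than real header data: fill a small sample of the storage array, read its size from whos, and scale to the largest folder. The row contents and the 100-row sample size are assumptions.

```matlab
% Sketch: the sample rows are placeholders, not real header values.
nSample  = 100;
tempData = cell(nSample, 5);
for k = 1:nSample
    tempData(k, :) = {'0000000', sprintf('IM%05d.dcm', k), '1', 1, 'CT'};
end

w = whos('tempData');                   % struct describing the variable
bytesPerFile = w.bytes / nSample;       % includes per-cell overhead
estMB = bytesPerFile * 55000 / 2^20;    % 55,000 files is the largest folder mentioned
fprintf('%.0f bytes per file, about %.1f MB for 55,000 files\n', bytesPerFile, estMB);
```

If the estimate comes out far below the installed RAM, the storage array alone is unlikely to be what pushes MATLAB into swap, and the memory is probably going somewhere else.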
dpb on 11 Mar 2014
...What I have seen is that once the folder gets above a certain number of files...
Which Windows version? Not sure if the Change Notification Handle thingie still hangs around with later versions or not--I've not upgraded so no experience.
Also, check whether your antivirus software is interfering.
MS has moved MSDN around so much I can't find anything any more, but at least in years gone by there definitely were issues with large subdirectories--how well they've been solved/resolved in later versions I don't know.

Jeff on 11 Mar 2014
@Sean: The array can get as large as 55,000 x 5 at the moment. I say at the moment because I'm currently looking at a day's worth of CT scans. I may change my code to pull images every hour or two just to make the process more manageable. Your comment regarding swap space may explain what's going on, although I'm not that familiar with the subject, so I'll need to research it a bit.
@dpb: I'm on Windows 7 with 16 GB of RAM.

  2 Comments

dpb on 12 Mar 2014
What did the profiler show?
You've still not even shown an outline of what you're actually doing for a second-set-of-eyes sanity check, let alone the segment of actual code.
Is it all local storage or is it network access?
Did you check on the AV software interference issue?
As a last resort, what about "embarrassingly parallel" offloading to multiple processors?
I still have to think there's something unexplained going on that's a factor.
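A sketch of what the parallel option could look like, assuming Parallel Computing Toolbox is available, the files are in the current folder with a .dcm extension, and only the Modality attribute is wanted (all assumptions). Each iteration is independent, which is what makes the loop a candidate for parfor:

```matlab
% Sketch only: requires Parallel Computing Toolbox; folder and field are assumptions.
folder   = pwd;                               % assumed location of the DICOM files
listing  = dir(fullfile(folder, '*.dcm'));
names    = {listing.name};
n        = numel(names);
modality = cell(n, 1);                        % sliced output, one slot per file

parfor k = 1:n
    info = dicominfo(fullfile(folder, names{k}));
    m = '';                                   % default when the attribute is absent
    if isfield(info, 'Modality')
        m = info.Modality;
    end
    modality{k} = m;
end
```

If the bottleneck turns out to be disk or network access rather than CPU, extra workers may help little, so this is worth trying only after the profiler results are in.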
Jeff on 17 Mar 2014
Hi, dpb. I am away from that computer for a few more days and will be able to answer your questions after I return.
