What is the best way to read data from multiple csv files into one variable?

I use Matlab to analyse data from radar stations. The radar stations store 24 hours of data in one comma separated value file (each day has a new file with a new name). I am studying variances over multiple days (up to a month) at a time and trying to find the most efficient and comprehensive method of reading the data in the files into variables (since we have a lot of data and others may be using this code long after I am gone). I am using Matlab 2010a.
I have come up with two methods, given below. Timing them with tic and toc, they range from taking equal time to one being twice as fast as the other. Any input on these methods, or on another method you think may be more efficient or comprehensive, would be greatly appreciated.
Here is the sometimes faster code:
function [PSD, f] = spectra(varargin)
%{
[PSD, f]=SPECTRA(varargin) returns the power spectral density PSD at frequencies f of the wind velocities located in the files specified in varargin.
%}
% Read in the raw data from the first file (the first column is skipped):
raw = dlmread(varargin{1}, ',', 0, 1);
time = raw(:, 1:6);
vel = raw(:, 8);
% If there is more than one input file, read them in too:
if nargin > 1
    for nfile = 2:nargin
        raw = dlmread(varargin{nfile}, ',', 0, 1);
        % Stack rows vertically, since each file can have a different number of rows:
        time = [time; raw(:, 1:6)];
        vel = [vel; raw(:, 8)];
    end
end
In the second, sometimes slower, code I "initialize" the variables time and vel prior to reading in data:
function [PSD, f] = spectra(varargin)
%{
[PSD, f]=SPECTRA(varargin) returns the power spectral density PSD at frequencies f of the wind velocities located in the files specified in varargin.
%}
% Initialize the variables:
time = [];
vel = [];
% Read in all data:
for nfile = 1:nargin
    raw = dlmread(varargin{nfile}, ',', 0, 1);
    % Stack rows vertically, since each file can have a different number of rows:
    time = [time; raw(:, 1:6)];
    vel = [vel; raw(:, 8)];
end
  2 Comments
Shane on 24 Sep 2012
Out of curiosity, is there a reason you have chosen to use dlmread as opposed to csvread? Also, I have made a similar type of function in that I have given coworkers the ability to read in multiple files and compile the required data into a variable (though not from csv files) and I am curious if there is a need for them to specify all of the files that they need? For instance, I have my function set up to automatically determine all of the files in a directory folder of a certain extension (.csv in your case) and read them in automatically without forcing the user to input the filenames manually. Would that be a good function for your situation?
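Collecting every file in a folder that matches a pattern, as described in this comment, can be sketched with dir. This is only an illustration and not Shane's actual function: the name listMatchingFiles and both arguments are hypothetical, and '*.csv' is just an example pattern.

```matlab
function files = listMatchingFiles(folder, pattern)
% LISTMATCHINGFILES  Full paths of all files in FOLDER that match PATTERN.
% Hypothetical helper for illustration; name and signature are assumptions.
listing = dir(fullfile(folder, pattern));   % one struct element per match
listing = listing(~[listing.isdir]);        % ignore directories that matched
files = cell(numel(listing), 1);
for k = 1:numel(listing)
    files{k} = fullfile(folder, listing(k).name);
end
end
```

With a varargin-based reader like spectra above, the list could then be expanded into separate arguments, for example files = listMatchingFiles(folder, '*.csv'); [PSD, f] = spectra(files{:});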
Mel on 24 Sep 2012
When I use csvread, Matlab says something to the effect of "csvread won't be available in later versions, use dlmread, with the delimiter specified as a comma, instead".
The radar data is used for research. For my particular purposes, I may want to only read in 2-3 files or all of the files in a folder. The files are all named in the following convention:
"yyyymmdd.place.x.xD"
where "yyyymmdd" is the numeric date, "place" is the name of the closest town to the radar site, x.x is the height range of the data in the file (typically 1.0 to 14.0), and D is the beam direction ("N" for north, "E" for east, etc.). It would not be beneficial to read in all files at once (as you cannot analyse two heights or directions together), but it would be beneficial if the user could specify the date range, location, altitude, and beam direction they were interested in rather than list all of the files they wanted to include.
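Given that naming convention, the file list for a date range, location, height, and beam direction could be built rather than typed by hand. The sketch below is one possible approach; the function name radarFileNames and its arguments are hypothetical, and it assumes the height always appears with exactly one decimal place (as in "1.0" or "14.0").

```matlab
function files = radarFileNames(startDate, endDate, place, height, beam)
% RADARFILENAMES  Build one "yyyymmdd.place.x.xD" name per day in a date range.
% Hypothetical helper; startDate and endDate are 'yyyymmdd' strings,
% height is numeric (e.g. 1.0) and beam is a single letter such as 'N'.
dayNums = datenum(startDate, 'yyyymmdd'):datenum(endDate, 'yyyymmdd');
files = cell(numel(dayNums), 1);
for k = 1:numel(dayNums)
    % Assumes the height is written with exactly one decimal place.
    files{k} = sprintf('%s.%s.%.1f%s', ...
        datestr(dayNums(k), 'yyyymmdd'), place, height, beam);
end
end
```

The names are generated without checking the disk, so days with no recording would still appear in the list; filtering with exist(name, 'file') before reading would be one way to handle that.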


Answers (1)

Robert Cumming on 24 Sep 2012
Is the number of rows in each file fixed?
If so, you can pre-allocate the variables time and vel to exactly the right size. Accurate pre-allocation will result in the fastest loading time.
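For the fixed-row case, exact pre-allocation might look like the following sketch. The function name readFixedRows and the rowsPerFile argument are hypothetical; the dlmread call and the column choices are copied from the question.

```matlab
function [time, vel] = readFixedRows(files, rowsPerFile)
% READFIXEDROWS  Read files that all have exactly rowsPerFile rows.
% Hypothetical helper; files is a cell array of file names.
nFiles = numel(files);
time = zeros(nFiles * rowsPerFile, 6);   % allocated once, never grown
vel = zeros(nFiles * rowsPerFile, 1);
for k = 1:nFiles
    raw = dlmread(files{k}, ',', 0, 1);  % skip the first column, as in the question
    rows = (k - 1) * rowsPerFile + (1:rowsPerFile);  % destination rows for file k
    time(rows, :) = raw(:, 1:6);
    vel(rows) = raw(:, 8);
end
end
```

If a file turns out to have a different number of rows, the assignments fail with a dimension mismatch error, so this only suits data whose row count is truly constant.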
  2 Comments
Mel on 24 Sep 2012
No, the number of rows in each file is not fixed. After reading in "raw" I could find its dimensions and properly pre-allocate the variables time and vel. However, if I was reading in multiple files, this method would only work for the first file.
Robert Cumming on 24 Sep 2012
You should still avoid the uncontrolled growing that you're currently doing. That's what is slowing down your read routine. Either:
1. Preallocate to bigger than you think you will require, filled with NaN, then populate from file and finally reduce by removing the NaN rows.
2. Read each file into its own element of a cell array, then join/combine.
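The second option can be sketched as follows. The function name readAllFiles is hypothetical and takes a cell array of file names rather than varargin; the dlmread call and column choices are taken from the question. Each file lands in its own cell, so no array is resized inside the loop, and everything is joined once at the end.

```matlab
function [time, vel] = readAllFiles(files)
% READALLFILES  Read each file into a cell, then concatenate once.
% Hypothetical helper; files is a cell array of file names.
nFiles = numel(files);
timeParts = cell(nFiles, 1);   % one cell per file, sized up front
velParts = cell(nFiles, 1);
for k = 1:nFiles
    raw = dlmread(files{k}, ',', 0, 1);  % skip the first column, as in the question
    timeParts{k} = raw(:, 1:6);
    velParts{k} = raw(:, 8);
end
% Join all pieces vertically in a single step; row counts may differ per file.
time = vertcat(timeParts{:});
vel = vertcat(velParts{:});
end
```

Because the per-file pieces are only stacked at the end, this works when the number of rows varies from file to file.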
