Read csv-File fast / convert csv to mat
Hello everybody,
I have a csv-file from a data logger which looks like this:
"s";"m";"";"";"deg";"";"m/s²";"m/s²";"m/s²";"bar";"bar";"";"";"";"";"%";"1/s";"";"km/h";"";"V";"bar";"bar";"°C";"°C";"";"°";"°";"m";"°";"Km/h";"km/h";"m/s²";"deg/s";"m/s²";"°";"sec";"m"
   "0,000";"0";"2,00";"10,48";"193,01";"11,08";"0,99";"0,21";"5,49";"0,85";"0,86";"50,00";"54,00";"53,00";"71,00";"5,5";"0,00";"1,0";"0,00";"0";"12,46";"0,00";"3,85";"-40,0";"-40,0";"0,00";"9,272901";"48,995935";"800,40";"1,00";"0,07";"0,1";"-0,01";"0,00";"0,00";"0,00";"0,00";"0,00"
So far I have created a small GUI tool that plots the data in different graphs, and that works fine. To read the data I used MATLAB's own import tool, but the csv file is larger than 200 MB, so the import takes really long. And the file will become even bigger in the next version because more channels will be added.
So my question is the following: first, I want to load the csv-file faster, and second, it would be nice if the script imported all columns automatically, so that it doesn't matter whether the csv has 3 or 30 columns.
It would be nice if someone could help me with this problem :)
1 Comment
  dpb on 9 Apr 2014
Well, you can try importdata and see how it goes for speed, but probably the best you'll do will be with textread or textscan.
The other delimited-file routines such as csvread aren't able to deal with the header row. If you could do without it, keeping it separately, it might help.
Of course, if speed and file size are the issue, not writing a text file to begin with would be the way around the problem. If at all possible, use a stream file instead.
If you must stay formatted, I'd seriously recommend ditching the embedded header line for a separate info file, and then I'm very partial to the (deprecated) textread for the job as it returns a "regular" array instead of a cell array by default.
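As a rough illustration of the "stream file" suggestion, here is a minimal sketch of writing a numeric matrix as raw doubles and reading it back. The function names `write_bin` and `read_bin` and the two-value size header are assumptions made for this example, not a format the data logger is known to produce.

```matlab
% Sketch only: store a 2-D numeric matrix as raw doubles.
% The [rows, cols] header is an assumed layout for illustration.
function write_bin( filespec, num )
    fid = fopen( filespec, 'w' );
    fwrite( fid, size( num ), 'double' );   % header: [rows, cols]
    fwrite( fid, num, 'double' );           % data in column-major order
    fclose( fid );
end

function num = read_bin( filespec )
    fid = fopen( filespec, 'r' );
    sz  = fread( fid, [1,2], 'double' );    % read back [rows, cols]
    num = fread( fid, sz, 'double' );       % read the whole matrix in one call
    fclose( fid );
end
```

Reading such a file involves no text parsing at all, which is where most of the import time for a large csv file goes.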
Answers (6)
  per isakson on 9 Apr 2014
  Edited: per isakson on 10 Apr 2014
Warning: These functions change your data files. Operate on copies!
Here are two functions to test regarding speed. I believe the first function will do the job in less time. This approach with a separate step, c2p, works well when there is enough memory for the entire file to fit in the system cache.
Both functions return
    >> whos M
      Name      Size            Bytes  Class     Attributes
      M         3x38              912  double
Run
    M = cssm_q( 'cssm.txt' );
where (in one m-file)
    function  num = cssm_q( filespec )
        c2p( filespec );
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ';' ) ) + 1; 
        frm = repmat( '%f', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',';', 'CollectOutput',true );
        num = cac{1};
        fclose( fid );
    end
    function c2p( filespec )
        file    = memmapfile( filespec, 'writable', true );
        comma   = uint8(',');
        point   = uint8('.');
        file.Data(( file.Data==comma)' ) = point;
        quote   = uint8('"');
        space   = uint8(' ');
        file.Data(( file.Data==quote)' ) = space;
    end
and where cssm.txt contains your header row together with a few copies of your data row.
.
Second function is
    function  num = cssm( filespec )
        c2p( filespec );
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ';' ) ) + 1; 
        frm = repmat( '%q', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',';' );
        num = nan( length(cac{1}), length(cac) );
        for jj = 1 : length(cac)
            num( :, jj ) = str2num( char( cac{jj} ) ); 
        end
        fclose( fid );
    end
    function c2p( filespec )
        file    = memmapfile( filespec, 'writable', true );
        comma   = uint8(',');
        point   = uint8('.');
        file.Data(( file.Data==comma)' ) = point;
    end
This is a few columns of the data file after running cssm_q
    s ; m ;  ;  ; deg ;  ; m/s? ; m/s? ; m/s? ; bar ; 
    0.000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;
    0.000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;
    0.000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;
The "?" in the header shows that I have made a mistake regarding the text encoding.
.
"but i can also export the data without the double quotes, so the data is just seperated by commas"
And I assume "." as decimal separator. That's more Matlab-friendly. Try dlmread, which actually calls textscan, and cssm_c, which I believe to be somewhat faster. To be fair dlmread has better error handling.
    function  num = cssm_c( filespec )
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ',' ) ) + 1; 
        frm = repmat( '%f', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',',', 'CollectOutput',true );
        num = cac{1};
        fclose( fid );
    end
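Since the title also asks about converting the csv to mat, here is a minimal sketch of parsing the text file once and loading a MAT-file on later runs. The function name `load_cached` and the choice to store the MAT-file next to the text file are assumptions for illustration; `parser` can be any function that returns a numeric matrix, for example `@cssm_c` from above.

```matlab
function num = load_cached( csvfile, parser )
    % csvfile : path to the text file
    % parser  : function handle returning a numeric matrix, e.g. @cssm_c
    [folder, name] = fileparts( csvfile );
    matname = fullfile( folder, [name, '.mat'] );   % assumed cache location
    src = dir( csvfile );
    dst = dir( matname );
    if ~isempty( dst ) && dst.datenum >= src.datenum
        tmp = load( matname, 'num' );               % cache is up to date
        num = tmp.num;
    else
        num = parser( csvfile );                    % slow path: parse the text
        save( matname, 'num' );                     % refresh the cache
    end
end
```

A call would look like `num = load_cached( 'cssm.txt', @cssm_c );`. If the matrix ever exceeds 2 GB, `save` needs the `'-v7.3'` option.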
3 Comments
  Joseph Cheng on 10 Apr 2014
Oops, it's late in the day and I thought I should stop playing and get back to work. I didn't notice that I had left it vague. The test was for cssm_q. cssm took longer than my trip to and from the vending machine, so I killed the process.
  Image Analyst on 9 Apr 2014
  Edited: Image Analyst on 9 Apr 2014
Doesn't look like a csv file to me, which should be numbers separated by commas. You have double quotes and semicolons in between the commas in addition to the numbers. It is supposed to be only numbers. You might be able to use dlmread, if your delimiters are actually semicolons and not commas. You might also try readtable(), which is what you use for reading in tables that have mixed data types such as character strings along with numbers on the same row.
1 Comment
  per isakson on 9 Apr 2014
  Edited: per isakson on 9 Apr 2014
			In some parts of the world "," is used as decimal delimiter and ";" as list delimiter. Despite these delimiter characters we often call the file "csv". It certainly causes some confusion and extra work.
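For files in this convention, the conversion can also be done in memory so that the file on disk stays untouched. The following is a minimal sketch, not a tuned solution: the function name `read_semicolon_csv` is made up for this example, and it assumes exactly one header row, no thousands separators, and enough RAM to hold the file as text (MATLAB stores each character in two bytes, and `strrep` makes copies).

```matlab
function num = read_semicolon_csv( filespec )
    txt = fileread( filespec );             % whole file as one char vector
    txt = strrep( txt, ',', '.' );          % decimal comma -> decimal point
    txt = strrep( txt, '"', '' );           % drop the quotes around each field
    eol = find( txt == char(10), 1 );       % end of the (single) header row
    ncl = sum( txt(1:eol) == ';' ) + 1;     % columns = delimiters + 1
    frm = repmat( '%f', [1,ncl] );
    % textscan also accepts a char vector instead of a file identifier
    cac = textscan( txt(eol+1:end), frm, 'Delimiter',';', 'CollectOutput',true );
    num = cac{1};
end
```

For a file of several hundred megabytes the memory-mapped, in-place replacement is likely to need less memory, so this variant mainly suits cases where the original file must not change.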
  David
 on 9 Apr 2014
        2 Comments
  Image Analyst on 9 Apr 2014
				What about the semicolons? It almost looks like some European way of doing it where everything is different, like decimals are commas, and commas are semicolons.
  David
 on 10 Apr 2014
        1 Comment
  per isakson on 10 Apr 2014
  Edited: per isakson on 10 Apr 2014
			One option is to replace
    str = fgetl( fid );
by
    st1 = fgetl( fid );
    st2 = fgetl( fid );
If you do not need to keep the header line, you may use the HeaderLines option of textscan, i.e.
    cac = textscan( fid, ....., 'Headerlines', 2 );
Mistake: I forgot that the header line is needed to find the number of columns
    ncl = length( strfind( str, ';' ) ) + 1; 
    frm = repmat( '%f', [1,ncl] );
However, those two solutions will break when you get a file with a different number of header lines. A more flexible solution will include something like
    str = 'dummy'; n = -1;
    while isempty( regexp( str, '^[0-9\+\-\.\,;" ]{100,}$', 'match' ) )
        str = fgetl( fid );
        n = n + 1;
    end
    ....
    frewind( fid )
    cac = textscan( fid, frm, ....., 'Headerlines', n );
not tested
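The fragments above can be combined into one function. This is a minimal sketch under stated assumptions: the name `read_skip_header` is made up, the file is assumed to already use "." as decimal separator and to contain no quotes (for example after running c2p), and a line counts as data as soon as it consists only of digits, signs, points, spaces and the delimiter. Unlike the pattern above, no minimum line length is required, so narrow files work too, and numbers in exponent notation would be mistaken for header text.

```matlab
function [num, n] = read_skip_header( filespec, delim )
    % filespec : text file with '.' as decimal separator and no quotes
    % delim    : single delimiter character, e.g. ';'
    fid = fopen( filespec, 'r' );
    closer = onCleanup( @() fclose( fid ) );    % close the file even on error
    pat = ['^[0-9+\-\. ', regexptranslate( 'escape', delim ), ']+$'];
    n   = 0;                                    % header lines seen so far
    str = fgetl( fid );
    while ischar( str ) && isempty( regexp( strtrim( str ), pat, 'once' ) )
        n   = n + 1;
        str = fgetl( fid );                     % returns -1 at end of file
    end
    if ~ischar( str )
        error( 'read_skip_header:noData', 'No numeric row found in %s', filespec );
    end
    ncl = length( strfind( str, delim ) ) + 1;  % columns in the first data row
    frm = repmat( '%f', [1,ncl] );
    frewind( fid );
    cac = textscan( fid, frm, 'Delimiter',delim, 'HeaderLines',n, ...
                    'CollectOutput',true );
    num = cac{1};
end
```

Counting the columns on the first data row instead of the header row means the header no longer has to be kept just to find the number of columns.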
  David
 on 11 Apr 2014
        3 Comments
  per isakson on 13 Apr 2014
				I never use GUIDE and I'm not able to quickly find an answer. Please post this comment as a new question with "GUIDE" as one of the tags.
  Dahai Xue on 28 Jan 2015
For systems with Excel installed, this simply works:
    [~,~,myTable] = xlsread('myFile.csv');
0 Comments