Read values from a very complex txt file into matlab

Dear Matlab community,
I have a data file of wave heights that looks like this:
Euro platform;Golfhoogte, significante-, uit energiespectrum van 30-500 mHz in cm;1982-11-19;19:00;;361;cm;NVT;Tijdreeks en frequentie analyse, methode CIC/MAREG;Nationaal;Stappenbaak - type Marine 300;NVT;NVT;4230;51.9986111;3.2763889;NVT;NVT,NVT,Niet van toepassing
Euro platform;Golfhoogte, significante-, uit energiespectrum van 30-500 mHz in cm;1982-11-19;22:00;;363;cm;NVT;Tijdreeks en frequentie analyse, methode CIC/MAREG;Nationaal;Stappenbaak - type Marine 300;NVT;NVT;4230;51.9986111;3.2763889;NVT;NVT,NVT,Niet van toepassing
Euro platform;Golfhoogte, significante-, uit energiespectrum van 30-500 mHz in cm;1982-11-20;01:00;;379;cm;NVT;Tijdreeks en frequentie analyse, methode CIC/MAREG;Nationaal;Stappenbaak - type Marine 300;NVT;NVT;4230;51.9986111;3.2763889;NVT;NVT,NVT,Niet van toepassing
Euro platform;Golfhoogte, significante-, uit energiespectrum van 30-500 mHz in cm;1982-11-20;04:00;;381;cm;NVT;Tijdreeks en frequentie analyse, methode CIC/MAREG;Nationaal;Stappenbaak - type Marine 300;NVT;NVT;4230;51.9986111;3.2763889;NVT;NVT,NVT,Niet van toepassing
I am only interested in reading the date (column 3), the hour (column 4), and the wave height (column 6). The file has 18 columns and is semicolon-separated (;), so the commas inside the strings can be ignored. There is an empty column between the hour and the wave height, which is why the wave height is in column 6 and not column 5. The file is very long, something like 28 years * 365 days * 24 h * 60 min lines.
I am using the command:
[data(:,1),data(:,2)...data(:,18)] = textread('wave.txt','%q %q %q %q %q %q %q %q %q %q %q %q %q %q %q %q %q %q','delimiter',';');
This method works but is very slow, and it sometimes gives me memory problems related to the buffer size ('bufsize'). Do you know a better way to do this? Maybe read only the date and wave heights and skip all the text I don't need?

Accepted Answer

Laura Proctor on 6 Apr 2011
You can ignore data in textread using the asterisk (*) after the percentage symbol when reading in data.
For example, if you have 4 columns and only wish to read in the first and third columns of data:
[col1, col3] = textread('myFile.txt','%s %*f %f %*f','delimiter',';');  % one output per field that is not skipped
Also, you may wish to look at the documentation for TEXTREAD to see whether a format other than %q, which reads a double-quoted string, suits your data better. Some of your columns hold numeric data that would be well suited to %f.
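Applied to the 18-column file from the question, the same skip idea might look like the sketch below. It uses textscan rather than textread (a substitution on my part, not what was asked about), and it writes a two-line sample file first so it runs on its own. The file name wave_sample.txt and the variable names are made up for illustration.

```matlab
% Build a small sample file with two of the lines from the question.
head = 'Euro platform;Golfhoogte, significante-, uit energiespectrum van 30-500 mHz in cm;';
tail = [';cm;NVT;Tijdreeks en frequentie analyse, methode CIC/MAREG;Nationaal;' ...
        'Stappenbaak - type Marine 300;NVT;NVT;4230;51.9986111;3.2763889;NVT;' ...
        'NVT,NVT,Niet van toepassing'];
fid = fopen('wave_sample.txt', 'w');   % hypothetical file name
fprintf(fid, '%s1982-11-19;19:00;;361%s\n', head, tail);
fprintf(fid, '%s1982-11-19;22:00;;363%s\n', head, tail);
fclose(fid);

% 18 fields per line: keep columns 3 (date), 4 (hour) and 6 (wave height);
% every other field is skipped with the asterisk.
fmt = ['%*s %*s %s %s %*s %f' repmat(' %*s', 1, 12)];

fid = fopen('wave_sample.txt', 'r');
C = textscan(fid, fmt, 'Delimiter', ';');
fclose(fid);

dates  = C{1};   % cell array of strings, e.g. '1982-11-19'
times  = C{2};   % cell array of strings, e.g. '19:00'
height = C{3};   % numeric column vector, wave height in cm

% Optional: combine date and hour into serial date numbers.
t = datenum(strcat(dates, {' '}, times), 'yyyy-mm-dd HH:MM');
```

Only three columns are kept in memory, so most of the text in each line is parsed past but never stored.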

More Answers (1)

Pedro Cavaco on 6 Apr 2011
Thanks Laura for the hint... Simple and efficient.
The %q was just a desperate way I found to get MATLAB to read the long strings into a position in the array.
I tried the * and it is much faster now.
Greetings
