How can I import only the numbers from a .csv file with a text header?

I have hundreds of .csv files; I attached one of them as an example (I had to shorten it because it was bigger than 5 MB). Each of them has 10^6 lines of data.
I want to import those files automatically in my MATLAB code. Importing them one by one is entirely sufficient, but so far I have always had to preprocess the data manually in a text editor. The problem is the text in the header of every .csv file. I just want to import the numbers in the second, third and fourth columns, not the text from the header. But even if I specify the columns, I cannot convert the data returned by the datastore into numbers to run the calculations. This is my solution with the preprocessed data:
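pre_data = datastore('Data.csv');
piece = zeros(1,3);              % seed row so that blocks can be appended
while hasdata(pre_data)
    pie = read(pre_data);        % next block of rows, returned as a table
    pie = pie(:,1:3);            % keep the first three columns
    pie = table2array(pie);      % table -> numeric array
    piece = [piece; pie];        % append the block
end
piece = piece(9:10^6+8,:);       % keep rows 9 through 10^6+8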
With "piece", I can now easily run the calculations.
To import the data without preprocessing, I tried "ds.SelectedVariableNames" and replacing "datastore" with "csvread", but neither worked.
Does anyone have advice on how to import such .csv files as an easily processable 1000000x3 double?
  1 Comment
dpb on 15 Dec 2018
Edited: dpb on 15 Dec 2018
Just attach the text of the first few (10 is enough) lines of the file that show the header and data structure; how many data lines are in the file after the header is immaterial to the solution (as long as you have enough memory to hold the data).
The key question is whether the file structure is the same regarding the header: is it always the same number of lines, is there a consistent number of blank records (if any) after the header, etc.?
Also, are there the same number of variables (columns) in the file and are the records properly delimited if there are missing data?
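One way to collect the first lines requested above is to read them with fgetl and print them. This is only a sketch: the temporary demo file and the names fname, nShow and firstLines are hypothetical stand-ins, and for a real file you would set fname to its path instead of writing the demo file.

```matlab
% Hypothetical demo file standing in for one of the real .csv files.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Header line %d\n', 1:3);      % three text header lines
fprintf(fid, '%d,%d\n', [1 2; 3 4].');      % two data records
fclose(fid);

% Collect and print up to the first nShow lines of the file.
nShow = 10;
firstLines = {};
fid = fopen(fname, 'r');
while numel(firstLines) < nShow
    txt = fgetl(fid);
    if ~ischar(txt)                         % fgetl returns -1 at end of file
        break
    end
    firstLines{end+1} = txt; %#ok<AGROW>
end
fclose(fid);
fprintf('%s\n', firstLines{:});
```

Running this on a few of the files would show whether the header always has the same number of lines.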


Accepted Answer

Jeremy Hughes on 16 Dec 2018
You should be able to add 'NumHeaderLines',7 to the datastore call and get what you want.
The issue is that this looks a lot like a CSV file exported from Excel. There are a lot of extraneous commas, and that's throwing off all the file format detection.
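A minimal sketch of that call follows, not necessarily the exact code used in this thread. It writes a small hypothetical demo file with a 7-line text header, then reads columns 2 to 4 as a double array. The demo file, the variable names, and the 'Type' and 'ReadVariableNames' options are assumptions added for illustration. In a real file with extraneous commas, the detected columns may differ, so check ds.VariableNames before selecting.

```matlab
% Hypothetical demo file: 7 text header lines followed by 4 numeric columns.
fname = [tempname '.csv'];
data = [(1:5)', (1:5)'*10, (1:5)'*100, (1:5)'*1000];
fid = fopen(fname, 'w');
fprintf(fid, 'Header line %d\n', 1:7);
fprintf(fid, '%d,%d,%d,%d\n', data.');
fclose(fid);

% Skip the header lines; do not treat the first data row as variable names.
ds = datastore(fname, 'Type', 'tabulartext', ...
    'NumHeaderLines', 7, 'ReadVariableNames', false);

% Keep only the second, third and fourth columns.
ds.SelectedVariableNames = ds.VariableNames(2:4);

% Read everything (assumes the data fits in memory) and convert to double.
T = readall(ds);
piece = table2array(T);
```

For the files in the question, fname would be replaced with 'Data.csv', and piece should then be the 1000000x3 double asked for, provided the header is exactly 7 lines in every file.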
  1 Comment
Christoph Müßig on 16 Dec 2018
Thank you all for your ideas and tricks. The solution to add 'NumHeaderLines',7 to the datastore call worked perfectly and solved the problem.



Release

R2018a
