Filter data from text file whilst importing

jlt199 on 10 Nov 2016
Answered: dpb on 10 Nov 2016
Hi, I have a large CSV file, around 1.5 million rows, that I would like to filter whilst importing. I'm not sure I have enough memory to import the complete file as a table and then filter it once it is in MATLAB, and since this will be running unattended over the long weekend I want something that will definitely work. I currently have the following code, which imports everything. I only want to import rows where the 9th variable is equal to 1. Can I adapt this code to do that?
readfrom = 'test.csv';
fileID = fopen(readfrom);
keep = textscan(fileID,'%s%d%f%d%f%f%d%d%d%d%f%f%f%f%f%f%f%f%f\r', ...
'HeaderLines',1,'delimiter',',');
fclose(fileID);
Many thanks

Answers (1)

dpb on 10 Nov 2016
The only ways I see to do this are to read the file line by line and parse each line to decide whether to keep it, or, along the same lines, to read it in blocks of (say) 10000 lines and do the selection block by block.
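A rough sketch of the block-by-block idea, for illustration only. The function name importFilteredBlocks is made up here, the format string is copied verbatim from the question, and it assumes the "9th variable" is the 9th field of that format string and that the file has exactly one header line. It is untested against the real file.

```matlab
function keep = importFilteredBlocks(readfrom, blockSize)
% importFilteredBlocks  Hypothetical helper: read a CSV in blocks with
% textscan and keep only the rows whose 9th field equals 1.
if nargin < 2
    blockSize = 10000;   % block size suggested in the answer; tune to memory
end
% Format string copied verbatim from the question (19 fields per row).
fmt = '%s%d%f%d%f%f%d%d%d%d%f%f%f%f%f%f%f%f%f\r';
fileID = fopen(readfrom);
if fileID < 0
    error('Could not open %s', readfrom);
end
closeFile = onCleanup(@() fclose(fileID));   % close the file even on error
fgetl(fileID);                               % skip the single header line
keep = cell(1, 19);                          % one accumulator per column
while ~feof(fileID)
    % Read at most blockSize rows from the current file position.
    block = textscan(fileID, fmt, blockSize, 'Delimiter', ',');
    % Columns can differ in length if the last row is incomplete.
    n = min(cellfun(@numel, block));
    if n == 0
        break    % nothing parsed; stop rather than loop forever
    end
    mask = block{9}(1:n) == 1;               % rows to keep in this block
    for c = 1:numel(block)
        col = block{c}(1:n);
        keep{c} = [keep{c}; col(mask)];      % append the filtered rows
    end
end
end
```

Called as keep = importFilteredBlocks('test.csv'), this returns a 1-by-19 cell array in the same layout as the original textscan call, but holding only the matching rows. Passing a block size of 1 gives the line-by-line variant, which is slower but needs the least memory.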
Not knowing anything about the file format itself, it's possible you could preprocess the file before reading it into MATLAB, using a batch editor, regular expressions or the like. Depending on just how big the file is (1M lines isn't necessarily out of hand, depending on the length of each record), you might also be able to read it as a character/cellstr array and do the cleanup in memory as text before conversion. There are too many possibilities that depend on unknown details to say unequivocally which approach would be best. The first two will work, though; they are just time-consuming.
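If the raw text does fit in memory, the read-as-character idea might look something like the sketch below. The function name importFilteredText is made up for illustration. It assumes one header line, no commas embedded inside any field, and that the 9th comma-separated field is the one to test. The format string is the one from the question without the trailing \r, since the line endings are stripped by the split.

```matlab
function keep = importFilteredText(readfrom)
% importFilteredText  Hypothetical helper: read the whole file as text,
% drop unwanted lines, then convert only the remaining lines to numbers.
txt = fileread(readfrom);                    % entire file as one char vector
lines = regexp(txt, '\r?\n', 'split');       % split on LF or CR+LF
lines = lines(2:end);                        % drop the single header line
lines = lines(~cellfun(@isempty, lines));    % drop empty trailing lines
% Match lines whose 9th comma-separated field is exactly 1.
% Assumes no field contains an embedded comma.
pat = '^(?:[^,]*,){8}\s*1\s*(?:,|$)';
isWanted = ~cellfun(@isempty, regexp(lines, pat, 'once'));
% Re-join the surviving lines and parse them in one textscan call.
fmt = '%s%d%f%d%f%f%d%d%d%d%f%f%f%f%f%f%f%f%f';
keep = textscan(strjoin(lines(isWanted), char(10)), fmt, 'Delimiter', ',');
end
```

This keeps the numeric conversion cheap because only the matching rows are converted, but the full file still has to sit in memory as text, so it is only an option when that is affordable.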
