MATLAB Answers

Finding Lines in a Large Text File with a Specific Text

Sonoma Rich on 12 Jul 2019
Answered: Sonoma Rich on 13 Jul 2019
I am trying to read a large text file (>1 GB), but I only want the lines that contain a specific text. For example, I want every line that contains the string '<field name="data"'. Currently I read every line with fgetl and check whether the text is in the line, but this takes too long. Any suggestions?

Accepted Answer

Sonoma Rich on 13 Jul 2019
I found the following code, which works well:
filetext = fileread('fileread.m');
expr = '[^\n]*fileread[^\n]*';
matches = regexp(filetext,expr,'match');
disp(matches')
However, the regexp function was slower than I expected. I ended up using the following method, which is significantly faster:
fid = fopen('fileread.m','r');
ftext = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
matches = ftext{1}(contains(ftext{1},'fileread'));
disp(matches)

More Answers (2)

KSSV on 12 Jul 2019
Read about textscan. It lets you read the file in chunks of lines inside a loop, so you never hold the whole file in memory. From each chunk you can then pick out the lines you need.
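A minimal sketch of the chunked approach, assuming the search text from the question. The demo file, the chunk size, and the variable names are illustrative choices and not part of the original answer:

```matlab
% Demo input: a small temporary file standing in for the real >1 GB file.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', '<field name="data" value="1"/>', ...
    '<field name="other"/>', '<field name="data" value="2"/>');
fclose(fid);

needle = '<field name="data"';   % text each kept line must contain
chunkSize = 100000;              % lines per chunk (arbitrary; tune to available memory)

fid = fopen(filename, 'r');
assert(fid ~= -1, 'Could not open %s', filename);

matches = {};
while ~feof(fid)
    % Read up to chunkSize lines; each line becomes one cell element.
    chunk = textscan(fid, '%s', chunkSize, 'Delimiter', '\n', 'Whitespace', '');
    lines = chunk{1};
    if isempty(lines)
        break;   % nothing more could be read
    end
    % Keep only the lines of this chunk that contain the search text.
    matches = [matches; lines(contains(lines, needle))]; %#ok<AGROW>
end
fclose(fid);

disp(matches)
```

Only one chunk of lines is held in memory at a time, so peak memory is bounded by the chunk size plus the lines that match.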

Walter Roberson on 12 Jul 2019
If you have enough memory:
S = fileread('YourFileNameHere.txt');
selected = regexp(S, '^.*<field\s+name\s*=.*$', 'match', 'dotexceptnewline', 'lineanchors');
And in the case where you do not care what is at the beginning or end of the line and just want to know what the "data" field content is, then
S = fileread('YourFileNameHere.txt');
datas = regexp(S, 'field\s+name\s*="(?<data>[^"]*)', 'names');
That should get you a struct array with field name 'data' that holds the content inside the quotes.
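To see how the two regexp calls behave, here is a small sketch run on an in-memory string instead of a file. The sample text is made up for illustration, and the patterns assume the tag is written as <field name="..."> as in the question:

```matlab
% Sample text standing in for the output of fileread (hypothetical content).
S = sprintf('<field name="data">\n<row/>\n<field name="time">\n');

% Whole lines containing the tag: with 'dotexceptnewline' the '.' stops at
% newlines, and with 'lineanchors' ^ and $ match at each line boundary.
selected = regexp(S, '^.*<field\s+name\s*=.*$', 'match', ...
    'dotexceptnewline', 'lineanchors');

% Quoted value after name=, captured by the named token 'data' and
% returned as a struct array because of the 'names' option.
datas = regexp(S, 'field\s+name\s*="(?<data>[^"]*)', 'names');

disp(selected')
disp({datas.data})
```

The 'names' option is what produces the struct array. With 'tokens', the same pattern would return a cell array of cells instead.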
