MATLAB Answers

Finding Lines in a Large Text File with a Specific Text

Asked by Sonoma Rich on 12 Jul 2019
Latest activity Answered by Sonoma Rich on 13 Jul 2019
I am trying to read a large text file (>1 GB), but I only want the lines that contain a specific piece of text. For example, I want every line that contains the text <field name="data". Currently I read every line with fgetl and check whether the text is in the line, but that takes too long. Any suggestions?


3 Answers

Answer by Sonoma Rich on 13 Jul 2019
 Accepted Answer

I found the following code, which works well:
filetext = fileread('fileread.m');
expr = '[^\n]*fileread[^\n]*';
matches = regexp(filetext,expr,'match');
disp(matches')
It works, but the regexp function was slower than I expected, so I ended up using the following method, which is significantly faster:
fid = fopen('fileread.m','r');
ftext = textscan(fid,'%s','Delimiter','\n');
fclose(fid);
matches = ftext{1}(contains(ftext{1},'fileread'));
disp(matches)


Answer by KSSV on 12 Jul 2019

Read about textscan. It lets you run a loop that reads the file in chunks of a fixed number of lines, so the whole file never has to be in memory at once. Within each chunk, you can pick out the lines you need.
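The chunked approach described above could look roughly like the sketch below. It is only an illustration: the temporary sample file, the search text, and the chunkSize value are assumptions made so the example runs on its own, and contains requires R2016b or later. For a real file, replace fname with your own path and pick a much larger chunkSize.

```matlab
% Build a small sample file so the sketch is self-contained (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '<field name="data" value="1"/>', ...
                     '<field name="other"/>', ...
                     '<field name="data" value="2"/>');
fclose(fid);

searchText = '<field name="data"';   % text each wanted line must contain
chunkSize  = 2;                      % lines per read; assumed value, tune for memory

fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);

matches = {};
while ~feof(fid)
    % Read at most chunkSize lines; textscan resumes at the current file position.
    chunk = textscan(fid, '%s', chunkSize, 'Delimiter', '\n');
    lines = chunk{1};
    if isempty(lines)
        break;                       % nothing left to read
    end
    keep = contains(lines, searchText);
    matches = [matches; lines(keep)]; %#ok<AGROW>
end
fclose(fid);
delete(fname);                       % remove the sample file

disp(matches)
```

Only one chunk of lines is held in memory at a time, plus the matching lines collected so far, which is the main reason to prefer this over reading the whole file when memory is tight.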


Answer by Walter Roberson on 12 Jul 2019

If you have enough memory:
S = fileread('YourFileNameHere.txt');
selected = regexp(S, '^.*<field name\s*=.*$', 'match', 'dotexceptnewline', 'lineanchors');
If you do not care what is at the beginning or end of the line and just want the content of the "data" field, then:
S = fileread('YourFileNameHere.txt');
datas = regexp(S, '(?<=field name\s*=")(?<data>[^"]*)', 'names');
That should get you a struct array with a field named 'data' that holds the content inside the quotes.
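To make the two regexp outputs concrete, here is a small sketch run on an in-memory sample instead of a file. The sample text is made up for illustration, and the second pattern is a variant written without a lookbehind (an assumption of this sketch, not the pattern given above) so the named token is easier to see.

```matlab
% Hypothetical three-line sample standing in for the file contents.
S = sprintf('%s\n', '<field name="data" value="1"/>', ...
                    '<record id="7">', ...
                    '<field name="other"/>');

% Whole lines containing '<field name=': '.' stops at newlines,
% and ^ / $ anchor at line boundaries instead of the whole text.
selected = regexp(S, '^.*<field name\s*=.*$', 'match', ...
                  'dotexceptnewline', 'lineanchors');

% Named token: 'names' returns a struct array whose field is called 'data'.
info = regexp(S, 'field name\s*="(?<data>[^"]*)"', 'names');
values = {info.data};

disp(selected')
disp(values')
```

With 'match' the result is a cell array of the matching lines, while 'names' gives one struct element per match, so {info.data} collects the quoted values into a cell array.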
