Removing Rows From a Matrix by Label Quickly

I currently have some code that is meant to import data as a matrix from a text file, delete any rows that do not start with "H", strip the H label from the remaining rows, and print the resulting matrix to a text file. My code is as follows:
FID = fopen('Test_Data_2.txt','rt');
m = [];
while ~feof(FID)
    l = fgetl(FID);
    if ~isempty(l)
        if l(1) == 'H'
            m = [m; str2num(l(2:end))];
        end
    end
end
FID = fclose(FID);
dlmwrite('md_msd.out', m, 'delimiter', '\t', ...
    'precision', 6)
This code works great for small data sets but I am going to have to use it for sets of 100000 rows or more. I need a way to speed up the process as my current code takes far too long. Is there any way I can make this code faster?

5 Comments

Can you post a few lines of your txt file, along with your desired output? In particular, it is not really clear which lines you wish to reject.
Joseph on 26 Jun 2013
Edited: Joseph on 26 Jun 2013
Here are the first two data sets from the text file. There are far more, but this should give you an idea. I need to remove any row that doesn't start with an H and then, once that is done, remove the H itself from the data.
46
Graphene-Cs
C 3.069663 1.451149 9.855179
C 1.828334 2.119961 9.393412
C 1.786922 3.540219 9.865837
O 4.372544 3.743185 11.433878
O -0.584268 3.630468 11.355020
O -1.930575 5.780541 11.291985
O 3.077855 5.897470 11.394889
H 1.794613 6.206863 8.322277
H -0.502045 6.308341 8.184642
H 1.736684 2.133682 8.239135
H 4.397250 2.102109 8.305959
46
Graphene-Cs
C 3.070549 1.453499 9.857827
C 1.829730 2.119381 9.396008
C 1.786074 3.539174 9.865825
O 1.970183 3.587105 11.297875
O 4.373375 3.742606 11.432755
O -0.585405 3.630461 11.357721
O -1.929905 5.780640 11.291806
H 1.733321 2.144360 8.248811
H 4.400987 2.107167 8.310423
H -0.865142 2.313868 8.282938
Walter's suggestions of sed and perl are excellent, but for your reference, a pure MATLAB approach could involve:
%read full file into memory
% if insufficient memory available, you might need to read into
% batches of bytes at a time, using fread() or similar.
str=fileread('Test_Data_2.txt');
%extract lines starting with 'H'
out = regexp(str, '^\s*H([^\n]+)$', 'match', 'lineAnchors');
%extract decimal numbers from that
m = textscan(cell2mat(out), 'H %f %f %f', 'CollectOutput', true);
m = m{1}; %this is your data matrix
%write it to a text file
dlmwrite('md_msd.out', m, 'delimiter', '\t', ...
'precision', 6);
cell2mat() could fail there because the lines might not all be exactly the same length. You could use char() to ensure the shorter lines are padded with blanks.
However, textscan() cannot accept a 2-D array of char as the first argument (nor a cell array of char). You would have to reconstruct the lines as a single string with \n between them in order to use textscan in this manner.
textscan(sprintf('%s\n', out{:}), .....)
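Putting the two comments above together, one possible version is sketched below. It is only an illustration, not the code the original poster ended up running. It assumes the layout shown in the posted sample (an 'H' in the first column followed by three numbers) and at least one 'H' line in the input. The inline sample string stands in for fileread('Test_Data_2.txt') so the snippet runs on its own.

```matlab
% Sample input built from the posted data; in practice use:
%   str = fileread('Test_Data_2.txt');
str = sprintf('%s\n', '46', 'Graphene-Cs', ...
    'C 3.069663 1.451149 9.855179', ...
    'H 1.794613 6.206863 8.322277', ...
    'H -0.502045 6.308341 8.184642');

% Capture the text after a leading 'H' on each matching line.
% 'tokens' returns one nested cell per match, so flatten to 1-by-N.
tok = regexp(str, '^H([^\r\n]*)', 'tokens', 'lineanchors');
tok = [tok{:}];

% Join the lines into ONE char vector separated by newlines, which is
% the form textscan accepts, then parse three numbers per line.
c = textscan(sprintf('%s\n', tok{:}), '%f %f %f', 'CollectOutput', true);
m = c{1};   % N-by-3 numeric matrix

% Write the result as in the original code.
dlmwrite('md_msd.out', m, 'delimiter', '\t', 'precision', 6);
```

Capturing the text after the 'H' as a token means the label never reaches textscan, so the format string only has to describe the three numeric columns.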
Thanks for the help! I finally got the function to run in around 12 seconds for one of the larger data sets. Thank you!


Answers (1)

Use other tools when it makes sense to do so. For example, in Linux or OS-X from their shells:
sed -n 's/^H//p' < Test_Data_2.txt > md_msd.out
Or perl
perl -ne 's/^H// && print' < Test_Data_2.txt > md_msd.out
You can invoke perl from within MATLAB using the perl() command.

3 Comments

Could you clarify a bit? I am rather new to programming. Should perl be installed on my computer by default, or should I look for some software to run it? Sorry if that is a trivial question.
perl is part of the MATLAB distribution on MS Windows, so there is no need to install it. On OS-X and Linux, I believe those are defined to include perl as part of the operating system, so there is no need to install perl on those systems either.
OK, I understand. So to call perl to help execute the code above, would I simply place:
perl -ne 's/^H// && print' < Test_Data_2.txt > md_msd.out
in my MATLAB function? Or do I need to write a perl script that performs the same operations as the above code?
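The thread does not record a reply to this question. As a rough sketch only: MATLAB's perl() function runs a script file rather than a one-liner, so one option is to write the filter to a small script and pass the data file as an argument. The script name strip_h.pl is a hypothetical choice, and the Perl code is simply a script-file form of the one-liner above.

```matlab
% Write a tiny Perl script that prints every line that started with
% 'H', with that 'H' removed (strip_h.pl is a made-up file name).
fid = fopen('strip_h.pl', 'w');
fprintf(fid, '%s\n', 'while (<>) { print if s/^H//; }');
fclose(fid);

% Run the script on the data file; result holds the script's output.
[result, status] = perl('strip_h.pl', 'Test_Data_2.txt');

% Save the filtered text only if perl reported success.
if status == 0
    fid = fopen('md_msd.out', 'w');
    fprintf(fid, '%s', result);
    fclose(fid);
end
```

Alternatively, the shell command could be passed as a string to system(), though the quoting rules differ between Windows and Unix shells.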


Asked:

on 25 Jun 2013
