Reading a complex text file and building a matrix

Hello MATLAB experts,
I am stuck on a typical problem and would greatly appreciate your help. I am trying to read a complex file (attached: example.txt). The file has millions of lines; I truncated it to 2000.
My aim is simple:
  • If a line is '--- detector1 ---',
  • increment 'numberofgammaclusters'.
  • Then X = first number, Y = second number, and A(X,Y) = third number.
  • Read this until the next '--- detector1 ---' is encountered; at that point, repeat steps 2 and 3.
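The steps in the list above can be sketched as a single pass over the lines. This is a minimal sketch, not the code from the answers below. It assumes that X and Y are zero-based (hence the +1), that a line containing '===' ends a block, and that repeated hits on the same pixel should be summed; the short chr string is a hypothetical stand-in for the file contents.

```matlab
% Hypothetical stand-in for the file; for real data use chr = fileread('gamma.txt')
chr = sprintf(['--- detector1 ---\n', ...
    'PixelHit 153, 88, 2158.3, 0\n', ...
    'PixelHit 154, 88, 687.456, 0\n', ...
    '===\n', ...
    '--- detector1 ---\n', ...
    'PixelHit 153, 88, 100, 0\n']);
allLines = regexp(chr, '\r?\n', 'split');   % handles both \r\n and \n line breaks
A = zeros(256, 256);
numberofgammaclusters = 0;
inCluster = false;
for k = 1:numel(allLines)
    txt = strtrim(allLines{k});
    if strcmp(txt, '--- detector1 ---')              % step 1: a new cluster starts
        numberofgammaclusters = numberofgammaclusters + 1;   % step 2
        inCluster = true;
    elseif ~isempty(strfind(txt, '==='))             % assumed end-of-block marker
        inCluster = false;
    elseif inCluster && strncmp(txt, 'PixelHit', 8)
        vec = sscanf(txt, 'PixelHit%f,%f,%f,%f');     % step 3: X, Y, value, flag
        A(vec(1)+1, vec(2)+1) = A(vec(1)+1, vec(2)+1) + vec(3);   % sum repeated hits
    end
end
```

With this sample, numberofgammaclusters ends up as 2 and the two hits at (153, 88) are summed into A(154, 89).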
The code I am trying is below. Any help improving the code, or any advice on the approach, is hugely appreciated.
A= zeros(256, 256);
E = importdata('gamma.txt', ' ');
numberofgammaclusters=0;
for i=1:1082952
if E.textdata(:,2)==contains('detector1')
numberofgammaclusters=numberofgammaclusters+1;
A()= % The values at second last column % Part of the code I don't know how to write
end
end
Thanks very much in advance.
Regards,
Sanchit Sharma

3 Comments

There is a number, numberofgammaclusters, of blocks like
--- detector1 ---
PixelHit 153, 88, 2158.3, 0
PixelHit 153, 89, 3490.69, 0
PixelHit 154, 88, 687.456, 0
PixelHit 154, 89, 2675.81, 0
PixelHit 155, 89, 3452.2, 0
PixelHit 156, 90, 3139.74, 0
PixelHit 156, 91, 2414.16, 0
in the file.
Do you want to calculate one A(X,Y) for each block or one A(X,Y) for the entire file? Or am I missing something?
Hello, thanks very much for your response.
What you posted is one cluster: 153...156 are the X coordinates, 88...91 are the Y coordinates, and the third column holds the values, i.e. A(X,Y). I want to calculate these for the whole file, i.e. build a 256-by-256 matrix A and put these values into it.
The actual file has millions of lines very similar to what I posted.
Thanks very much!
In short, I need A(X,Y) for the entire file. Please let me know if I am not clear. I appreciate your time.
Thanks!


 Accepted Answer

Try this
%%
chr = fileread( 'example.txt' );
clusters = strsplit( chr, '--- detector1 ---\r\n' );
clusters(1) = [];
clear('chr');
numberofgammaclusters = length( clusters );
A = nan( 256, 256 );
for jj = 1 : numberofgammaclusters
cac = strsplit( clusters{jj},'\r\n' );
for ii = 1 : length( cac )
if not( contains( cac{ii}, '===' ) )
vec = textscan( cac{ii}, 'PixelHit%f%f%f%f', 'Delimiter',',' );
A(vec{1},vec{2}) = vec{3};
else
break
end
end
end
This script requires some memory, since fileread loads the whole file at once, but I think it will be OK.
On second thought, replace
vec = textscan( cac{ii}, 'PixelHit%f%f%f%f', 'Delimiter',',' );
A(vec{1},vec{2}) = vec{3};
with
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
A(vec(1),vec(2)) = vec(3);
to avoid vec being a cell array.
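A quick way to see the difference on one line from the sample block: textscan returns a 1-by-4 cell array (one cell per %f), while sscanf returns a plain 4-by-1 numeric column vector, so vec(1), vec(2) and vec(3) can be used directly as indices and value.

```matlab
% One PixelHit line taken from the sample block in the question
txt = 'PixelHit 153, 88, 2158.3, 0';
c = textscan(txt, 'PixelHit%f%f%f%f', 'Delimiter', ',');   % 1-by-4 cell array
v = sscanf(txt, 'PixelHit%f,%f,%f,%f');                   % 4-by-1 double
disp(class(c))   % cell
disp(class(v))   % double
```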
In response to comments
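Here is a somewhat more robust script. It splits on \r*\n, so it works with both \r\n and \n line breaks, and [ ]* allows trailing spaces after the header. MATLAB's indexing is one-based, and in your file X (and maybe Y) takes the value zero, so I added "+1". It also adds each value to A instead of overwriting it.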
%%
chr = fileread( 'gamma.txt' );
clusters = regexp( chr, '--- detector1 ---[ ]*\r*\n', 'split' );
clusters(1) = [];
clear('chr');
numberofgammaclusters = length( clusters );
A = zeros( 256, 256 );
for jj = 1 : numberofgammaclusters
cac = regexp( clusters{jj},'\r*\n','split' );
for ii = 1 : length( cac )
if not( contains( cac{ii}, '===' ) )
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
A(vec(1)+1,vec(2)+1) = A(vec(1)+1,vec(2)+1) + vec(3);
else
break
end
end
end
imagesc( A );
% pick a colormap and show "zero" (approx. A(X,Y)<1) as white
mymap = colormap( parula(1e5) );
mymap(1,:)=1;
colormap( mymap )
colorbar
% flip the YAxis
ax = gca;
ax.YAxis.Direction = 'normal';
outputs:
[Image: Capture.PNG]
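If the nested loops turn out to be slow on the full file, one possible alternative is to collect all X, Y and value triplets with a single regexp call and let accumarray do the summing that the A(...) = A(...) + vec(3) line does. This is a sketch that has not been run on gamma.txt; it assumes the same layout as the script above (blocks start with '--- detector1 ---' and end at a line containing '==='), and the short chr string, which mixes \n and \r\n on purpose, is a hypothetical stand-in for fileread('gamma.txt').

```matlab
% Hypothetical stand-in for chr = fileread('gamma.txt')
chr = sprintf(['--- detector1 ---\n', ...
    'PixelHit 153, 88, 2158.3, 0\n', ...
    'PixelHit 153, 89, 3490.69, 0\n', ...
    '===\n', ...
    '--- detector1 --- \r\n', ...
    'PixelHit 153, 88, 100, 0\r\n', ...
    '===\r\n']);
clusters = regexp(chr, '--- detector1 ---[ ]*\r*\n', 'split');
clusters(1) = [];   % text before the first header
numberofgammaclusters = numel(clusters);
% Keep only the part of each cluster before its first '===' line
for jj = 1:numberofgammaclusters
    stop = strfind(clusters{jj}, '===');
    if ~isempty(stop)
        clusters{jj} = clusters{jj}(1:stop(1)-1);
    end
end
data = [clusters{:}];
% One token triplet per PixelHit line: X, Y, value
tok = regexp(data, 'PixelHit\s*(\d+)\s*,\s*(\d+)\s*,\s*([^,]+),', 'tokens');
if isempty(tok)
    A = zeros(256, 256);   % no PixelHit lines found
else
    num = str2double(vertcat(tok{:}));   % N-by-3 double: X, Y, value
    % accumarray sums values of repeated (X,Y) pairs; +1 because X and Y can be zero
    A = accumarray(num(:, 1:2) + 1, num(:, 3), [256 256]);
end
```

The design trade-off is memory: all triplets are held at once, which is usually fine for a few million short lines but is worth checking on the real file.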

14 Comments

I ran this code and it gave me a 256-by-256 matrix A full of NaN values. Anyway, I am very grateful for your time. Let me know if there is an update on my issue.
Thus
A(vec{1},vec{2}) = vec{3};
is never executed, which indicates that one or more of my assumptions about your text file is wrong.
  • Does '--- detector1 ---' start each cluster?
  • Is \r\n used to indicate a line break?
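One way to answer the second question without a text editor is to count the line-break characters directly in MATLAB. A minimal sketch, where the short sample string is a hypothetical stand-in for fileread('gamma.txt'): a file with Windows line breaks has one \r\n pair per line feed, while a file with Unix line breaks has no \r\n pairs at all.

```matlab
% Hypothetical stand-in for chr = fileread('gamma.txt'); this one uses \n only
chr = sprintf('--- detector1 ---\nPixelHit 153, 88, 2158.3, 0\n');
nLF = sum(chr == char(10));                      % line feeds (\n)
nCRLF = numel(strfind(chr, sprintf('\r\n')));    % Windows-style \r\n pairs
if nLF > 0 && nCRLF == nLF
    style = 'CRLF';    % \r\n everywhere
elseif nCRLF == 0
    style = 'LF';      % \n only
else
    style = 'mixed';
end
fprintf('Line breaks: %s (%d LF, %d CRLF)\n', style, nLF, nCRLF);
```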
Your sample file produced some values in A. imagesc(A) returned:
[Image: Capture.PNG]
This is exactly what I need, but when I run this on my main file it gives me nothing. I am attaching my main file as a zip, since it's too big.
The output looks perfect.
Thanks!
This screen clip shows the beginning of gamma.txt:
[Image: Capture.PNG]
Note the single-character line break.
I added a modified script to my answer. With gamma.txt it produced:
[Image: Capture2.PNG]
Hello,
Thanks for this. Just a minor thing: the code simply replaces the values in the matrix, but I need to add to a value that is already present, so I made some modifications, marked with comments below.
The output this gives (attached: MAtlab.png) is slightly different from the required output (attached: Required.png): in the middle we do not see the yellow area (Gaussian effect). Also, in MATLAB the Y axis does not start from zero.
Also, what text editor did you use to identify the line break?
I am a beginner in MATLAB, so apologies if my question is too basic. Please let me know how to rectify this.
Best Regards,
Sanchit Sharma
chr = fileread( 'gamma.txt' );
clusters = regexp( chr, '--- detector1 ---[ ]*\r*\n', 'split' );
clusters(1) = [];
clear('chr');
numberofgammaclusters = length( clusters );
A = zeros( 256, 256 ); % MADE ZERO
B = zeros( 256, 256 ); % MADE ZERO
for jj = 1 : numberofgammaclusters
cac = regexp( clusters{jj},'\r*\n','split' );
for ii = 1 : length(cac)
if not( contains( cac{ii}, '===' ) )
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
B(vec(1)+1,vec(2)+1) = vec(3);
A=B+A; % ADDED
else
break
end
end
end
%%
imagesc(A)
colormap jet;
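On the Y axis not starting from zero: the scripts store pixel (X,Y) in A(X+1,Y+1), so a plain imagesc(A) labels the axes with the indices 1...256. A sketch that instead passes the zero-based coordinate vectors to imagesc and puts 0 at the bottom with axis xy. It assumes you want X along the horizontal axis (hence the transpose A.'); the random A is only a placeholder for the accumulated matrix.

```matlab
% Placeholder for the accumulated 256-by-256 matrix (A(X+1,Y+1) holds pixel X,Y)
A = rand(256, 256);
coords = 0:255;                 % zero-based pixel coordinates as in the file
imagesc(coords, coords, A.');   % transpose: X horizontal, Y vertical (assumption)
axis xy                         % Y increases upward, starting at 0 at the bottom
axis image                      % square pixels
xlabel('X')
ylabel('Y')
colormap(jet)
colorbar
```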
I use the free editor Notepad++ for almost everything except MATLAB code.
I modified my answer, but it is only close to your requirements.
Use the documentation pages "Debug a MATLAB Program" and "Examine Values While Debugging" to figure out the effect of
B(vec(1)+1,vec(2)+1) = vec(3);
A=B+A;
It looks weird to me. What is the final value of B ?
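A small experiment that shows what happens, using two hypothetical hits instead of the real file: because B is never cleared, it ends up holding every hit seen so far, and A = B + A adds all earlier hits again on each later iteration. Updating only the one element adds each value exactly once.

```matlab
% Two hypothetical hits: [row, column, value]
hits = [1 1 5; 2 2 7];

% Pattern from the modified script: B keeps all earlier hits
A = zeros(2); B = zeros(2);
for k = 1:size(hits, 1)
    B(hits(k,1), hits(k,2)) = hits(k,3);
    A = B + A;
end
disp(A)    % A(1,1) is 10: the first hit was added twice

% Element-wise accumulation: each hit is added once
A2 = zeros(2);
for k = 1:size(hits, 1)
    A2(hits(k,1), hits(k,2)) = A2(hits(k,1), hits(k,2)) + hits(k,3);
end
disp(A2)   % A2(1,1) is 5, A2(2,2) is 7
```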
Hello, thanks a lot for your response.
I have implemented the program so that it dumps five clusters into each text file. In total we have 13025 clusters, hence 2065 text files. It currently dumps the data, but the content is wrong: I believe it is sequentially adding and then dumping the data.
I would be grateful if you could take a look at the code below and suggest something with your expertise.
Many Thanks!
%%
chr = fileread( 'gamma.txt' );
clusters = regexp( chr, '--- detector1 ---*\r*\n', 'split' );
clusters(1) = [];
clear('chr');
numberofgammaclusters = length( clusters );
A = zeros( 256, 256 );
B = zeros( 256, 256 );
for jj = 1 : 5
cac = regexp( clusters{jj},'\r*\n','split' );
for ii = 1 : length(cac)
if not( contains( cac{ii}, '===' ) )
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
B(vec(1)+1,vec(2)+1) = vec(3);
A=B+A;
else
break;
end
end
fid = fopen(sprintf('C:\\Users\\sanchitsharma\\Desktop\\MATLABscripts\\Gamma_Frames\\GFrame1.txt'),'wt');
for ii = 1:256
fprintf(fid,'%g\t',B(ii,:));
fprintf(fid,'\n');
end
fclose(fid);
end
%%
for n = 1:2604
for jj = 1+(5*n) : (5+5*n)
cac = regexp( clusters{jj},'\r*\n','split' );
for ii = 1 : length(cac)
if not( contains( cac{ii}, '===' ) )
vec = sscanf( cac{ii}, 'PixelHit%f,%f,%f,%f' );
B(vec(1)+1,vec(2)+1) = vec(3);
A=B+A;
else
break;
end
end
end
fid = fopen(sprintf('C:\\Users\\sanchitsharma\\Desktop\\MATLABscripts\\Gamma_Frames\\GFrame%d.txt',(n+1)),'wt');
for ii = 1:256
fprintf(fid,'%g\t',B(ii,:));
fprintf(fid,'\n');
end
fclose(fid);
end
%%
imagesc(A)
colormap jet;
colorbar;
Hello,
You are right, B looks weird to me too. It is perfect. Now just the dumping of clusters is peculiar. I want to dump 5 clusters in each text file, but the number of clusters per file increases as the file number increases.
Thanks!
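The growth described above is what you would expect if B is never reset between files: each GFrame file then contains every cluster read so far. A sketch of one way to write a fixed number of clusters per file, assuming clusters has already been split as in the answer. The seven one-hit clusters, clustersPerFrame, writing to tempdir and the frames cell array are hypothetical choices for illustration (use your own Gamma_Frames folder); the last file simply gets the remaining clusters.

```matlab
% Hypothetical stand-in: 7 clusters, each with one hit at (k,k) of value 10*k
clusters = arrayfun(@(k) sprintf('PixelHit %d, %d, %g, 0\n===\n', k, k, 10*k), ...
    1:7, 'UniformOutput', false);
clustersPerFrame = 5;
outDir = tempdir;   % hypothetical output folder
nFrames = ceil(numel(clusters) / clustersPerFrame);
frames = cell(1, nFrames);
for n = 1:nFrames
    B = zeros(256, 256);   % start every frame from an empty matrix
    first = (n - 1) * clustersPerFrame + 1;
    last = min(n * clustersPerFrame, numel(clusters));
    for jj = first:last
        cac = regexp(clusters{jj}, '\r*\n', 'split');
        for ii = 1:numel(cac)
            if ~isempty(strfind(cac{ii}, '==='))
                break
            end
            vec = sscanf(cac{ii}, 'PixelHit%f,%f,%f,%f');
            if numel(vec) >= 3   % skip blank or malformed lines
                B(vec(1)+1, vec(2)+1) = B(vec(1)+1, vec(2)+1) + vec(3);
            end
        end
    end
    frames{n} = B;
    % One row of B per line, tab-separated
    fid = fopen(fullfile(outDir, sprintf('GFrame%d.txt', n)), 'wt');
    fprintf(fid, [repmat('%g\t', 1, 255), '%g\n'], B.');
    fclose(fid);
end
```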
"It is perfect." What's perfect?
"5 clusters in each text file" What exactly is "5 clusters" ? I'm not a fan of deducing the intent from code that doesn't produce the correct result.
Keeping track of "2065 text files" sounds like a nightmare to me.
The new code with the update A(vec(1)+1,vec(2)+1) = A(vec(1)+1,vec(2)+1) + vec(3); works perfectly. I am writing cluster-analysis code to analyze each cluster; if I put many clusters in one text file, there is a chance they overlap. Anyway, I have figured out a way to do this.
Many thanks for your help!
Why did you replace [ ]* by * ?
[Image: Capture.PNG]
Apologies! I did not understand the meaning of [ ]*. Can you please tell me what [ ]* means and what it does?
[ ]* stands for zero or more spaces. It's easy to miss trailing spaces, since they don't show in the editor.
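A small check of what [ ]* buys, on a hypothetical header line that ends with a trailing space: only the pattern with [ ]* matches, so only it actually splits the text. The pattern with ---* means two dashes followed by zero or more further dashes, so it does not allow the space either.

```matlab
% Hypothetical header with a trailing space before the line break
chr = sprintf('--- detector1 --- \nPixelHit 1, 2, 3, 0\n');
withSpaces = regexp(chr, '--- detector1 ---[ ]*\r*\n', 'split');
withStar   = regexp(chr, '--- detector1 ---*\r*\n', 'split');
fprintf('[ ]* version: %d pieces\n', numel(withSpaces));   % 2: header found
fprintf('---* version: %d pieces\n', numel(withStar));     % 1: no match
```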


More Answers (1)

Bob Thompson on 30 Jul 2019
Edited: Bob Thompson on 30 Jul 2019
I have not been able to use your example file; that's a limitation on my end.
That being said, this is how I would look at doing what I understand you're looking for.
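fid = fopen('gamma.txt');
A = zeros(256, 256);    % preallocate the 256-by-256 matrix
tline = fgetl(fid);     % fgetl returns a char line, or -1 (numeric) at end of file
c = 1;                  % line counter
while ischar(tline)
if length(tline) > 8 && strcmp(tline(1:8),'PixelHit')
tmp = regexp(tline,'[ ,]+','split');   % {'PixelHit','153','88','2158.3','0'}
x = str2double(tmp{2}) + 1;            % +1: X and Y can be zero, MATLAB indexing is one-based
y = str2double(tmp{3}) + 1;
A(x,y) = A(x,y) + str2double(tmp{4}); % add repeated hits instead of overwriting
end
tline = fgetl(fid);
c = c+1;
end
fclose(fid);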
It might need some minor editing, because I couldn't use your example file, but the basic concept is sound. If you're looking to capture other data, just add an elseif condition.
This will take some time, but as far as I know any method for reading a 2-million-line text file is going to take some time.
