Reading and finding string in a text (.TXT) file containing string and numerical data
I am trying to read data from a text file (attachment). This is a typical summary output from a CFD simulation, which has a short description in the first couple of lines followed by columns of numerical data. I have always handled CFD output using Excel, but working with .TXT files seems easier, and the program that I use for CFD computations generates output in .TXT format. However, one issue that I constantly face when dealing with .TXT files is that it is not clear to me in which format the data is read into the machine, how the existing commands handle the data, or even which command is the most suitable one for the data format at hand. The good thing is that the data files will always look like this, so if I knew which command to use, I wouldn't need to switch to another one.
What I need to know is this:
- Which command to use, and I have tried all of these: readtable, readmatrix, textread & textscan but each one has a slight quirk that makes it unsuitable for the kind of search that I need to conduct or it is necessary to use more than one command to deal with both the text and the numerical data. In case of textread, e.g., the data seems to be read in the form of a single column cell array, and NONE of the common string search commands seem to work here (contains, strcmp, strfind).
If I use textread:
InText=textread(InputFile,'%q')
the input data is in the form of a 670x1 cell array and none of the search commands find the string I am looking for.
If, instead, I use textscan, as suggested by MATLAB, the output is not shown, but MATLAB indicates that it is a 6x1 cell array.
The search commands again will not work.
- How to find the location of a certain string in the first lines of the text. In the case of the attached file, that string would be e.g. 'Volume Average', but the list may include a handful of other options. This will then determine how the code proceeds with further calculations. I tried
- Location of the headers of the numerical data, as in line 5 of the attached file: 'Eta0 (Pa*s)', 'Velocity magnitude (m/s)', 'abs(u2) (m/s)' and so on. This will then tell the program to treat each numerical column accordingly. It is tempting to just read the numerical part and assume the position of each column, but one should not forget that each user may have saved the output file in a different sequence of columns and therefore the best practice is to fully automate the search for values by finding the assigned name for each column. The good news is that the names are fixed and never change their format.
Here is the whole script:
clc; format long g; clear all;
% Reading the input file(s)
[FN,FP] = uigetfile('*.txt','MultiSelect','on');
if isequal(FN,0)
disp('No files selected');
return
else
disp(['Files selected:', fullfile(FP,FN)]);
end
if ischar(FN), FN={FN}; end
FN=FN';
jCounter=0; ParaVal=[];
Text2Find='Volume Average';
for iF=1:size(FN,1)
InFile=FN{iF};
%InText=textscan(InFile,'%q')
InText=textread(InFile,'%q')
%InTab=readtable(InFile,'VariableNamingRule' , 'preserve');
ParaIdx=find(contains(InText,Text2Find))
find(strcmp(InText,'Velocity Magnitude' ))
%strfind(InText,Text2Find)
end
Thanks,
Saeid
Accepted Answer
Star Strider
on 24 Dec 2023
Edited: Star Strider
on 26 Dec 2023
The file appears to contain two duplicate horizontally-concatenated matrices, other than for the first column. This is not an easy file to deal with, but it is possible. I just used fgetl to get the first four lines, saving them as a cell array. The fifth line supplies the variable names for the table created by readtable; extracting them and then assigning them as variable names turned out to be relatively straightforward. There are duplicated names and duplicated values, so I separated them into two tables (duplicating the first column from the original table in the second table), since duplicated variable names are not permitted.
Try this —
File = 'Rushton 4 Blad...ge Values.txt';
F = fileread(File)
fidi = fopen(File, 'rt');
k = 1;
while ~feof(fidi)
OneLine = fgetl(fidi);
if strcmp(OneLine(1), '%')
HeaderLine{k,:} = OneLine;
else
break
end
k = k+1;
end
fclose(fidi);
HeaderLine(1:k-2)
HL = k-1;
VN = strsplit(HeaderLine{end}(3:end), ' ');
T1 = readtable(File, 'VariableNamingRule','preserve', 'CommentStyle','%', 'HeaderLines',HL);
T1a = T1(:,1:8);
T1a.Properties.VariableNames = VN(1:8)
VNa = T1a.Properties.VariableNames;
T1b = T1(:,[1 9:end]);
T1b.Properties.VariableNames = VN([1 9:end])
VNb = T1b.Properties.VariableNames;
figure
loglog(T1a{:,1}, T1a{:,2:end}, 'LineWidth',2)
grid
xlabel(VNa{1})
ylabel('Values')
title('T1a')
legend(VNa{2:end}, 'Location','northoutside', 'NumColumns',2)
figure
loglog(T1b{:,1}, T1b{:,2:end}, 'LineWidth',2)
grid
xlabel(VNb{1})
ylabel('Values')
title('T1b')
legend(VNb{2:end}, 'Location','northoutside', 'NumColumns',2)
Check = all(table2array(T1a) == table2array(T1b),'all')
EDIT — (26 Dec 2023 at 18:22)
Added the plots and the ‘Check’ variable, code otherwise unchanged.
The data appear to be the same in both sections of the data (denoted here as ‘T1a’ and ‘T1b’). The result of the all call (the ‘Check’ variable) verifies that they are.
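Since the question asks for columns to be located by header name rather than by position, here is a minimal sketch of that lookup. It does not read the attached file: the header line and the numbers are made up for illustration, and it assumes that column names are separated by at least two spaces while a name itself contains only single spaces.

```matlab
% Hypothetical header line and values, made up for illustration only.
HeaderLine = '% Eta0 (Pa*s)   Velocity magnitude (m/s)   abs(u2) (m/s)';
Data = [1 0.10 0.02; 10 0.08 0.01];

% Drop the leading '%', then split on runs of two or more whitespace
% characters so that single spaces inside a name are preserved.
VN = regexp(strtrim(HeaderLine(2:end)), '\s{2,}', 'split');

% Build a table whose variable names are the header strings as written.
T = array2table(Data, 'VariableNames', VN);

% Locate a column by its header text instead of assuming its position.
WantedName = 'Velocity magnitude (m/s)';
ColIdx = find(strcmp(T.Properties.VariableNames, WantedName));

% Both forms return the same numeric column.
VelByIdx  = T{:, ColIdx};
VelByName = T.(WantedName);
```

The same strcmp lookup should work on tables such as 'T1a' and 'T1b', provided the names stored in their VariableNames match the search string exactly, including case and units.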
2 Comments
Star Strider
on 5 Jan 2024
Happy New Year!
As always, my pleasure!
No worries about the duplication. I just needed to figure out a way to work with it, once I discovered it.
More Answers (1)
Ganesh
on 24 Dec 2023
I understand that you are trying to search for text in your ".txt" and then extract all the columns of the table contained within the ".txt" file.
Assuming that the number of header lines is constant, you would first need to separate the headers from the table.
You can use the following code to read the file into lines and split it into those two parts. The table is then built dynamically, based on the fact that there are at least two white spaces between two columns.
fileID = fopen(InFile,'rt'); % InFile is the file name from the script in the question
InText = textscan(fileID,"%s","Delimiter","\n");
fclose(fileID);
InText = InText{1}; % textscan returns a 1x1 cell array, hence extract all the lines of the file
Headers = join(InText(1:4,:)); % Using join to concatenate into one string
HeaderIdx = strfind(Headers{1},Text2Find); % Finding index of required substring, returns empty if string is not found
Tabletext = join(InText(5:end,:),"\n"); % Creating a string of the table data
Tablecsv = regexprep(Tabletext{1}, '\s{2,}', ','); % Replacing all the column gaps with commas
Tablecells = split(split(Tablecsv,"\n"),","); % Splitting the rows with '\n', and the column of each row using ','
Hope this helps!
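A likely reason the searches in the question come back empty is that the '%q' format splits the text at whitespace, so a phrase such as 'Volume Average' ends up spread over two cells and no single cell contains it. The sketch below illustrates this on a made-up two-line header (not the attached file) and compares it with a line-by-line split. For a real file, the text could be obtained with fileread first.

```matlab
% Made-up header text standing in for the first lines of the file.
Txt = sprintf('%% Description: Volume Average\n%% Model: demo\n');

% '%q' reads whitespace-delimited tokens, so 'Volume' and 'Average'
% land in separate cells and the two-word phrase is never matched.
Words = textscan(Txt, '%q');
Words = Words{1};
FoundInWords = any(contains(Words, 'Volume Average'));

% Splitting into whole lines keeps the phrase intact.
Lines = splitlines(Txt);
LineIdx = find(contains(Lines, 'Volume Average'));
```

Here FoundInWords is false, while LineIdx points at the first line. Note also that textscan treats a character vector as the text to scan, not as a file name, so passing a file name without fopen scans the name itself.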