Parsing a text file with multiple data blocks into an indexed structure

Hi all,
I am working with data from a differential scanning calorimeter (in my case used to analyze the phase transitions of thermoplastics), which comes with very clunky software that has limited data export options. Typically, one experiment generates between 10 and 200 "curves", each of which represents the heating or cooling of the sample at a given rate, together with the heat flow measured during that particular segment.
In the past, I have been manually exporting each individual curve one at a time and indexing the data to a variable. Generally, the variable is either the heating or cooling rate of the current segment, or some experimental parameter associated with the previous segment, such as the amount of time spent crystallizing the sample at a certain temperature. My general workflow uses the following function star_read.m to load the data for a single segment from a .txt file, index it to a given heating rate, and then smooth the data using a Savitzky-Golay filter with the smoothing window set by "filt".
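function data = star_read(x, rate, filt)
% Load one exported segment from the text file x, tag it with the given rate,
% and smooth the heat flow with a Savitzky-Golay filter of window length filt.
opts = detectImportOptions(x);
opts.SelectedVariableNames = {'Heatflow','t','Tr'};
f = readmatrix(x, opts);
data.rate = rate; % value this segment is indexed to (this input was previously unused)
data.heat = f(:,1);
data.time = f(:,2);
data.temp = f(:,3);
data.heat = smoothdata(data.heat,'sgolay',filt);
end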
This function is used in my working file to import data from each segment and store it in a structure like below, which I later pass to additional functions built for analyzing certain thermal features in the curves:
clear all
close all
%Import data
R = 8.31345;
filt = 20;
hm_20(1) = star_read('PA6 B40 - 200K heating, 51C nucleation 0s.txt', 0,filt);
hm_20(2) = star_read('PA6 B40 - 200K heating, 51C nucleation 25s.txt', 25,filt);
hm_20(3) = star_read('PA6 B40 - 200K heating, 51C nucleation 50s.txt', 50,filt);
hm_20(4) = star_read('PA6 B40 - 200K heating, 51C nucleation 75s.txt', 75,filt);
hm_20(5) = star_read('PA6 B40 - 200K heating, 51C nucleation 100s.txt', 100,filt);
hm_20(6) = star_read('PA6 B40 - 200K heating, 51C nucleation 125s.txt', 125,filt);
hm_20(7) = star_read('PA6 B40 - 200K heating, 51C nucleation 150s.txt', 150,filt);
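The repeated calls above follow a regular pattern, so they could also be written as a loop. The following is only a minimal sketch: the variable names nucleation_times and file_names are made up for illustration, and it assumes star_read.m is on the path and that the file names differ only in the nucleation time.

```matlab
% Nucleation times in seconds, matching the seven calls above
nucleation_times = 0:25:150;
filt = 20;

% Build the file names from the common pattern
file_names = strings(1, numel(nucleation_times));
for k = 1:numel(nucleation_times)
    file_names(k) = sprintf('PA6 B40 - 200K heating, 51C nucleation %ds.txt', ...
                            nucleation_times(k));
end

% Read every file that is actually present in the current folder
for k = 1:numel(file_names)
    if isfile(file_names(k))
        hm_20(k) = star_read(file_names(k), nucleation_times(k), filt);
    end
end
```

The isfile check only skips missing files so that the sketch does not error out when run in a folder without the exports.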
The .txt files being read in generally look like the example attached here 'PA6 B40 - 100K heating, 51C nucleation 100s.txt'.
I have started a project that will require the collection of thousands of these curves, and I have been trying to find a way to automatically parse and index them from a single text file, without having to manually export and name each of them (which would likely drive me off the deep end entirely). Instead of being exported manually one segment at a time, the segments can be exported as a batch into a text file like 'PA11 - text.txt', which is formatted like so:
Curve Name:
]10[&Chip 58607 - PA11 - test
Curve Values:
Index t Ts Tr Value
[s] [°C] [°C] [mW]
0 0 218.199 220.000 -0.238706
1 0.001 219.227 220.000 -0.227354
2 0.002 219.627 220.000 -0.173578
3 0.003 219.787 220.000 -0.107596
4 0.004 219.856 220.000 -0.061405
5 0.005 219.890 220.000 -0.0352538
6 0.006 219.909 220.000 -0.0212574
7 0.007 219.921 220.000 -0.0144072
8 0.008 219.930 220.000 -0.0115749
9 0.009 219.937 220.000 -0.0111147
10 0.01 219.942 220.000 -0.0120217
Curve Name:
]9[&Chip 58607 - PA11 - test
Curve Values:
Index t Ts Tr Value
[s] [°C] [°C] [mW]
0 0 -58.448 -60.000 -0.0011569
1 0.0001 -58.454 -59.800 -0.00115686
2 0.0002 -58.454 -59.600 -0.0011521
3 0.0003 -58.459 -59.400 -0.00119124
4 0.0004 -58.459 -59.200 -0.00119119
5 0.0005 -58.467 -59.000 -0.0011816
6 0.0006 -58.473 -58.800 -0.00116694
7 0.0007 -58.473 -58.600 -0.00117134
8 0.0008 -58.465 -58.400 -0.00115189
9 0.0009 -58.465 -58.200 -0.000943097
10 0.001 -58.454 -58.000 -0.000571901
For each segment, the segment number is included in the curve name string and enclosed by backwards square brackets - so for "]9[&Chip 58607 - PA11 - test", this is the 9th segment in the experiment. For each experiment, I am also able to export a "Method" file - I have attached "PA11 - test method.txt" as an example. You can see that the parameters for each segment are stored in the following format and listed as "Segment 1" etc.
Chip 58607 - PA11 - test 14.12.2022 10:54
________________________________________________________________________________
Method : Chip 58607 - PA11 - test, 07.12.2022 13:30:46
Not released
Created by : METTLER
Chip Sensor Type : MultiSTAR UFS1 (450)
TA Technique : Flash DSC
Start Temperature: 25.00 °C
End Temperature : 220.00 °C
Sample
Range of Weight: 0.00 - 0.10 mg
Segment 1
Start temp : 25.00 °C
End temp : 220.00 °C
Heating rate : 1000.00 K/s
Sampling freq.: 10 kHz (dt = 0.1 ms)
Gas : Air, 0 ml/min
Segment 2
Temperature : 220.00 °C
Duration : 0.00 min
Sampling freq.: 1 kHz (dt = 1.0 ms)
Gas : Air, 0 ml/min
What I would love to do is parse the method file and, for each segment heading, extract the segment number and store the experimental parameters such as "Start temp" and "Heating rate" as doubles in a data structure, with each element of the structure indexed to the segment number. At the same time, I am trying to parse the .txt file containing the experimental data from each run (i.e. PA11 - test.txt) and store the experimental data for each segment in a structure that is also indexed to the segment number.
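To make the target structure concrete, here is a rough sketch of one way the method file could be walked line by line. It is an illustration only: it runs on the lines quoted above rather than on the real file, the names method_lines and method are hypothetical, and it assumes every parameter line has the form "name : number unit". Field names are derived from the parameter names with matlab.lang.makeValidName.

```matlab
% Segment lines copied from the method file excerpt above
method_lines = [ ...
    "Segment 1"
    "Start temp : 25.00 °C"
    "End temp : 220.00 °C"
    "Heating rate : 1000.00 K/s"
    "Sampling freq.: 10 kHz (dt = 0.1 ms)"
    "Gas : Air, 0 ml/min"
    "Segment 2"
    "Temperature : 220.00 °C"
    "Duration : 0.00 min"
    "Sampling freq.: 1 kHz (dt = 1.0 ms)"
    "Gas : Air, 0 ml/min"];

method = struct([]);   % struct array, element k holds segment k
seg = 0;               % current segment number (0 = still in the file header)
for k = 1:numel(method_lines)
    ln = char(method_lines(k));

    % A "Segment N" heading switches to a new struct element
    tok = regexp(ln, '^\s*Segment\s+(\d+)\s*$', 'tokens', 'once');
    if ~isempty(tok)
        seg = str2double(tok{1});
        continue
    end

    % "name : number ..." lines become numeric fields of the current segment;
    % lines whose value is not numeric (e.g. "Gas : Air, ...") are skipped
    tok = regexp(ln, '^\s*([^:]+?)\s*:\s*([-+]?\d+(?:\.\d+)?)', 'tokens', 'once');
    if seg > 0 && ~isempty(tok)
        field = matlab.lang.makeValidName(tok{1});
        method(seg).(field) = str2double(tok{2});
    end
end
```

With this layout, method(1).HeatingRate holds the heating rate of segment 1, and parameters that a segment does not define are left empty, so isothermal segments are kept rather than dropped.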
I have been trying to employ fileread, textscan and filescan to get there, but I am very much in over my head. There are a few earlier questions which have answers that are very helpful:
However, I am struggling immensely with implementation on this particular problem. I would sincerely appreciate any help or suggestions!

Accepted Answer

Mathieu NOE on 15 Dec 2022
Hello,
I will give it a try, even though I do not consider myself a "real expert" in file parsing... but why not try.
So here it is; the result is stored in the structure S.
There are only 5 fields because I retrieved data from the file PA11 - test method only when a segment has the fields "Start temp" and "Heating rate"; segments without those fields are discarded, for example:
Segment 2
Temperature : 220.00 °C
Duration : 0.00 min
Sampling freq.: 1 kHz (dt = 1.0 ms)
Gas : Air, 0 ml/min
For the time being, the field "results" (extracted from the file PA11 - test) is only a string array, but maybe you want to have it as a table or something else? That work remains to be done.
fileDir = pwd; % choose your working directory
%% 1/ parsing the "PA11 - test method.txt" file
% What I would love to do is to parse the method file, and for each segment heading,
% extract the segment number, and store the experimental parameters such as
% "Start temp" and "Heating rate" as doubles in a data structure,
% which each array in the structure indexed to the segment number.
filename1 = 'PA11 - test method.txt'; % no trailing space in the file name
D1=readlines(fullfile(fileDir,filename1)); % read as string array
% find Segments
idx1=find(contains(D1,'Segment')); % find the Segment lines index
idx2=find(contains(D1,'Start temp')); % find the Start temp lines index
idx3=find(contains(D1,'Heating rate')); % find the Heating rate lines index
segment_numbers = str2double(extract(D1(idx2-1), digitsPattern)); %valid segments are those containing "Start temp" field (one line after)
for ci = 1:numel(segment_numbers)
S(ci).segment_number = segment_numbers(ci);
S(ci).start_temp = str2double(extractBetween(D1(idx2(ci)),':','°'));
S(ci).heating_rate = str2double(extractBetween(D1(idx3(ci)),':','K/s'));
end
%% 2/ parsing the "PA11 - test.txt" data file
% Goal: parse the text file containing the experimental data from each run
% and store the data of each segment in the structure S, matched to the
% segment numbers found in the method file above.
filename2 = 'PA11 - test.txt'; % NB : data (Curve) are organized backwards from bottom to top of file
D2=readlines(fullfile(fileDir,filename2)); % read as string array
nb_lines = numel(D2);
% find indexes
idx4=find(contains(D2,'Curve Name:')); % find the "Curve Name:" lines index
curve_number = str2double(extractBetween(D2(idx4+1),']','['));
% blocks data length
tmp = [idx4;nb_lines]; % includes indexes of lines "Curve Name:" and eof line index
start_lines = tmp(1:end-1)+3;
stop_lines = tmp(2:end)-5;
blocks_data_length = stop_lines-start_lines;
for ci = 1:numel(segment_numbers)
selected(ci) = find(curve_number==segment_numbers(ci)); % we have to do this because the data are organized backwards from bottom to top of file
start_lines_selec(ci) = start_lines(selected(ci));
stop_lines_selec(ci) = stop_lines(selected(ci));
D2_extract = D2(start_lines_selec(ci):stop_lines_selec(ci));
%T(ci) = array2table(D2_extract);
% below is for : At the same time, I am trying to parse through the txt file
% containing the experimental data from each run (ie; PA11 - test.txt)
% and store the experimental data for each segment to a structure that is also indexed to the segment number.
S(ci).results = D2_extract;
end
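As for the remaining work of turning the "results" strings into numbers, the following is a possible sketch and not a tested solution on the full file. It assumes that every data row has the five columns Index, t, Ts, Tr and Value shown in the question, and it runs on a few lines copied from that excerpt; the names block, vals and T are only for illustration. In the loop above, block would be S(ci).results.

```matlab
% Sample block, copied from the 'PA11 - test.txt' excerpt in the question
block = [ ...
    "Index t Ts Tr Value"
    "[s] [°C] [°C] [mW]"
    "0 0 218.199 220.000 -0.238706"
    "1 0.001 219.227 220.000 -0.227354"
    "2 0.002 219.627 220.000 -0.173578"];

% A data row is a line whose first token is a number; this drops the
% header line, the units line and any blank or trailing text lines
first_token = extractBefore(strtrim(block) + " ", " ");
is_data = ~isnan(str2double(first_token));

% Read all numbers in one go and reshape to one row per data line
txt = strjoin(cellstr(block(is_data)).', newline);
vals = sscanf(txt, '%f', [5 Inf]).';

% Optional: wrap the matrix in a table with the column names of the file
T = array2table(vals, 'VariableNames', {'Index','t','Ts','Tr','Value'});
```

Filtering on the first token means the result does not depend on the exact line offsets used for start_lines and stop_lines.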
Andrew on 4 Jan 2023
This solution works extremely well! Exactly what I have been trying to do. Thank you @Mathieu NOE, and sorry for the delay in testing this - I only have a Matlab license for my work PC and have been out of office for the Christmas break.

Release

R2021a
