Parsing a text file with multiple data blocks into an indexed structure

Question

Andrew on 14 Dec 2022

https://au.mathworks.com/matlabcentral/answers/1878282-parsing-a-text-file-with-multiple-data-blocks-into-an-indexed-structure

Commented: Mathieu NOE on 5 Jan 2023

Hi all,

I am working with data from a differential scanning calorimeter (in my case used to analyze the phase transitions of thermoplastics), which uses very clunky software with limited data export options. Typically, one experiment generates between 10 and 200 "curves", each of which represents the heating or cooling of the sample at a given rate and the heat flow measured during that segment.

In the past, I have been manually exporting each curve one at a time and indexing the data to a variable. Generally, the variable is either the heating or cooling rate of the current segment, or some experimental parameter associated with the previous segment, such as the amount of time spent crystallizing the sample at a certain temperature. My general workflow uses the following function, star_read.m, to load the data for a single segment from a .txt file, index it to a given heating rate, and then smooth the data using a Savitzky-Golay filter with the smoothing window set by "filt".

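function data = star_read(x, rate, filt)
% STAR_READ  Load one exported segment and smooth its heat-flow signal.
%   x    : path to the exported .txt file
%   rate : value the segment is indexed to (not used in the body as posted)
%   filt : window length for the Savitzky-Golay smoothing
opts = detectImportOptions(x);
opts.SelectedVariableNames = {'Heatflow','t','Tr'};
f = readmatrix(x, opts);
data.heat = f(:,1);
data.time = f(:,2);
data.temp = f(:,3);
data.heat = smoothdata(data.heat,'sgolay',filt);
end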

This function is used in my working file to import data from each segment and store it in a structure like below, which I later pass to additional functions built for analyzing certain thermal features in the curves:

clear all
close all
%Import data
R     = 8.31345;
filt  = 20;
hm_20(1)      = star_read('PA6 B40 - 200K heating, 51C nucleation 0s.txt',      0,filt);
hm_20(2)      = star_read('PA6 B40 - 200K heating, 51C nucleation 25s.txt',     25,filt);
hm_20(3)      = star_read('PA6 B40 - 200K heating, 51C nucleation 50s.txt',     50,filt);
hm_20(4)      = star_read('PA6 B40 - 200K heating, 51C nucleation 75s.txt',     75,filt);
hm_20(5)      = star_read('PA6 B40 - 200K heating, 51C nucleation 100s.txt',    100,filt);
hm_20(6)      = star_read('PA6 B40 - 200K heating, 51C nucleation 125s.txt',    125,filt);
hm_20(7)      = star_read('PA6 B40 - 200K heating, 51C nucleation 150s.txt',    150,filt);

The .txt files being read in generally look like the example attached here 'PA6 B40 - 100K heating, 51C nucleation 100s.txt'.

I have started a project that will require the collection of thousands of these curves, and I have been trying to find a way to automatically parse and index them from a single text file, without having to manually export and name each of them (which would likely drive me off the deep end entirely). Instead of manually exporting each segment, they can be exported as a batch into a text file like 'PA11 - text.txt', which is formatted like so:

Curve Name:
  ]10[&Chip 58607 - PA11 - test
Curve Values:
          Index              t             Ts             Tr          Value
                           [s]           [°C]           [°C]           [mW]
              0              0        218.199        220.000      -0.238706
              1          0.001        219.227        220.000      -0.227354
              2          0.002        219.627        220.000      -0.173578
              3          0.003        219.787        220.000      -0.107596
              4          0.004        219.856        220.000      -0.061405
              5          0.005        219.890        220.000     -0.0352538
              6          0.006        219.909        220.000     -0.0212574
              7          0.007        219.921        220.000     -0.0144072
              8          0.008        219.930        220.000     -0.0115749
              9          0.009        219.937        220.000     -0.0111147
             10           0.01        219.942        220.000     -0.0120217
             
             
Curve Name:
  ]9[&Chip 58607 - PA11 - test
Curve Values:
          Index              t             Ts             Tr          Value
                           [s]           [°C]           [°C]           [mW]
              0              0        -58.448        -60.000     -0.0011569
              1         0.0001        -58.454        -59.800    -0.00115686
              2         0.0002        -58.454        -59.600     -0.0011521
              3         0.0003        -58.459        -59.400    -0.00119124
              4         0.0004        -58.459        -59.200    -0.00119119
              5         0.0005        -58.467        -59.000     -0.0011816
              6         0.0006        -58.473        -58.800    -0.00116694
              7         0.0007        -58.473        -58.600    -0.00117134
              8         0.0008        -58.465        -58.400    -0.00115189
              9         0.0009        -58.465        -58.200   -0.000943097
             10          0.001        -58.454        -58.000   -0.000571901

For each segment, the segment number is included in the curve name string and enclosed by backwards square brackets - so for "]9[&Chip 58607 - PA11 - test", this is the 9th segment in the experiment. For each experiment, I am also able to export a "Method" file - I have attached "PA11 - test method.txt" as an example. You can see that the parameters for each segment are stored in the following format and listed as "Segment 1" etc.
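
To illustrate the naming convention described above, here is a minimal sketch (not part of the original post) of pulling the segment number out of one curve-name line. The variable names curveName, segNum and expName are made up for illustration, and the snippet assumes a MATLAB release that supports string arrays and extractBetween.

```matlab
% Example curve-name line copied from the batch export shown above
curveName = "]9[&Chip 58607 - PA11 - test";

% The segment number sits between the first ']' and the following '['
segNum = str2double(extractBetween(curveName, "]", "["));

% Everything after the '&' is the experiment name
expName = extractAfter(curveName, "&");

fprintf('Segment %d of experiment "%s"\n', segNum, expName);
```

Leading whitespace in front of the "]" does not matter here, because extractBetween only looks at the text between the two delimiters.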

             Chip 58607 - PA11 - test                                        14.12.2022 10:54
________________________________________________________________________________
Method           : Chip 58607 - PA11 - test, 07.12.2022 13:30:46
                   Not released
Created by       : METTLER
Chip Sensor Type : MultiSTAR UFS1 (450)
TA Technique     : Flash DSC
Start Temperature: 25.00 °C
End Temperature  : 220.00 °C
Sample
  Range of Weight: 0.00 - 0.10 mg
Segment 1
  Start temp    : 25.00 °C
  End temp      : 220.00 °C
  Heating rate  : 1000.00 K/s
  Sampling freq.: 10 kHz (dt = 0.1 ms)
  Gas           : Air, 0 ml/min
Segment 2
  Temperature   : 220.00 °C
  Duration      : 0.00 min
  Sampling freq.: 1 kHz (dt = 1.0 ms)
  Gas           : Air, 0 ml/min

What I would love to do is to parse the method file and, for each segment heading, extract the segment number and store the experimental parameters such as "Start temp" and "Heating rate" as doubles in a data structure, with each array in the structure indexed to the segment number. At the same time, I am trying to parse the txt file containing the experimental data from each run (i.e. PA11 - test.txt) and store the experimental data for each segment in a structure that is also indexed to the segment number.

I have been trying to employ fileread, textscan and filescan to get there, but I am very much in over my head. There are a few earlier questions which have answers that are very helpful:

https://www.mathworks.com/matlabcentral/answers/196582-best-way-to-parse-text-file
https://www.mathworks.com/matlabcentral/answers/312599-how-do-i-parse-this-complex-text-file-with-textscan
https://www.mathworks.com/matlabcentral/answers/889302-reorganization-of-experimental-data?s_tid=srchtitle (this one in particular is using text files from the same software, and user Star Strider has given a lot of great advice)

However, I am struggling immensely with implementation on this particular problem. I would sincerely appreciate any help or suggestions!


Answer 1

Mathieu NOE on 15 Dec 2022


Hello,

I gave it a try, even though I do not consider myself a "real expert" in file parsing... but why not try.

So here it is. The result is in the structure S.

There are only 5 fields because I retrieved data from the file "PA11 - test method.txt" only when a segment has both the "Start temp" and "Heating rate" fields; segments without those fields are discarded, like this one:

Segment 2
  Temperature   : 220.00 °C
  Duration      : 0.00 min
  Sampling freq.: 1 kHz (dt = 1.0 ms)
  Gas           : Air, 0 ml/min

For the time being, the field "results" (extracted from the file "PA11 - test.txt") is only a string array, but maybe you want to have it as a table or something else?

That work remains to be done.

fileDir = pwd; % choose your working directory

%% 1/ parsing the "PA11 - test method.txt" file
% For each segment heading, extract the segment number and store the
% experimental parameters "Start temp" and "Heating rate" as doubles in a
% structure array S (one element per retained segment).
filename1 = 'PA11 - test method.txt';   % trailing space in the file name removed
D1 = readlines(fullfile(fileDir,filename1));   % read as string array

% find the parameter lines
idx2 = find(contains(D1,'Start temp'));     % indices of the "Start temp" lines
idx3 = find(contains(D1,'Heating rate'));   % indices of the "Heating rate" lines

% Valid segments are those with a "Start temp" field; the "Segment N" heading
% is assumed to be the line just above it. This also assumes every such
% segment has a "Heating rate" line, so that idx2 and idx3 line up.
segment_numbers = str2double(extract(D1(idx2-1), digitsPattern));

for ci = 1:numel(segment_numbers)
    S(ci).segment_number = segment_numbers(ci);
    S(ci).start_temp = str2double(extractBetween(D1(idx2(ci)),':','°'));
    S(ci).heating_rate = str2double(extractBetween(D1(idx3(ci)),':','K/s'));
end

%% 2/ parsing the "PA11 - test.txt" data file
% Store the experimental data of each retained segment in S, matched by
% segment number.
% NB : the curves are organized backwards, from bottom to top of the file
filename2 = 'PA11 - test.txt';   % trailing space in the file name removed
D2 = readlines(fullfile(fileDir,filename2));   % read as string array
nb_lines = numel(D2);

% find the "Curve Name:" lines and read the curve number from the next line
idx4 = find(contains(D2,'Curve Name:'));
curve_number = str2double(extractBetween(D2(idx4+1),']','['));

% block limits (offsets as used for the attached file)
tmp = [idx4;nb_lines];   % indices of the "Curve Name:" lines plus the last line of the file
start_lines = tmp(1:end-1)+3;   % first kept line is the column header row
stop_lines = tmp(2:end)-5;

for ci = 1:numel(segment_numbers)
    % look up the block by curve number, because the file order is reversed
    selected(ci) = find(curve_number==segment_numbers(ci));
    start_lines_selec(ci) = start_lines(selected(ci));
    stop_lines_selec(ci) = stop_lines(selected(ci));

    % lines of the block, kept as a string array for now
    S(ci).results = D2(start_lines_selec(ci):stop_lines_selec(ci));
end
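
The answer leaves the conversion of the "results" text into numbers as remaining work. One possible way to do it is sketched below, under some assumptions: each block is a string array in which only the data rows start with five numbers, as in the sample export, and the names block, rows and T are made up for illustration. The small block here is a stand-in for one S(ci).results.

```matlab
% Hypothetical stand-in for one S(ci).results block: two header lines,
% three data rows copied from the sample file, and a trailing blank line.
block = [ ...
    "          Index              t             Ts             Tr          Value"
    "                           [s]           [°C]           [°C]           [mW]"
    "              0              0        218.199        220.000      -0.238706"
    "              1          0.001        219.227        220.000      -0.227354"
    "              2          0.002        219.627        220.000      -0.173578"
    ""];

% Keep only the lines that parse as exactly five numbers; header, unit and
% blank lines yield fewer values and are skipped.
rows = zeros(0, 5);
for k = 1:numel(block)
    vals = sscanf(char(block(k)), '%f');   % stops at the first non-numeric token
    if numel(vals) == 5
        rows(end+1, :) = vals.';           %#ok<AGROW>
    end
end

% Optional: wrap the numeric matrix in a table with the file's column names
T = array2table(rows, 'VariableNames', {'Index','t','Ts','Tr','Value'});
disp(T)
```

Inside the answer's second loop, the same steps could be applied to S(ci).results and the table (or individual columns such as time, temperature and heat flow) stored in additional fields of S(ci). Filtering on the number of parsed values, rather than on fixed line offsets, keeps the conversion tolerant of extra header or blank lines in the block.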


Andrew on 4 Jan 2023

This solution works extremely well! Exactly what I have been trying to do. Thank you @Mathieu NOE, and sorry for the delay in testing this - I only have a Matlab license for my work PC and have been out of office for the Christmas break.

Mathieu NOE on 5 Jan 2023

hello

no problem, happy new year !
