Importing ASCII data with mixed-in headers

I have an old program that creates ASCII files with data in a format similar to the following:
Header line
Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1
0.0000 1.0000 2.0000 3.0000
0.1000 1.0001 2.0001 3.0001
0.2000 1.0002 2.0002 3.0002
Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1
0.0000 1.0005 2.0005 3.0005
0.1000 1.0006 2.0006 3.0006
0.2000 1.0007 2.0007 3.0007
This pattern repeats for several more sets, and is generally much larger than the simplified form I have presented.
I would like to extract the data from each set, as well as the parameter values for each set, but am having trouble doing so. I could go line by line with fgetl, but that doesn't seem particularly efficient here. All the bulk data readers I can think of have problems with the mixed data format created by the parameter lines. Is there a way to extract the information I want without stepping through the file one line at a time?

2 Comments

Does the output have to be broken up by block, or can it be all together as if the Set lines were not there?
It would be ideal to have the outputs broken by block, but I could probably manage that separately if necessary.


 Accepted Answer

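S = fileread('YourFile.txt');   % read the whole file into one char vector
bpos = regexp(S, '^Set', 'start', 'lineanchors');   % start index of each "Set" line
% Split S at each "Set" line; length(S)+1 makes the piece sizes sum to length(S)
blocks = mat2cell(S, 1, diff([1 bpos length(S)+1]));
blocks(1) = [];   % drop the text before the first "Set" line (the file header)
fmt = repmat('%f', 1, 4);   % four numeric columns per data row
% Skip each block's "Set ..." line and read the rest as one numeric array
data = cellfun(@(B) textscan(B, fmt, 'HeaderLines', 1, 'CollectOutput', 1), blocks);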
data is now a cell array of numeric arrays.
The code does assume that each block has the same number of columns, but it does not assume that the blocks have the same number of rows.
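As a rough check of the row-count claim, here is a minimal sketch that runs the same splitting on an in-memory string instead of a file. The text is the sample from the question with the second set shortened to two rows (that shortening is my assumption, made only to show unequal block lengths), and it uses the size vector diff([1 bpos length(S)+1]) given in the comments on this answer. A plain loop replaces cellfun purely for readability.

```matlab
% Sample text standing in for fileread('YourFile.txt').
% Set 2 is deliberately cut to two rows (an assumption for illustration).
txt = { ...
    'Header line', ...
    'Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1', ...
    '0.0000 1.0000 2.0000 3.0000', ...
    '0.1000 1.0001 2.0001 3.0001', ...
    '0.2000 1.0002 2.0002 3.0002', ...
    'Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1', ...
    '0.0000 1.0005 2.0005 3.0005', ...
    '0.1000 1.0006 2.0006 3.0006'};
S = strjoin(txt, sprintf('\n'));

% Locate each "Set" line and cut S into pieces at those positions.
bpos = regexp(S, '^Set', 'start', 'lineanchors');
blocks = mat2cell(S, 1, diff([1 bpos length(S)+1]));
blocks(1) = [];                       % discard the leading header text

% Read each block, skipping its "Set ..." line.
data = cell(size(blocks));
for k = 1:numel(blocks)
    c = textscan(blocks{k}, '%f%f%f%f', 'HeaderLines', 1, 'CollectOutput', true);
    data{k} = c{1};                   % numeric array for block k
end
nrows = cellfun(@(d) size(d, 1), data);   % rows found in each block
```

Here nrows should report a different row count for each set, while every data{k} keeps four columns.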

5 Comments

Cool, that worked with a minor change:
blocks = mat2cell(S, 1, [1 diff([1 bpos length(S)])]);
Now I have the data in blocks. Any suggestions of how to get the parameters extracted?
I think it should be
blocks = mat2cell(S, 1, diff([1 bpos length(S)+1]));
... which I know I had typed into the command window, but I guess I copied an older version.
pstr = regexp(blocks, '(?<=Parameter\s+\d+\s+)\S+', 'match');
params = str2double( vertcat(pstr{:}) );
This does not assume that there are three parameters, but does assume that there are the same number for each block.
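To make the shape of the result concrete, here is a small sketch applied to two hand-built blocks. The block text is copied from the question's sample and trimmed to one data row each (the trimming is an assumption for brevity); in practice blocks would come from the splitting step in the answer.

```matlab
% Two blocks as they would look after splitting (one data row each, for brevity).
blocks = { ...
    sprintf('Set 1 Parameter 1 1.0 Parameter 2 1.0 Parameter 3 1.1\n0.0000 1.0000 2.0000 3.0000\n'), ...
    sprintf('Set 2 Parameter 1 2.0 Parameter 2 2.0 Parameter 3 2.1\n0.0000 1.0005 2.0005 3.0005\n')};

% Match the token that follows "Parameter <index> " in each block.
pstr = regexp(blocks, '(?<=Parameter\s+\d+\s+)\S+', 'match');

% Stack the per-block matches: one row per block, one column per parameter.
params = str2double(vertcat(pstr{:}));
```

With this layout, params(k, j) is the value of Parameter j in Set k, so it lines up with the k-th entry of the data cell array.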
Cool, that took care of it. I had to do a bit of finessing to make it fit what the actual expression was, but it got me right on track.
Is it possible to pull multiple parameters with different labels using a single regexp? I didn't see an example like that in the documentation for it. An example of what I mean could be like the following.
pstr = regexp(blocks, '(?<=ParamA=\s+)\S+(\s+?<=ParamB=\s+)\S+', 'match');
As I understand it this would look and identify both numbers, but would keep them in the same output, which I could understand if it just got messy. Alternatively,
pstr = regexp(blocks, '(?<=ParamA=\s+)\S+','(\s+?<=ParamB=\s+)\S+', 'match');
would individually look for the two different parameter entries, and ideally would produce two individual results stored in a matrix.
I suggest you look at the regexp named token facility and the 'names' option.
ptokens = regexp(blocks, '(?<=ParamA=\s+)(?<ParamA>\S+).*(?<=ParamB=\s+)(?<ParamB>\S+)', 'names');
For each block, this would produce a struct with fields ParamA and ParamB holding the matched text. Because blocks is a cell array, the structs are returned in a cell array of the same size.
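A hedged sketch of the named-token approach follows. The "ParamA= 1.5 ParamB= 2.5" header layout is hypothetical, since the real labels were not posted, and the pattern is a simplified variation that matches the labels directly rather than through lookbehind, with a lazy .*? between them.

```matlab
% Hypothetical blocks; the "ParamA= ... ParamB= ..." layout is an assumption.
blocks = { ...
    sprintf('Set 1 ParamA= 1.5 ParamB= 2.5\n0.0 1.0\n'), ...
    sprintf('Set 2 ParamA= 3.5 ParamB= 4.5\n0.0 1.0\n')};

% Each (?<name>...) group becomes a field of the returned struct.
pat = 'ParamA=\s+(?<ParamA>\S+).*?ParamB=\s+(?<ParamB>\S+)';
ptokens = regexp(blocks, pat, 'names');   % cell array, one struct per block

% One match per block, so the structs concatenate into a struct array.
ptokens = [ptokens{:}];

% The fields hold text; convert to numbers, one element per block.
A = str2double({ptokens.ParamA});
B = str2double({ptokens.ParamB});
```

Keeping each label in its own named field avoids having to remember which column of a 'match' result belongs to which parameter.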

