How do I exclude certain lines from data files?

Question

Stanley on 23 Oct 2018

Commented: Stanley on 17 Jan 2019

I am trying to extract numerical values from hexadecimal values which are generated and stored in the form of .csv data files. However, the format of these .csv data files was recently altered by an update, such that the code no longer works as it relies on the detection of a keyword, in this case 'CUSTOM_MODE_STEP', in order to pick out the relevant lines in the data file. For context, the original data format is thus:

CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xAF   0x96   0xA2   0x00   0x00 }
CUSTOM_MODE_STEP_1 = { 0x00   0xF4   0x7D   0xAF   0x96   0xA2   0x01   0x00 }
CUSTOM_MODE_STEP_2 = { 0x7C   0x00   0x7D   0xAF   0x96   0xA2   0x02   0x80 }

However, the format was changed, so that the hexadecimal-containing lines in the data files are now separated by strings of text which obviously cannot be read:

CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xA2   0xA2   0xA2   0x00   0x00 }
CUSTOM_MODE_STEP_0_DESCRIPTION = [text]
CUSTOM_MODE_STEP_1 = { 0x00   0xF4   0x7D   0xA2   0xA2   0xA2   0x01   0x00 }
CUSTOM_MODE_STEP_1_DESCRIPTION = [text]
CUSTOM_MODE_STEP_2 = { 0x7C   0x00   0x7D   0xA2   0xA2   0xA2   0x02   0x80 }
CUSTOM_MODE_STEP_2_DESCRIPTION = [text]

I am using a pre-written script, and I am trying to edit it so that it can accommodate this change. The script is below:

for f=fields'
    if contains(f,'CUSTOM_MODE_STEP')
        ht =  DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end

The variable 'fields' is a 27x1 cell array that contains, among other entries, the CUSTOM_MODE_STEP field names (both the hexadecimal and the text ones).

I was thinking of inserting an elseif statement like:

elseif contains(f,'DESCRIPTION')

but I'm unsure as to what command to use exactly to exclude those lines. I've also thought about referencing the correct cells in that array using fields{} but that hasn't worked:

f=fields{17},fields{19},fields{21};

Those numbers are the indices of the hexadecimal lines in that array.
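One possible way to skip the description entries is to combine the keyword test with a negated endsWith test. This is only a sketch: it assumes every unwanted field name ends in '_DESCRIPTION', as in the example above, and the names isStep and stepFields are illustrative rather than taken from the original script.

```matlab
% Hypothetical field names standing in for fieldnames(DataN.Periph)
fields = {'UID'; 'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_1'; 'CUSTOM_MODE_STEP_1_DESCRIPTION'};

% Keep names that contain the keyword but do not end in _DESCRIPTION
isStep = contains(fields, 'CUSTOM_MODE_STEP') & ~endsWith(fields, '_DESCRIPTION');
stepFields = fields(isStep);

for f = stepFields'
    name = char(f);                    % f is a 1x1 cell, as in the original loop
    fprintf('processing %s\n', name);  % the hex2dec code would go here
end
```

Indexing by position, for example for f = fields([17 19 21])', should also select those entries, but it would break as soon as the order of the fields changes.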

If any further information is needed, please let me know.

1 Comment

Stanley on 17 Jan 2019

In the end I found a very simple solution, which was to alter the expression here:

for f=fields'
    if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+')
        ht =  DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end

So all I did was append some of the relevant metacharacters from the longer regular expression in Guillaume's answer below, and I can confirm that it works for multiple files (>100 in number).
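A point worth checking when adapting a condition like this: when contains is given a character vector, it looks for that text literally, so metacharacters such as \d+ or \s+ are not interpreted as a regular expression. It may therefore be worth verifying that the condition actually becomes true for the intended fields. A minimal sketch of a regexp-based test on the field names follows; the anchored pattern ^CUSTOM_MODE_STEP_\d+$ and the sample names are assumptions about the naming scheme, not part of the original script.

```matlab
% Hypothetical field names for illustration
names = {'CUSTOM_MODE_STEP_0', 'CUSTOM_MODE_STEP_0_DESCRIPTION', 'CUSTOM_MODE_STEP_12'};

% contains searches for the pattern text literally, so no name matches here
literalHit = contains(names, 'CUSTOM_MODE_STEP_(\d+)\s+');

% regexp interprets the metacharacters; with 'once' a non-match yields []
regexHit = ~cellfun(@isempty, regexp(names, '^CUSTOM_MODE_STEP_\d+$', 'once'));

disp(literalHit)
disp(regexHit)
```

Inside the loop, the equivalent condition would be along the lines of if ~isempty(regexp(char(f), '^CUSTOM_MODE_STEP_\d+$', 'once')).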

Answer 1

Guillaume on 23 Oct 2018

It sounds like your original code is very fragile. Looking at the portion you show, it's also not very efficient since there's a lot of array resizing. A single regexp, a call to sscanf and a bit of cell array manipulation is probably all that is needed to get the data you want.

It would be useful to have an example text file to validate against. With the attached file, based on your example data, this is the code I'd use:

filecontent = fileread('test.csv');  %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});

If you then want to convert that into a table with the same variable names as your original structure:

steps = array2table([stepnumber, stepvalues(:, 1:6)], 'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'})
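To see why this pattern leaves out the description lines, here is a small self-contained check that runs the same regular expression on an inline string instead of a file. The sample text is copied from the question; sample and tok are illustrative names. After the step number the pattern requires whitespace followed by =, so a name such as CUSTOM_MODE_STEP_0_DESCRIPTION cannot match.

```matlab
% Inline text standing in for the result of fileread
sample = sprintf('%s\n', ...
    'CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xA2   0xA2   0xA2   0x00   0x00 }', ...
    'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]', ...
    'CUSTOM_MODE_STEP_1 = { 0x00   0xF4   0x7D   0xA2   0xA2   0xA2   0x01   0x00 }', ...
    'CUSTOM_MODE_STEP_1_DESCRIPTION = [text]');

tok = regexp(sample, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens');
tok = vertcat(tok{:});                 % one row per matched step: {number, hex text}
stepnumber = str2double(tok(:, 1));    % step indices as doubles

fprintf('%d step lines matched\n', size(tok, 1));
```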

3 Comments

Stanley on 23 Oct 2018

example750.csv

Hello Guillaume,

Thanks for responding.

Attached is an example.csv file which I've truncated.

Below is a more complete section of the code.

elseif contains(sample,'750')
    fields = fieldnames(DataN.Periph);
    steps = contains(fields,'CUSTOM_MODE_STEP');
    DataN.UID = DataN.Periph.UID(1:end-2);
    DataN.UID_880 = DataN.Periph.UID_DGPxxx;
    DataN.position = DataN.Periph.UID(end-1:end);
    DataN.index = count;
    DataN.ht_1 =[]; DataN.ht_2 =[]; DataN.ht_3 =[]; DataN.ht_4 =[];
    DataN.t_1 =[]; DataN.t_2 =[];
    for f=fields'
        if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}')
            ht =  DataN.Periph.(char(f));
            list = strsplit(ht,{',', '{', '}'});
            DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
            DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
            DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
            DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
            DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
            DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
        end
    end
    DataN.R0=DataN.R;
    DataS(count)=DataN;
    count = count+1;
end

As you can see, I had a stab at replacing CUSTOM_MODE_STEP with the regular expression, which I guess was what I was really after. I assumed those operators would skip the 'DESCRIPTION' variables, but it seems that with this input the condition is never met, so all of the hex2dec code is skipped and execution goes straight to the end.

Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script.

Guillaume on 8 Jan 2019

The only change that needs to be made to my original code, to account for the additional comma separating the hex values in your latest example, is to replace the '0x%x ' in the sscanf call with '0x%x, ', so:

filecontent = fileread('test.csv');  %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x, ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});

"Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script"

While I can understand the resistance, the way you have it coded at present, the input format, the parsing and the creation of the output data are all deeply interlinked. As you've found out, if the file format changes you need to review everything. I would think that changing the design now would result in a lot less pain later. If it were me, I would write a parser that would be even more generic than the above (store the parsed data as key/value pairs) and afterward just look up the required keys.
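A rough sketch of what such a key/value parser might look like is given below. It assumes every relevant line has the form KEY = value, which is a guess at the file layout; the inline text, the UID value and the names txt, pairs and kv are all hypothetical.

```matlab
% Inline text standing in for fileread output (UID value is made up)
txt = sprintf('%s\n', ...
    'UID = ABC123', ...
    'CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xA2   0xA2   0xA2   0x00   0x00 }', ...
    'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]');

% Capture "KEY = value" line by line; [ \t] keeps the match on a single line
pairs = regexp(txt, '^[ \t]*(\w+)[ \t]*=[ \t]*(.*?)[ \t]*$', 'tokens', ...
               'lineanchors', 'dotexceptnewline');
pairs = vertcat(pairs{:});             % N-by-2 cell: {key, raw value text}

% Store everything, then look up only the keys that are needed
kv = containers.Map(pairs(:, 1), pairs(:, 2));
raw = kv('CUSTOM_MODE_STEP_0');        % raw text between and including the braces
disp(raw)
```

With this layout the description entries are simply extra keys that are never looked up, so a change in which lines are present does not affect the code that consumes the step values.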

Anyway, it is trivial to convert the output of the above into your original structure:

fnames = {'t_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'};
namevalues = [fnames; num2cell(stepvalues(:, 1:6), 1)];
dataN = struct(namevalues{:})

Stanley on 17 Jan 2019

Edited: Stanley on 17 Jan 2019

Appreciate the help Guillaume. It turns out that the solution was very trivial (which I should have figured out much sooner but it's a learning process) - but while trying to adapt your code I did learn about concatenation and regular expressions, so it was worthwhile. I've started another script in any case with your code so I can work on it every now and then.

Answer 2

per isakson on 4 Jan 2019

I downloaded example750.csv and tried a different approach to extracting and converting the hex values:

>> cssm( 'example750.csv' )
ans =
    64    67   125   162   162   162     0     0
     0   244   125   162   162   162     1     0
   124     0   125   162   162   162     2   128
   

where

function    out = cssm( ffs )
    %
    %%  Read the file to a cell array of character rows
    fid = fopen( ffs, 'r' );
    cac = textscan( fid, '%[^\r\n]' );
    cac = cac{1};
    [~] = fclose( fid );
    %%  Extract the rows with hex values
    pos = regexp( cac, 'CUSTOM_MODE_STEP_\d+\s+=\s+\{' );
    cac( cellfun( @isempty, pos ) ) = [];
    %%  Extract the hex values, which are two characters following "0x"
    hex = regexp( cac, '(?<=0x)[A-F\d]{2}', 'match' );
    %%  Convert to dec values. (hex2dec returns a column, thus reshape.)
    dec = cellfun( @(c) reshape(hex2dec(c),1,[]), hex, 'uni',false );
    out = cell2mat( dec );
end
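The extraction step can be tried in isolation on a single line. The sketch below uses the third sample line from the question (row is an illustrative name). The lookbehind (?<=0x) requires "0x" immediately before the match without including it, so only the two hex digits are returned. Note that [A-F\d] covers uppercase digits only, which is an assumption that holds for the sample data.

```matlab
row = 'CUSTOM_MODE_STEP_2 = { 0x7C   0x00   0x7D   0xA2   0xA2   0xA2   0x02   0x80 }';

% Match two hex digits that are directly preceded by "0x"
hex = regexp(row, '(?<=0x)[A-F\d]{2}', 'match');

% hex2dec returns a column for a cell array input, so reshape to a row
dec = reshape(hex2dec(hex), 1, []);
disp(dec)
```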

1 Comment

Stanley on 17 Jan 2019

Thanks Per, I have yet to test out this code myself, but will definitely try to. I am going to work with Guillaume's code first.
