How do I exclude certain lines from data files?

Stanley on 23 Oct 2018
Commented: Stanley on 17 Jan 2019
I am trying to extract numerical values from hexadecimal values which are generated and stored in the form of .csv data files. However, the format of these .csv data files was recently altered by an update, such that the code no longer works as it relies on the detection of a keyword, in this case 'CUSTOM_MODE_STEP', in order to pick out the relevant lines in the data file. For context, the original data format is thus:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xAF 0x96 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xAF 0x96 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xAF 0x96 0xA2 0x02 0x80 }
However, the format was changed, so that the hexadecimal-containing lines in the data files are now separated by strings of text which obviously cannot be read:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_0_DESCRIPTION = [text]
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xA2 0xA2 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_1_DESCRIPTION = [text]
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xA2 0xA2 0xA2 0x02 0x80 }
CUSTOM_MODE_STEP_2_DESCRIPTION = [text]
I am using a pre-written script, and I am trying to edit it so that it can accommodate this change. The script is below:
for f=fields'
    if contains(f,'CUSTOM_MODE_STEP')
        ht = DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end
The variable 'fields' is a 27x1 array that contains, among other entries, the CUSTOM_MODE_STEP names (both the hexadecimal lines and the text description lines).
I was thinking of inserting an elseif statement like:
elseif contains(f,'DESCRIPTION')
but I'm unsure as to what command to use exactly to exclude those lines. I've also thought about referencing the correct cells in that array using fields{} but that hasn't worked:
f=fields{17),fields{19},fields{21};
Those numbers being the coordinates for the hexadecimal lines.
If any further information is needed, please let me know.
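One way the exclusion described above might look is sketched here. It is only a sketch under assumptions: the four field names are hypothetical stand-ins for the real 27x1 'fields' array, and the 'kept' list is a placeholder for the hex parsing that the real loop performs.

```matlab
% Hypothetical field names standing in for the real 27x1 'fields' array
fields = {'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_1'; 'CUSTOM_MODE_STEP_1_DESCRIPTION'};

kept = {};
for f = fields'
    name = char(f);
    % Skip the text-only lines; 'continue' jumps to the next iteration
    if contains(name, 'DESCRIPTION')
        continue
    end
    if contains(name, 'CUSTOM_MODE_STEP')
        kept{end+1, 1} = name; % placeholder: the hex parsing would go here
    end
end
disp(kept)
```

The same effect could be had with a single condition, contains(name,'CUSTOM_MODE_STEP') && ~contains(name,'DESCRIPTION'), which avoids the need for an elseif branch.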
  1 Comment
Stanley on 17 Jan 2019
In the end I found a very simple solution, which was to alter the expression here:
for f=fields'
    if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+')
        ht = DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end
So all I did was append some of the metacharacters from the longer regular expression in Guillaume's answer below, and I can confirm that it works for multiple files (more than 100).
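A caveat for anyone reusing this: contains compares a char pattern literally and does not interpret regular-expression metacharacters such as \d or \s, so it may be worth checking that the condition selects the lines for the reason expected. A regexp-based filter is sketched below as an alternative. The field names are hypothetical, and the anchored pattern assumes the hex lines are named exactly CUSTOM_MODE_STEP_ followed by digits.

```matlab
% Hypothetical field names, modelled on the ones in the question
fields = {'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_10'; 'OTHER_SETTING'};

% Anchored pattern: the name must end right after the step number,
% so the *_DESCRIPTION names are rejected
isStep = ~cellfun(@isempty, regexp(fields, '^CUSTOM_MODE_STEP_\d+$', 'once'));
stepFields = fields(isStep);
disp(stepFields)
```

The loop could then run over stepFields' instead of fields', with no keyword test needed inside it.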


Answers (2)

Guillaume on 23 Oct 2018
It sounds like your original code is very fragile. Looking at the portion you show, it's also not very efficient since there's a lot of array resizing. A single regexp, a call to sscanf and a bit of cell array manipulation is probably all that is needed to get the data you want.
It would be useful to have an example text file to validate against. With the attached file, based on your example data, this is the code I'd use:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
If you then want to convert that into a table with the same variable names as your original structure:
steps = array2table([stepnumber, stepvalues(:, 1:6)], 'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'})
  3 Comments
Guillaume on 8 Jan 2019
The only change needed in my original code, to account for the additional comma separating the hex values in your latest example, is to replace the '0x%x ' in the sscanf call with '0x%x, ', so:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x, ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
"Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script"
While I can understand the resistance, the way you have it coded at present, the input format, the parsing and the creation of output data are all deeply interlinked. As you've found out, if the file format changes you need to review everything. I would think that changing the design now would result in a lot less pain later. If it were me, I would write a parser that would be even more generic than the above (store the parsed data as key/value pairs) and afterward just look up the required keys.
Anyway, it is trivial to convert the output of the above into your original structure:
fnames = {'t_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'};
namevalues = [fnames; num2cell(stepvalues(:, 1:6), 1)];
dataN = struct(namevalues{:})
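The more generic key/value parser mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not the design actually used: the sample text is made up from the lines in the question, every line is assumed to have the form KEY = value, and the names pairs, settings, raw and values are hypothetical.

```matlab
% Hypothetical sample text in the 'KEY = value' layout from the question
filecontent = sprintf(['CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }\n' ...
                       'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]\n']);

% One token pair per line: the key (no whitespace) and everything after '='
pairs = regexp(filecontent, '^(\S+)\s*=\s*(.*?)\s*$', 'tokens', ...
               'lineanchors', 'dotexceptnewline');
pairs = vertcat(pairs{:});

% Store every line as raw text, keyed by its name
settings = containers.Map(pairs(:, 1)', pairs(:, 2)');

% Look up a key afterwards and convert its hex values only when needed
raw = settings('CUSTOM_MODE_STEP_0');
hexText = regexp(raw, '0x([0-9A-Fa-f]+)', 'tokens');
values = cellfun(@(t) hex2dec(t{1}), hexText);
disp(values)
```

With this layout the description lines are simply other keys in the map, so a format change that adds new lines does not affect the lookup of the step values.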
Stanley on 17 Jan 2019
Edited: Stanley on 17 Jan 2019
Appreciate the help Guillaume. It turns out that the solution was very trivial (which I should have figured out much sooner but it's a learning process) - but while trying to adapt your code I did learn about concatenation and regular expressions, so it was worthwhile. I've started another script in any case with your code so I can work on it every now and then.



per isakson on 4 Jan 2019
I downloaded example750.csv and tried a different approach to extracting and converting the hex values:
>> cssm( 'example750.csv' )
ans =
64 67 125 162 162 162 0 0
0 244 125 162 162 162 1 0
124 0 125 162 162 162 2 128
where
function out = cssm( ffs )
    %
    %% Read the file to a cell array of character rows
    fid = fopen( ffs, 'r' );
    cac = textscan( fid, '%[^\r\n]' );
    cac = cac{1};
    [~] = fclose( fid );
    %% Extract the rows with hex values
    pos = regexp( cac, 'CUSTOM_MODE_STEP_\d+\s+=\s+\{' );
    cac( cellfun( @isempty, pos ) ) = [];
    %% Extract the hex values, which are two characters following "0x"
    hex = regexp( cac, '(?<=0x)[A-F\d]{2}', 'match' );
    %% Convert to dec values. (hex2dec returns a column, thus reshape.)
    dec = cellfun( @(c) reshape(hex2dec(c),1,[]), hex, 'uni',false );
    out = cell2mat( dec );
end
  1 Comment
Stanley on 17 Jan 2019
Thanks Per, I have yet to test out this code myself, but will definitely try to. I am going to work with Guillaume's code first.

