Split cell array rows by delimiter (2016b)
I have a vertical cell array of char vectors that I want to split into smaller vertical cell arrays based on rows in the array that serve as delimiters. For example,
x = ...
{'LINE1'; ...
'* THIS IS A COMMENT LINE'; ...
'* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
'LINE2'; ...
'LINE3'};
should be split into
x_split = ...
{{{'LINE1'}}; ...
{'LINE2';'LINE3'}};
where lines starting with '* ' are comment lines that act as the delimiters.
I would like the operation to be as fast as possible so I would like a vectorized approach, perhaps involving cellfun/arrayfun. I can get the indices of the comment lines easily enough using cellfun and strncmp, but I'm not sure how to proceed with the splitting.
2 Comments
Jan
on 27 Jun 2019
You forgot to mention why the first line is stored as a scalar cell array, while the other two form a cell vector. Do you want to join the char vectors, using each block of comments as a separator?
Accepted Answer
Jan
on 27 Jun 2019
Edited: Jan
on 27 Jun 2019
Let's start with a loop approach, to clarify first what exactly you want:
C = {'LINE1'; ...
'* THIS IS A COMMENT LINE'; ...
'* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
'LINE2'; ...
'LINE3'};
limit = [true, strncmp(C, '*', 1).', true]; % no need for the slow cellfun here!
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
n = numel(ini);
Result = cell(n, 1);
for k = 1:n
Result{k} = C(ini(k):fin(k));
end
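To see why limit is padded with true on both sides, here is a small trace of the same loop on a hypothetical input that starts and ends with a comment line. The input is my own illustration, and I am assuming that leading and trailing comment blocks should simply be dropped, which the question does not state explicitly:

```matlab
% Hypothetical input: comment blocks at the start, in the middle, and at the end.
C = {'* HEADER COMMENT'; ...
     'LINE1'; ...
     '* MID COMMENT'; ...
     'LINE2'; ...
     'LINE3'; ...
     '* TRAILING COMMENT'};

isComment = strncmp(C, '*', 1).';          % [1 0 1 0 0 1]
limit = [true, isComment, true];           % [1 1 0 1 0 0 1 1]
ini = strfind(limit, [true, false]);       % [2 4]: first row of each block
fin = strfind(limit, [false, true]) - 1;   % [2 5]: last row of each block

Result = cell(numel(ini), 1);
for k = 1:numel(ini)
    Result{k} = C(ini(k):fin(k));          % copy one block of non-comment rows
end
% Result{1} is {'LINE1'}, Result{2} is {'LINE2'; 'LINE3'}
```

The padding makes the first and last block look as if they were enclosed by comments, so every block has exactly one [true, false] start and one [false, true] end.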
Now you hope that a vectorized approach or cellfun is faster? I do not think so.
Maybe find(diff()) is faster than calling strfind twice:
limit = [true, strncmp(C, '*', 1).', true]; % no need for the slow cellfun here!
index = find(diff(limit));
n = numel(index) / 2;
Result = cell(n, 1);
for k = 1:n
Result{k} = C(index(2*k-1):index(2*k)-1);
end
Well, let's try splitapply:
isComment = strncmp(C, '*', 1);
index = zeros(size(C));
index(strfind([true, isComment.'], [true, false])) = 1; % isComment is a column, strfind needs a row
index = cumsum(index);
index(isComment) = NaN;
Result = splitapply(@(x) {x}, C, index);
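As a sketch of how the group numbers for splitapply come about, this traces the intermediate values for the 5-line example from the question. The variable starts is a name I introduced for illustration only:

```matlab
C = {'LINE1'; ...
     '* THIS IS A COMMENT LINE'; ...
     '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
     'LINE2'; ...
     'LINE3'};

isComment = strncmp(C, '*', 1);                        % [0; 1; 1; 0; 0]
starts = strfind([true, isComment.'], [true, false]);  % [1 4]: first row of each block
index = zeros(size(C));
index(starts) = 1;                                     % [1; 0; 0; 1; 0]
index = cumsum(index);                                 % [1; 1; 1; 2; 2]
index(isComment) = NaN;                                % [1; NaN; NaN; 2; 2]
```

The cumsum turns the block starts into running group numbers, and the NaN entries mark the comment rows as belonging to no group.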
This seems to be too complex. mat2cell is more direct:
isCmt = strncmp(C, '*', 1);
limit = [true, isCmt.', true];
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
Result = mat2cell(C(~isCmt), (fin - ini + 1).');
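A quick way to convince yourself that the mat2cell version yields the same blocks as the loop is to run both on the example and compare them. This is only a sanity check on one input, not a proof; ResultLoop and ResultM2C are names chosen for this sketch:

```matlab
C = {'LINE1'; ...
     '* THIS IS A COMMENT LINE'; ...
     '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
     'LINE2'; ...
     'LINE3'};

isCmt = strncmp(C, '*', 1);
limit = [true, isCmt.', true];
ini = strfind(limit, [true, false]);       % first row of each block
fin = strfind(limit, [false, true]) - 1;   % last row of each block

% Loop version: copy each block explicitly
ResultLoop = cell(numel(ini), 1);
for k = 1:numel(ini)
    ResultLoop{k} = C(ini(k):fin(k));
end

% mat2cell version: drop the comments, then cut by block length
ResultM2C = mat2cell(C(~isCmt), (fin - ini + 1).');

assert(isequal(ResultLoop, ResultM2C), 'The two approaches disagree.');
```

This works because removing the comment rows leaves the blocks directly adjacent, so the block lengths fin - ini + 1 are all that mat2cell needs.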
Some timings:
C = repmat(C, 10000, 1); % A larger input
% With tic/toc, Matlab 2019a ONLINE:
% STRFIND: 0.084 sec
% FIND(DIFF): 0.091 sec
% SPLITAPPLY: 0.235 sec
% MAT2CELL: 0.046 sec
Timings on the MATLAB Online machine are not necessarily representative, so repeat the measurement locally.
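For a local measurement, timeit is usually more stable than a single tic/toc, because it runs the code several times. A minimal sketch for two of the approaches follows, saved as compareSplitTimings.m. The function names compareSplitTimings, splitByLoop and splitByMat2cell are hypothetical and not part of any toolbox:

```matlab
function t = compareSplitTimings()
% Hypothetical benchmark harness for two of the splitting approaches.
C = {'LINE1'; ...
     '* THIS IS A COMMENT LINE'; ...
     '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
     'LINE2'; ...
     'LINE3'};
C = repmat(C, 10000, 1);                       % a larger input, as in the timings above

t.loop     = timeit(@() splitByLoop(C));       % STRFIND + loop
t.mat2cell = timeit(@() splitByMat2cell(C));   % STRFIND + mat2cell
fprintf('STRFIND loop: %.4f sec\n', t.loop);
fprintf('MAT2CELL:     %.4f sec\n', t.mat2cell);
end

function Result = splitByLoop(C)
% Copy each block of non-comment rows in a loop.
limit = [true, strncmp(C, '*', 1).', true];
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
Result = cell(numel(ini), 1);
for k = 1:numel(ini)
    Result{k} = C(ini(k):fin(k));
end
end

function Result = splitByMat2cell(C)
% Remove the comment rows, then cut the rest by block length.
isCmt = strncmp(C, '*', 1);
limit = [true, isCmt.', true];
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
Result = mat2cell(C(~isCmt), (fin - ini + 1).');
end
```

Call it with t = compareSplitTimings(). The absolute numbers depend on the machine and the MATLAB release, so only the relation between the approaches is meaningful.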
2 Comments
Jan
on 27 Jun 2019
I've edited the answer and added a splitapply and a mat2cell approach, which might be considered "vectorized".