regexprep incorrect multiple replacement
Let's say we have the following char vector as input:
str = 'abc(1,2,3)';
I would like to replace '1','2' and '3' with different numbers.
Let's say I want to replace the numbers with the following numbers:
rep = {'5';'8';'3'};
My desired output is:
str = 'abc(5,8,3)';
The format for using regexprep is:
regexprep(str,expression,replace)
I have tried to solve the problem in two ways:
- One expression.
expression = '\d';
replace = {'5';'8';'3'};
regexprep(str,expression,replace)
ans = 'abc(3,3,3)'
The output is incorrect, despite the documentation stating:
If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.
- Multiple expressions.
expression = {'\d';'\d';'\d'};
replace = {'5';'8';'3'};
regexprep(str,expression,replace)
ans = 'abc(3,3,3)'
The output for the second case is incorrect, despite the documentation stating:
If both replace and expression are cell arrays of character vectors, then they must contain the same number of elements. regexprep pairs each replace element with its corresponding element in expression.
In both cases regexprep is replacing all three matches using only the last value from the replace cell array, rather than all three.
What am I missing?
2 Comments
Stephen23
on 5 Jun 2018
Edited: Stephen23
on 5 Jun 2018
"The output is incorrect, despite the documentation stating:..."
"What am I missing?"
The output is correct in both cases. The documentation states that it "...attempts N matches and replacements": it matches the digits and replaces them with the first cell, then starts afresh and replaces the matched digits with the second cell, then starts afresh once more and replaces them with the third cell. That is exactly the output you are getting.
Each time, regexprep starts parsing the string from the beginning again, whereas you assumed that it continues from where the previous replacement finished. To get the behavior you want, you will have to add a dynamic expression of some kind.
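To make the pass-by-pass behavior described above visible, here is a minimal sketch that applies each replacement in its own call and records the intermediate strings. The variable names out and passes are only for illustration, and the replacement values are the ones from the question's desired output.

```matlab
str = 'abc(1,2,3)';
replace = {'5'; '8'; '3'};

% Apply each replacement as a separate pass over the whole string.
out = str;
passes = cell(size(replace));
for k = 1:numel(replace)
    out = regexprep(out, '\d', replace{k});  % every digit is replaced in each pass
    passes{k} = out;
end

% Intermediate results, one per line:
% abc(5,5,5)
% abc(8,8,8)
% abc(3,3,3)
fprintf('%s\n', passes{:});
```

Because the 5s written by the first pass are themselves digits, the second pass overwrites them, and the third pass overwrites those in turn.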
Accepted Answer
Walter Roberson
on 5 Jun 2018
regexprep(S, {A, B}, {P, Q})
is the same as
regexprep(regexprep(S, A, P), B, Q)
That is, the first pair is applied to the entire string, and the second pair is applied to the string that results.
It looks as though only the third replacement was applied because the replacement text from the earlier passes consists of digits, so it matches the second and third patterns and gets replaced again.
The 'once' option will not solve the problem.
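As a rough illustration of why 'once' does not help, the sketch below writes the three passes in the nested form given above, each limited to its first match. The step1, step2 and step3 names are hypothetical, and the replacement values are taken from the question's desired output.

```matlab
str = 'abc(1,2,3)';

% Each call starts from the beginning of its input and replaces
% only the first digit it finds there.
step1 = regexprep(str,   '\d', '5', 'once');  % 'abc(5,2,3)'
step2 = regexprep(step1, '\d', '8', 'once');  % 'abc(8,2,3)'
step3 = regexprep(step2, '\d', '3', 'once');  % 'abc(3,2,3)'

disp(step3)
```

All three replacements land on the first digit, so the second and third numbers in the string are never touched.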
3 Comments
Walter Roberson
on 5 Jun 2018
str = 'abc(1,2,3)';
regexprep(str, '\d+(\D+)\d+(\D+)\d+', '5$18$23')
The $1 in the replacement pattern refers to the text captured by the first () group, and the $2 refers to the text captured by the second () group. So we match one or more digits, remember the sequence of non-digits that follows, match another series of digits, remember the sequence of non-digits that follows that, and then match a final series of digits. We then replace the whole match with fixed text followed by the first remembered series of non-digits, then fixed text followed by the second remembered series of non-digits, then more fixed text.
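If the number of values is not fixed at three, one alternative is to avoid regexprep altogether. This is a different technique from the one above, not something proposed in this thread: split the string at the numbers with regexp and rejoin the pieces with strjoin, passing the replacement values as a cell array of delimiters. A minimal sketch, assuming rep holds exactly one entry per number found (parts, found and out are illustrative names):

```matlab
str = 'abc(1,2,3)';
rep = {'5'; '8'; '3'};

% Split at each run of digits; 'match' also returns the digits found.
[parts, found] = regexp(str, '\d+', 'split', 'match');

% Assumption: one replacement per matched number.
assert(numel(found) == numel(rep), 'Need one replacement per number.');

% With a cell array of delimiters, strjoin places the k-th delimiter
% between the k-th and (k+1)-th pieces.
out = strjoin(parts, rep(:).');

disp(out)   % abc(5,8,3)
```

Each number is replaced exactly once and the replacement text is never rescanned, so the values cannot overwrite each other. Note that strjoin documents escape-sequence handling for its delimiters, so replacement text containing backslashes may need extra care.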