Regular expression help for capturing tokens from a C++ if_else function block
I am trying to convert some C++ code to MATLAB code and need help capturing the conditions of the if_else statements. Sample code is posted below; the real code is much longer and contains many sets of the same repeating piecewise constraints with different functions to evaluate.
T[4][0] = if (x <= 2.0E2) {
t4 = x*-7.43939368315E2;
} else {
if (2.0E4 < x) {
t4 = x*-3.15202357052E2;
} else {
if ((2.0E2 < x) && (x <= 1.0E3)) {
t4 = -x*(x*-3.3581574807E-2+log(x)*4.7023733515E1+(log(x)*2.400503067842E3)/x+1.09445880410878E5/x+1.0/(x*x)*6.1158591370349E4+(x*x)*1.9946376054E-5-(x*x*x)*7.445595608E-9+(x*x*x*x)*1.243670758E-12-1.11582896139E2);
} else {
if ((1.0E3 < x) && (x <= 6.0E3)) {
t4 = -x*(x*-2.3265021934E-3+log(x)*4.8602554649E1+(log(x)*1.5974720059903E4)/x+3.623374787802E4/x+1.0/(x*x)*1.897212861710375E6+(x*x)*1.9151215358E-7-(x*x*x)*1.2237095959E-11+(x*x*x*x)*3.95116007E-16-1.62570932876E2);
} else {
if ((6.0E3 < x) && (x <= 2.0E4)) {
t4 = -x*(x*-1.6249864554E-1+log(x)*2.049900452325E3+(log(x)*6.16116287553448E6)/x-4.067298995421662E7/x+1.0/(x*x)*(5.126459124735035E23/1.40737488355328E14)+(x*x)*4.514672712E-6-(x*x*x)*9.025238189E-11+(x*x*x*x)*8.209541318E-16-1.8977498210781E4);
} else {
t4 = NAN;
}
}
}
}
};
T[5][0] = if (x <= 2.0E2) {
t5 = x*-6.99993596709E2;
} else {
if (6.0E3 < x) {
t5 = x*-2.95957496736E2;
} else {
if ((2.0E2 < x) && (x <= 1.0E3)) {
t5 = x*(x*-5.9732340052E-2+log(x)*1.344708739E1+(log(x)*6.623440520686E3)/x-1.30277758259044E5/x+1.0/(x*x)*1.93975305124744E5+(x*x)*2.359195784E-5-(x*x*x)*7.135910089E-9+(x*x*x*x)*1.0512057259E-12-2.8910827165E2);
} else {
if ((1.0E3 < x) && (x <= 6.0E3)) {
t5 = -x*(x*1.8859601673E-5+log(x)*3.6779467978E1+(log(x)*2.62795681346E2)/x+1.11227336090838E5/x-1.0/(x*x)*7.24982080986207E5+(x*x)*4.872002157E-8-(x*x*x)*9.084382373E-12+(x*x*x*x)*6.625791502E-16-4.3668943967E1);
} else {
t5 = NAN;
}
}
}
}
I have tried the following (and other variations):
tokens = regexp(funcode,'if\s\((.+)\)\s\{','tokens')
but it captures everything from just after the first 'if (' up to the last ') {'.
Eventually I would also like to capture tokens for the expressions ('t4 = ...' etc.) that go with each condition.
Any help would be greatly appreciated. P.S. MATLAB needs to make matlabFunction() work for piecewise symbolic functions.
Accepted Answer
Walter Roberson
on 12 Jul 2011
Your most immediate problem is that .+ captures as many characters as possible and then backtracks only as much as is necessary to match the rest of the expression. If you use .+? then that will capture only as many characters as are necessary to match the rest of the expression.
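To make the greedy/lazy difference concrete, here is a minimal sketch. The string funcode is a hypothetical two-branch excerpt in the style of the posted code, not the full input, and the assignment pattern (t\d+) is an assumption about how the variables are named. The lazy pattern works on this input only because ') {' never appears inside a condition; it does not track nesting.

```matlab
% Hypothetical two-branch excerpt in the style of the question's code.
funcode = sprintf([ ...
    'if (x <= 2.0E2) {\n' ...
    't4 = x*-7.43939368315E2;\n' ...
    '} else {\n' ...
    'if ((2.0E2 < x) && (x <= 1.0E3)) {\n' ...
    't4 = -x*(log(x)*4.7023733515E1);\n' ...
    '} else {\n' ...
    't4 = NAN;\n' ...
    '}\n' ...
    '}']);

% Greedy .+ : a single match running from the first 'if (' to the last ') {'.
greedy = regexp(funcode, 'if\s\((.+)\)\s\{', 'tokens');

% Lazy .+? : one match per 'if', each ending at the nearest ') {'.
lazy  = regexp(funcode, 'if\s\((.+?)\)\s\{', 'tokens');
conds = cellfun(@(t) t{1}, lazy, 'UniformOutput', false);

% Assignments: variable name and right-hand side up to the semicolon.
% Assumes every assigned variable looks like t4, t5, ...
assigns = regexp(funcode, '(t\d+)\s*=\s*([^;]+);', 'tokens');

fprintf('greedy matches: %d, lazy matches: %d\n', numel(greedy), numel(lazy));
for k = 1:numel(conds)
    fprintf('condition %d: %s\n', k, conds{k});
end
for k = 1:numel(assigns)
    fprintf('%s = %s\n', assigns{k}{1}, assigns{k}{2});
end
```

Each chain ends in an else branch that has no condition, so there is one more assignment than there are conditions. Any pairing of the two lists has to account for that.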
However, you have a deeper problem: you really want to stop only when you encounter the balancing ')'. Determining whether a delimiter is balanced is known to be theoretically impossible in pure regular expressions, because it requires counting to an arbitrary nesting depth. MATLAB's "regular expressions" are, though, extensions to the standard regular expressions. They have much in common with Perl's "regular expressions", and it is possible in Perl to find the balancing delimiter. It has been a number of years since I looked at the relevant (tricky) Perl code; I think it is possible in the regular expressions that MATLAB provides, but I would not want to try to reinvent the technique: too ugly and hard to debug.
The easiest thing to do might be to use MATLAB's perl() command to call a perl routine to do the parsing for you, having looked in the Perl FAQ to find the mechanism.
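If calling out to Perl is not convenient, the balancing can also be done without regular-expression tricks: use regexp only to locate each 'if (', then scan forward counting parentheses. This is a different technique from the Perl approach suggested above, sketched under the assumption that the input contains no parentheses inside string literals or comments and no identifiers ending in 'if'. The one-line funcode is a hypothetical sample.

```matlab
% Hypothetical sample with a nested condition.
funcode = ['if ((2.0E2 < x) && (x <= 1.0E3)) { ' ...
           't4 = -x*(log(x)*4.7023733515E1); } else { t4 = NAN; }'];

% 'end' returns the index of the last matched character,
% i.e. the opening '(' that follows each 'if'.
starts = regexp(funcode, 'if\s*\(', 'end');

conds = cell(1, numel(starts));
for k = 1:numel(starts)
    depth = 0;
    for p = starts(k):numel(funcode)
        if funcode(p) == '('
            depth = depth + 1;
        elseif funcode(p) == ')'
            depth = depth - 1;
            if depth == 0
                % p is the ')' that balances the opening '('.
                conds{k} = funcode(starts(k)+1 : p-1);
                break
            end
        end
    end
end

for k = 1:numel(conds)
    fprintf('condition %d: %s\n', k, conds{k});
end
```

A condition whose parentheses never balance is left as an empty entry in conds, which makes truncated input easy to spot.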