Reduction Operations Supported for Automatic Parallelization of for-Loops
The code generator automatically parallelizes for-loops by converting implicit and explicit sequential for-loop code blocks into parallelized code blocks. Parallelizing a section of code might significantly improve the execution speed of the generated code. See How parfor-Loops Improve Execution Speed.
Parallelize for-Loops Performing Reduction Operations
You can parallelize for-loops that perform reduction operations by using the configuration option Optimize reductions.
To enable automatic parallelization of these for-loops:
Open the MATLAB® Coder™ app.
On the Generate Code page, click More Settings.
On the Speed tab, select the Enable automatic parallelization and Optimize reductions check boxes.
Optimize reductions is also enabled if you set the Leverage target hardware instruction set extensions parameter to an instruction set that your processor supports.
To enable the configuration option OptimizeReductions by using the command-line interface, run these commands.
cfg = coder.config('lib');
cfg.EnableAutoParallelization = true;
cfg.OptimizeReductions = true;
For example, write a MATLAB function arraySum that accumulates selected elements of the array in1 into the reduction variable sum, and returns out, which adds the mean of the array c to that sum.
function out = arraySum(in1,a,b)
sum = 0;
c = zeros(numel(in1),1);
for i2 = 1:numel(in1)
    if i2 > in1(i2)
        sum = sum + in1(i2);
        c(i2) = a(i2) + b(i2);
    end
end
out = sum + mean(c);
end
At the MATLAB command line, run this codegen command.
arr = 1:1000;
codegen arraySum -config cfg -args {arr,arr,arr} -report
Code generation successful: View report
Open the code generation report to see the parallelized for-loop that performs the addition operation.
sum = 0.0;
#pragma omp parallel num_threads(omp_get_max_threads()) private(sumPrime, d)
{
  sumPrime = 0.0;
#pragma omp for nowait
  for (i2 = 0; i2 < 1000; i2++) {
    c[i2] = 0.0;
    d = in1[i2];
    if ((double)i2 + 1.0 > d) {
      sumPrime += d;
      c[i2] = a[i2] + b[i2];
    }
  }
  omp_set_nest_lock(&autoparExample_nestLockGlobal);
  {
    sum += sumPrime;
  }
  omp_unset_nest_lock(&autoparExample_nestLockGlobal);
}
MATLAB Functions Supported for Reduction Operations
A reduction operation reduces specific dimensions of an input to a scalar value. A reduction operation must be associative and commutative. This table lists the MATLAB functions that are supported as reduction operations and are parallelized in generated code, where X is the reduction variable and expr is a MATLAB expression. The reduction variable X can appear on both sides of an assignment statement.
MATLAB Function  Usage Notes
plus
minus
times
max
min
sum
prod
or
and
bitand
bitor
bitxor

Note
The Support non-finite numbers (SupportNonFinite) property supports code generation only for standalone libraries (lib, dll) and executables.
The following example shows a typical usage of a reduction variable X.
X = 0; % Initialize X
for i = 1:n
    X = X + d(i);
end
This loop is equivalent to the following, where you calculate each d(i) in a different iteration.
X = X + d(1) + ... + d(n)
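Because the additions can be regrouped, a sum reduction can be split into independent partial sums that are combined at the end. The following C code is a minimal sketch of that idea, not output of the code generator. The names sum_sequential and sum_chunked and the chunks parameter are hypothetical, and the chunks run one after another here as a stand-in for threads.

```c
#include <stddef.h>

/* Sequential reduction: X = X + d[0] + ... + d[n-1]. */
long sum_sequential(const int *d, size_t n)
{
    long x = 0; /* reduction variable */
    for (size_t i = 0; i < n; i++) {
        x += d[i];
    }
    return x;
}

/* Same reduction split into `chunks` partial sums (hypothetical helper).
 * Each chunk plays the role of one thread's private accumulator. */
long sum_chunked(const int *d, size_t n, size_t chunks)
{
    long x = 0;
    if (chunks == 0) {
        chunks = 1; /* guard against division by zero */
    }
    size_t len = (n + chunks - 1) / chunks; /* ceiling of n / chunks */
    for (size_t k = 0; k < chunks; k++) {
        size_t begin = k * len;
        size_t end = (begin + len < n) ? begin + len : n;
        long partial = 0; /* private partial result for this chunk */
        for (size_t i = begin; i < end; i++) {
            partial += d[i];
        }
        x += partial; /* combine step, done under a lock in parallel code */
    }
    return x;
}
```

For integer inputs whose total fits in a long, both functions return the same value for any chunk count, which is what makes the reduction safe to distribute.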
Handling Overflow in Automatic Parallelization of for-Loops
Enabling automatic parallelization of for-loops and reduction optimization might produce different results due to overflow when you compare the output of sequential MATLAB code with that of the generated parallel C/C++ code. Therefore, when there is a possibility of such overflow, the code generator does not parallelize the loop.
The table shows the MATLAB functions where significant overflow can occur, along with their corresponding workarounds.
MATLAB Function
Integer overflow

function out = integerOverflow(in)
out = int8(0);
for i = 1:numel(in)
    out = out + in(i);
end
end

integerOverflow(int8(1:100))
ans =
  int8
   127

Description
Automatic parallelization of reduction-based for-loops performing arithmetic operations on integers is not supported when saturation on integer overflow is enabled. During parallel execution, the reduction operations are distributed among multiple threads. When the partial results are accumulated at the end, the results might be nondeterministic, because saturating integer arithmetic is not associative. Therefore, the code generator does not automatically parallelize such loops. For example, with int8 values, the grouping of the operands changes the result:
(126 - 125) + 122 = 1 + 122 = 123
(126 + 122) - 125 = 127 (saturation) - 125 = 2

Workaround
If appropriate for your application, disable the Saturate on integer overflow parameter.
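The effect of saturation on grouping can be reproduced outside MATLAB. The following C sketch emulates int8 saturating arithmetic and evaluates both groupings from the table. The helpers saturate_int8, sat_add_int8, sat_sub_int8, group_sub_first, and group_add_first are hypothetical names introduced for illustration; C itself does not saturate integer arithmetic.

```c
#include <stdint.h>

/* Clamp a wider intermediate result to the int8 range, mimicking
 * saturating int8 arithmetic. */
static int8_t saturate_int8(int v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Saturating add and subtract (hypothetical helpers). */
int8_t sat_add_int8(int8_t a, int8_t b)
{
    return saturate_int8((int)a + (int)b);
}

int8_t sat_sub_int8(int8_t a, int8_t b)
{
    return saturate_int8((int)a - (int)b);
}

/* Grouping 1: (x - y) + z */
int8_t group_sub_first(int8_t x, int8_t y, int8_t z)
{
    return sat_add_int8(sat_sub_int8(x, y), z);
}

/* Grouping 2: (x + z) - y */
int8_t group_add_first(int8_t x, int8_t y, int8_t z)
{
    return sat_sub_int8(sat_add_int8(x, z), y);
}
```

With x = 126, y = 125, and z = 122, group_sub_first returns 123 and group_add_first returns 2, matching the two lines in the table. A parallel reduction cannot guarantee which grouping occurs, so the result would depend on how iterations are split across threads.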
Usage Notes and Limitations
for-loops containing calls to C/C++ functions using coder.ceval are not automatically parallelized.
Bitwise reduction operations (bitand, bitor, and bitxor) are only supported for integer data types.
Custom reduction operations such as a = foo(a,b) are not supported for automatic parallelization of for-loops.
Reduction operations on floating-point numbers are only approximately associative. To get deterministic behavior of a parallel execution, the reduction operations involved must be associative. To be associative, a function f must satisfy the following for all a, b, and c.
f(a,f(b,c)) = f(f(a,b),c)
When working with floating-point numbers, different parallel executions of a loop might produce results with different roundoff errors. If such roundoff errors are unacceptable to your application, use the pragma coder.loop.parallelize('never') to instruct the code generator to not automatically parallelize specific for-loops. For more information on potential differences during code generation, see Differences Between Generated Code and MATLAB Code.
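The approximate associativity of floating-point addition can be seen with three values. The following C sketch uses two hypothetical functions, add_left and add_right, that differ only in how the operands are grouped. It assumes IEEE 754 double-precision arithmetic evaluated without extended precision and without value-changing compiler options such as fast-math.

```c
/* Grouping 1: f(f(a,b),c) with f = plus, that is, (a + b) + c. */
double add_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Grouping 2: f(a,f(b,c)) with f = plus, that is, a + (b + c). */
double add_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Under those assumptions, calling both functions with a = 1e16, b = -1e16, and c = 1.0 gives 1.0 for add_left and 0.0 for add_right, because -1e16 + 1.0 is not representable as a double and rounds back to -1e16. A parallel sum can regroup operands in the same way, which is the source of the differing roundoff errors.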