Implicit expansion with arrayfun (cpu vs gpu)
Alessandro
on 3 Jan 2026 at 16:22
Commented: Matt J
on 7 Jan 2026 at 3:57
I find it very convenient that MATLAB has supported implicit expansion since R2016b (for an explanation, see this nice article: https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/?s_tid=blogs_rc_1).
I was then puzzled to discover that arrayfun on the CPU does not allow it, while arrayfun called with gpuArrays does. Below is an MWE demonstrating this behavior.
Let me quickly explain: if I have a vector x of size [2,1] and a vector y of size [1,2], I can compute the sum x+y and get a 2×2 matrix, as intended. This is better than the uglier and more memory-intensive
repmat(x,1,2)+repmat(y,2,1)
Unfortunately, this does not work with arrayfun on the CPU!
Since I write code with both normal arrays and gpuArrays, I find this different behavior of arrayfun quite misleading. It would be great if MATLAB allowed implicit expansion for arrayfun on the CPU as well. When the arrays are large, duplicating dimensions with repmat takes a lot of memory.
%% Demonstration of implicit expansion support in MATLAB and arrayfun
% This script shows that:
% 1) MATLAB supports implicit expansion for standard array operations.
% 2) arrayfun on the GPU supports implicit expansion.
% 3) arrayfun on the CPU does NOT support implicit expansion!!
%
% Implicit expansion allows a 2×1 vector to be added to a 1×2 vector,
% producing a 2×2 matrix.
clear; clc; close all;
% Define test vectors
x = [1; 2]; % Column vector (2×1)
y = [1, 2]; % Row vector (1×2)
%% Implicit expansion using standard MATLAB operations
F1 = myadd(x, y);
%% Implicit expansion using arrayfun on the GPU
F2 = arrayfun(@myadd, gpuArray(x), gpuArray(y));
%% Attempt implicit expansion using arrayfun on the CPU (expected to fail)
try
    F3 = arrayfun(@myadd, x, y);
catch ME
    fprintf('CPU arrayfun error (expected):\n%s\n\n', ME.message);
end
%% Function myadd
function F = myadd(x, y)
    % Element-wise addition
    F = x + y;
end
10 Comments
Joss Knight
on 5 Jan 2026 at 9:42
Moved: Matt J
on 5 Jan 2026 at 12:40
arrayfun with expansion, particularly for expanding scalars, is certainly very convenient syntactic sugar for a for-loop: it makes code more compact and readable, for instance when setting a bunch of properties on an object array, where the for-loop is not going to be optimized and so offers no performance advantage.
layers = arrayfun(@setHyperParams, layers, 0, [layers.L2Factor]); % Freeze learn rate
It's just easier to read, isn't it? Obviously there are other ways to do this particular operation on one line, but I certainly see your point. However, the others have good points too: arrayfun is almost always slower than a for-loop or another alternative, so doing anything to encourage its use is something to approach with caution.
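For comparison, here is a plain for-loop equivalent of that one-liner. This is a sketch only; setHyperParams and the L2Factor property are the hypothetical names from the example above, not real toolbox identifiers:
% Hypothetical for-loop equivalent of the arrayfun one-liner above
for k = 1:numel(layers)
    layers(k) = setHyperParams(layers(k), 0, layers(k).L2Factor);
end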
Accepted Answer
Matt J
on 4 Jan 2026 at 1:03
Edited: Matt J
on 4 Jan 2026 at 20:22
Remember that arrayfun (on the CPU) is nothing more than an M-code for-loop in disguise. It is therefore not hard to write your own arrayfun variant, one that supports implicit expansion, using for-loops.
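For intuition, here is a minimal sketch (my illustration, not the shipped implementation) of what CPU arrayfun effectively does for two same-size inputs and a scalar-returning function:
function out = arrayfunSketch(fun, x, y)
    % Minimal sketch: assumes size(x) == size(y) and that fun returns a
    % scalar for each element pair; no implicit expansion is attempted.
    out = zeros(size(x));
    for k = 1:numel(x)
        out(k) = fun(x(k), y(k)); % apply fun element by element
    end
end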
The implementation below is slower than explicit expansion (using repmat), but of course it conserves a lot more memory. You would probably need very large data sizes and/or tight RAM limits before the implicit-expansion version comes out faster.
% Define test vectors
N=100;
x = rand(N,1); % Column vector
y = rand(1,N); % Row vector
z = rand(1,1,N);
myadd=@(a,b,c) a+b+c;
tic;
out1=arrayfunImplExp(myadd,x,y,z); %implicit expansion
toc
tic;
out2=arrayfunExplExp(myadd,x,y,z); %explicit expansion (with repmat)
toc
isequal(out1,x+y+z)
isequal(out2,x+y+z)
function out = arrayfunImplExp(fun, varargin)
%ARRAYFUNIMPLEXP arrayfun with implicit expansion (CPU)
%
% NOTE:
%   Useful for CPU-only execution. On the GPU, use arrayfun instead,
%   which implements its own implicit expansion.

    % Number of inputs and maximum dimensionality
    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect per-input sizes (row-wise)
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:}); % [numArgs x numDims]

    % Output size is the maximum along each dimension
    outSize = max(sizes, [], 1);

    % Precompute row-wise strides for linear indexing
    strides = [ones(numArgs, 1), cumprod(sizes(:, 1:end-1), 2)];

    % Convert sizes to zero-based limits
    sizes = sizes - 1;

    % Allocate output and per-argument caches
    out = nan(outSize);
    argsIdx = nan(numArgs, 1);
    args = cell(1, numArgs);

    % Main loop over output elements
    for linIdx = 1:numel(out)

        % Convert linear index -> zero-based subscripts
        idx = linIdx - 1;
        subs = zeros(1, numDims);
        for d = 1:numDims
            sd = outSize(d);
            subs(d) = mod(idx, sd);
            idx = floor(idx / sd);
        end

        % Apply implicit-expansion masking (clamp subscripts to each input's extent)
        Subs = min(subs, sizes);

        % Row-wise sub2ind
        argsIdxNew = sum(Subs .* strides, 2) + 1;

        % Re-gather only the arguments whose element actually changed
        map = (argsIdxNew ~= argsIdx);
        for j = 1:numArgs
            if map(j)
                args{j} = varargin{j}(argsIdxNew(j));
            end
        end
        argsIdx = argsIdxNew;

        % Evaluate function
        out(linIdx) = fun(args{:});
    end
end
function out = arrayfunExplExp(fun, varargin)
%ARRAYFUNEXPLEXP arrayfun with explicit expansion using repmat
%   Provided for comparison with arrayfunImplExp.

    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect sizes
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:});

    % Output size
    outSize = max(sizes, [], 1);
    numOut = prod(outSize);

    % Preallocate expanded argument cell array
    args = cell(numArgs, numOut);

    % Explicit expansion
    for k = 1:numArgs
        v = varargin{k};
        vSz = size(v, 1:numDims);
        reps = outSize ./ vSz;

        % Expand and linearize
        vk = repmat(num2cell(v), reps);
        args(k, :) = vk(:).';
    end

    % Allocate output
    out = nan(outSize);

    % Evaluate function
    for j = 1:numOut
        out(j) = fun(args{:, j});
    end
end
17 Comments
Matt J
about 4 hours ago
I made some improvements to my File Exchange submission, which now includes two versions: one that is RAM-conserving (arrayfunLowMem) and one that is liberal with RAM but somewhat faster (arrayfunHighMem). Both are significantly faster than native arrayfun (revised timeComparisons.m script attached); a hypothetical usage sketch follows the timings below.
1) arrayfun on GPU
2) Vectorized on CPU
3) Nested loops on CPU
4a) User-written arrayfunLowMem on CPU
4b) User-written arrayfunHighMem on CPU
5) Matlab arrayfun on CPU
Maximum absolute error vs baseline (F2):
gpuArray.arrayfun (F1) : 7.10543e-15
Vectorized CPU (F2) : 0
Nested loops CPU (F3) : 0
arrayfunLowMem (F4a): 0
arrayfunHighMem (F4b): 0
Matlab arrayfun CPU (F5) : 0
Timing summary (seconds):
1) arrayfun GPU : 0.010418
2) Vectorized CPU : 0.097721
3) Nested loops CPU : 0.329859
4a) arrayfunLowMem : 3.806847
4b) arrayfunHighMem : 3.187784
5) arrayfun Matlab : 5.101973
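A hypothetical usage sketch, assuming the two File Exchange functions mirror arrayfun's calling convention (the actual interface is whatever the submission defines):
% Hypothetical usage; arrayfunLowMem/arrayfunHighMem interfaces assumed, not verified
x = rand(500,1);  y = rand(1,500);  z = rand(1,1,50);
f = @(a,b,c) a.*b + c;
F4a = arrayfunLowMem(f, x, y, z);    % RAM-conserving version
F4b = arrayfunHighMem(f, x, y, z);   % more RAM, somewhat faster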
More Answers (3)
Torsten
on 3 Jan 2026 at 18:05
Edited: Torsten
on 3 Jan 2026 at 18:29
F3 = cellfun(@myadd, {x}, {y}, 'UniformOutput', false)
will work.
arrayfun tries to apply "myadd" to [first element of x, first element of y] and [second element of x, second element of y]. Thus x and y must be of the same size and, even if it did work, the result would be [2, 4] or [2; 4].
I don't understand why it works for gpuArray and gives the result you want. Maybe input arrays of different sizes are automatically interpreted there as two separate single objects to which "myadd" is applied, as is done when you use cell arrays together with "cellfun".
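To illustrate the two behaviors described above, here is a small sketch (illustrative values, not from the original post):
x = [1; 2];
y = [1, 2];
% arrayfun pairs up elements, so its inputs must have the same size:
arrayfun(@(a,b) a+b, x, x)                    % works, returns [2; 4]
% arrayfun(@(a,b) a+b, x, y)                  % errors on the CPU: sizes differ
% cellfun with cell-wrapped inputs calls the function once on the whole arrays,
% so implicit expansion happens inside the function and a 2x2 result comes back:
cellfun(@(a,b) a+b, {x}, {y}, 'UniformOutput', false)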
Matt J
on 3 Jan 2026 at 23:25
Edited: Matt J
on 4 Jan 2026 at 3:18
I agree it is confusing, but the gpuArray version of arrayfun was never intended as a direct analogue of the CPU version. There are all kinds of other instances where the gpuArray support for a function differs in capabilities from its CPU counterpart. Usually, though, the GPU version is the less flexible one, not the more flexible one as appears to be the case with arrayfun.
A more appropriate CPU-vs-GPU comparison of implicit expansion (for binary functions) would probably use bsxfun instead:
% Define test vectors
x = rand(10000,1); % Column vector
y = rand(1,10000); % Row vector
xg=gpuArray(x);
yg=gpuArray(y);
myadd=@(a,b) a+b;
timeit( @() bsxfun(myadd, x, y) ) %CPU
ans =
0.3355
gputimeit( @() bsxfun(myadd, xg, yg) ) %GPU
ans =
0.0078
0 Comments
Joss Knight
on 5 Jan 2026 at 13:15
Moved: Matt J
on 5 Jan 2026 at 13:40
Well, there are a couple of answers to that. Firstly, and probably most importantly, GPU arrayfun is just an extension of gpuArray's existing compiler for element-wise operations, all of which support dimension expansion; it is also a natural strided memory-access pattern for a GPU kernel to support, where each input has its own stride, and GPU hardware has native support for this kind of memory access. Secondly, it makes design sense; it just isn't implemented for the CPU, for the reasons given. The historical explanation is that the idea of broadcast operations didn't even exist back when arrayfun was first created for the CPU, but it did exist by the time GPU arrayfun came along.
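As a concrete illustration of the dimension expansion described above (requires Parallel Computing Toolbox and a supported GPU; the function handle is just an example):
xg = gpuArray(rand(4,1));
yg = gpuArray(rand(1,3));
c  = gpuArray(2);                             % scalars are expanded as well
F  = arrayfun(@(a,b,s) a.*b + s, xg, yg, c);  % each input uses its own stride
size(F)                                       % 4-by-3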
2 Comments
Joss Knight
on 6 Jan 2026 at 10:00
The reasons given by various people. Should you really enhance something you want to discourage people from using?