Implicit expansion with arrayfun (cpu vs gpu)

Alessandro
Alessandro on 3 Jan 2026 at 16:22
Commented: Matt J on 7 Jan 2026 at 3:57
I find it very convenient that MATLAB has allowed implicit expansion since R2016b (for an explanation, see this nice article: https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/?s_tid=blogs_rc_1).
I was then puzzled to discover that arrayfun on the CPU does not allow it, while arrayfun called with gpuArrays does. Below is an MWE demonstrating this behavior.
Let me quickly explain: if I have two vectors, x of size [2,1] and y of size [1,2], I can calculate the sum x+y and get a 2×2 matrix, as intended. This is better than the ugly and more memory-intensive
repmat(x,1,2)+repmat(y,2,1)
Unfortunately, this does not work with arrayfun on the CPU!
Since I write code with both normal arrays and gpuArrays, I find this different behavior of arrayfun quite misleading. It would be great if MATLAB could allow implicit expansion on CPU arrayfun as well. When I have large arrays, duplicating dimensions with repmat takes a lot of memory.
%% Demonstration of implicit expansion support in MATLAB and arrayfun
% This script shows that:
% 1) MATLAB supports implicit expansion for standard array operations.
% 2) arrayfun on the GPU supports implicit expansion.
% 3) arrayfun on the CPU does NOT support implicit expansion!!
%
% Implicit expansion allows a 2×1 vector to be added to a 1×2 vector,
% producing a 2×2 matrix.
clear; clc; close all;
% Define test vectors
x = [1; 2]; % Column vector (2×1)
y = [1, 2]; % Row vector (1×2)
%% Implicit expansion using standard MATLAB operations
F1 = myadd(x, y);
%% Implicit expansion using arrayfun on the GPU
F2 = arrayfun(@myadd, gpuArray(x), gpuArray(y));
%% Attempt implicit expansion using arrayfun on the CPU (expected to fail)
try
    F3 = arrayfun(@myadd, x, y);
catch ME
    fprintf('CPU arrayfun error (expected):\n%s\n\n', ME.message);
end
%% Function myadd
function F = myadd(x, y)
% Element-wise addition
    F = x + y;
end
  10 Comments
Joss Knight
Joss Knight on 5 Jan 2026 at 9:42
Moved: Matt J on 5 Jan 2026 at 12:40
arrayfun with expansion, particularly scalar expansion, is certainly very convenient syntactic sugar for a for loop; it makes code more compact and readable. An example is setting a bunch of properties on an object array, where the for loop is not going to be optimized anyway, so there is no advantage to it.
layers = arrayfun(@setHyperParams, layers, 0, [layers.L2Factor]); % Freeze learn rate
It's just easier to read, isn't it? Obviously there are other ways to do this particular operation in one line, but I certainly see your point. However, the others have good points too: arrayfun is almost always slower than a for loop or an alternative approach, so taking action to encourage its use is something to do with caution.
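For what it's worth, the speed gap Joss mentions is easy to check with a sketch along these lines (timings will of course vary by machine and MATLAB release; the function handle here is just an illustrative stand-in):

```matlab
% Sketch: comparing CPU arrayfun against an equivalent plain for loop.
x = rand(1e6, 1);
f = @(v) v.^2 + 1;   % illustrative element-wise function

tic;
out1 = arrayfun(f, x);
tArrayfun = toc;

out2 = zeros(size(x));
tic;
for k = 1:numel(x)
    out2(k) = f(x(k));
end
tFor = toc;

fprintf('arrayfun: %.3f s, for loop: %.3f s\n', tArrayfun, tFor);
isequal(out1, out2)   % both compute the same result
```

On typical machines the for loop (which the JIT optimizes well) comes out ahead, which is the reason commonly given for not encouraging arrayfun on the CPU.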
Paul
Paul on 5 Jan 2026 at 12:30
Moved: Matt J on 5 Jan 2026 at 12:40
Hi Joss,
Any idea why the gpu version of arrayfun supports expansion (whereas the cpu version does not)?


Accepted Answer

Matt J
Matt J on 4 Jan 2026 at 1:03
Edited: Matt J on 4 Jan 2026 at 20:22
Remember that arrayfun (on the CPU) is nothing more than an M-code for-loop in disguise. It is therefore not hard to write your own version of arrayfun -- one that supports implicit expansion -- using for-loops.
The implementation below is slower than explicit expansion (using repmat), but of course conserves a lot more memory. You would probably need very significant data sizes and/or RAM limits before implicit expansion is faster.
% Define test vectors
N=100;
x = rand(N,1); % Column vector
y = rand(1,N); % Row vector
z = rand(1,1,N);
myadd=@(a,b,c) a+b+c;
tic;
out1=arrayfunImplExp(myadd,x,y,z); %implicit expansion
toc
Elapsed time is 0.898685 seconds.
tic;
out2=arrayfunExplExp(myadd,x,y,z); %explicit expansion (with repmat)
toc
Elapsed time is 0.548641 seconds.
isequal(out1,x+y+z)
ans = logical
1
isequal(out2,x+y+z)
ans = logical
1
function out = arrayfunImplExp(fun, varargin)
%ARRAYFUNIMPLEXP arrayfun with implicit expansion (CPU)
%
% NOTE:
%   Useful for CPU-only execution. On the GPU, use arrayfun instead,
%   which implements its own implicit expansion.

    % Number of inputs and maximum dimensionality
    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect per-input sizes (row-wise)
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:}); % [numArgs x numDims]

    % Output size is max along each dimension
    outSize = max(sizes, [], 1);

    % Precompute row-wise strides for linear indexing
    strides = [ones(numArgs, 1), cumprod(sizes(:, 1:end-1), 2)];

    % Convert sizes to zero-based limits
    sizes = sizes - 1;

    % Allocate output and per-input caches
    out = nan(outSize);
    argsIdx = nan(numArgs, 1);
    args = cell(1, numArgs);

    % Main loop over output elements
    for linIdx = 1:numel(out)

        % Convert linear index -> zero-based subscripts
        idx = linIdx - 1;
        subs = zeros(1, numDims);
        for d = 1:numDims
            sd = outSize(d);
            subs(d) = mod(idx, sd);
            idx = floor(idx / sd);
        end

        % Apply implicit expansion masking (clamp to singleton dims)
        Subs = min(subs, sizes);

        % Row-wise sub2ind
        argsIdxNew = sum(Subs .* strides, 2) + 1;
        map = (argsIdxNew ~= argsIdx);

        % Re-gather only the arguments whose index changed
        for j = 1:numArgs
            if map(j)
                args{j} = varargin{j}(argsIdxNew(j));
            end
        end
        argsIdx = argsIdxNew;

        % Evaluate function
        out(linIdx) = fun(args{:});
    end
end
function out = arrayfunExplExp(fun, varargin)
%ARRAYFUNEXPLEXP arrayfun with explicit expansion using repmat
%   Provided for comparison with arrayfunImplExp.

    numArgs = numel(varargin);
    numDims = max(cellfun(@ndims, varargin));

    % Collect sizes
    sizes = cellfun(@(c) size(c, 1:numDims), varargin, 'UniformOutput', false);
    sizes = vertcat(sizes{:});

    % Output size
    outSize = max(sizes, [], 1);
    numOut = prod(outSize);

    % Preallocate expanded argument cell array
    args = cell(numArgs, numOut);

    % Explicit expansion
    for k = 1:numArgs
        v = varargin{k};
        vSz = size(v, 1:numDims);
        reps = outSize ./ vSz;

        % Expand and linearize
        vk = repmat(num2cell(v), reps);
        args(k, :) = vk(:).';
    end

    % Allocate output
    out = nan(outSize);

    % Evaluate function
    for j = 1:numOut
        out(j) = fun(args{:, j});
    end
end
  17 Comments
Alessandro
Alessandro on 6 Jan 2026 at 16:44

Thanks for the explanation. But using parfor with threads instead of processes should reduce the overhead considerably. I will try it later if I have time.
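For reference, the thread-based approach Alessandro mentions could be sketched roughly as follows (a sketch only: parpool("Threads") requires R2020a or later, and the function must be thread-safe; myfun here is an illustrative stand-in):

```matlab
% Sketch: running the element-wise loop on a thread-based pool.
% Thread workers share memory, avoiding the serialization and
% data-transfer overhead of the default process-based pool.
if isempty(gcp('nocreate'))
    parpool("Threads");
end

x = rand(1e6, 1);
myfun = @(v) v.^2 + sin(v);   % illustrative element-wise function

out = zeros(size(x));
parfor k = 1:numel(x)
    out(k) = myfun(x(k));
end
```

Whether this beats a plain for loop still depends on how expensive each function call is relative to the scheduling overhead.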

Matt J
Matt J about 4 hours ago
I made some improvements to my File Exchange submission, which now includes two versions, one which is RAM-conserving (arrayfunLowMem) and one which is liberal with RAM, but somewhat faster (arrayfunHighMem). Both are significantly faster than native arrayfun (revised timeComparisons.m script attached).
1) arrayfun on GPU
2) Vectorized on CPU
3) Nested loops on CPU
4a) User-written arrayfunLowMem on CPU
4b) User-written arrayfunHighMem on CPU
5) Matlab arrayfun on CPU
Maximum absolute error vs baseline (F2):
gpuArray.arrayfun (F1) : 7.10543e-15
Vectorized CPU (F2) : 0
Nested loops CPU (F3) : 0
arrayfunLowMem (F4a): 0
arrayfunHighMem (F4b): 0
Matlab arrayfun CPU (F5) : 0
Timing summary (seconds):
1) arrayfun GPU : 0.010418
2) Vectorized CPU : 0.097721
3) Nested loops CPU : 0.329859
4a) arrayfunLowMem : 3.806847
4b) arrayfunHighMem : 3.187784
5) arrayfun Matlab : 5.101973


More Answers (3)

Torsten
Torsten on 3 Jan 2026 at 18:05
Edited: Torsten on 3 Jan 2026 at 18:29
F3 = cellfun(@myadd, {x}, {y}, 'UniformOutput', false)
will work.
arrayfun tries to apply "myadd" to [first element of x, first element of y] and [second element of x, second element of y]. Thus x and y must be of the same size, and - even if it worked - the result would be [2, 4] or [2; 4].
I don't understand why it works for gpuArray and gives the result you want. Maybe input arrays of different sizes are automatically interpreted here as two separate single objects to which "myadd" is applied - as is done if you use cell arrays together with "cellfun".
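To illustrate the size requirement: with equally sized inputs, CPU arrayfun happily pairs up corresponding elements and returns an array of the same size.

```matlab
% CPU arrayfun with same-size inputs: element-by-element pairing.
x = [1; 2];
y = [1; 2];                        % same size as x, unlike the 1x2 row vector
F = arrayfun(@(a, b) a + b, x, y)  % returns [2; 4]
```

It is only when the sizes differ (other than via gpuArray's scalar/dimension expansion) that CPU arrayfun errors out.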
  4 Comments
Alessandro
Alessandro on 3 Jan 2026 at 21:36
@Paul: I agree with you. Ideally, MATLAB should implement implicit expansion on CPU arrayfun as well, just as it is implemented in the GPU version. This improvement would also not break any existing code.
Torsten
Torsten on 3 Jan 2026 at 22:57
Edited: Torsten on 3 Jan 2026 at 22:57
If your contribution was a complaint rather than a question, make a feature request:



Matt J
Matt J on 3 Jan 2026 at 23:25
Edited: Matt J on 4 Jan 2026 at 3:18
I agree it is confusing, but the gpuArray version of arrayfun was never intended as a direct analogue of the CPU version. Additionally, there are all kinds of other instances where gpuArray support for a function differs in capabilities from the CPU version. Usually, though, the GPU version is less flexible, not more so; arrayfun appears to be an exception.
A more appropriate comparison of implicit expansion between CPU vs GPU (for binary functions) would probably be to use bsxfun instead:
% Define test vectors
x = rand(10000,1); % Column vector
y = rand(1,10000); % Row vector
xg=gpuArray(x);
yg=gpuArray(y);
myadd=@(a,b) a+b;
timeit( @() bsxfun(myadd, x, y) ) %CPU
ans =
0.3355
gputimeit( @() bsxfun(myadd, xg, yg) ) %GPU
ans =
0.0078

Joss Knight
Joss Knight on 5 Jan 2026 at 13:15
Moved: Matt J on 5 Jan 2026 at 13:40
Well, there are a couple of answers to that. Firstly, and probably most importantly, GPU arrayfun is just an extension of gpuArray's existing compiler for element-wise operations, all of which support dimension expansion. It is also a natural strided memory-access pattern for a GPU kernel to support: each input has its own stride, and GPU hardware natively supports this kind of memory access. Secondly, it makes design sense and only isn't implemented for CPU for the reasons given. The historical explanation is that the idea of broadcast operations didn't even exist when arrayfun was first created for the CPU, but it did exist by the time GPU arrayfun came along.
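A rough CPU-side analogy of the per-input strides Joss describes (illustrative only, not how the GPU kernel is actually written): a singleton dimension can simply be given a stride of 0, so every output element along that dimension reads the same input element.

```matlab
% Sketch: implicit expansion via per-input strides, with stride 0
% for singleton dimensions.
x = [1; 2];      % 2x1
y = [10, 20];    % 1x2
outSize = [2, 2];

sx = [1, 0];     % x: stride 0 along dim 2 (singleton)
sy = [0, 1];     % y: stride 0 along dim 1 (singleton)

F = zeros(outSize);
for i = 0:outSize(1)-1
    for j = 0:outSize(2)-1
        F(i+1, j+1) = x(i*sx(1) + j*sx(2) + 1) + y(i*sy(1) + j*sy(2) + 1);
    end
end
isequal(F, x + y)   % matches built-in implicit expansion
```

This is essentially the indexing scheme that Matt J's arrayfunImplExp implements via the min(subs, sizes) clamp.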
  2 Comments
Matt J
Matt J on 5 Jan 2026 at 13:23
Edited: Matt J on 5 Jan 2026 at 13:46
Secondly, it makes design sense and only isn't implemented for CPU for the reasons given.
Given where? In your initial post?
And are those reasons still an argument for not pursuing it now, or is it just because of historical legacy?
Joss Knight
Joss Knight on 6 Jan 2026 at 10:00
The reasons given by various people. Should you really enhance something you want to discourage people from using?


Release

R2024b
