How to speed up pairwise difference calculation between vectors under certain condition?

Asked by Raouf Amara

Raouf Amara (view profile)

on 26 May 2019
Latest activity Answered by Jan

Jan (view profile)

on 8 Jun 2019
Accepted Answer by Jan

Jan (view profile)

I have two double arrays A (size: Nx1) and B (Mx1) both with large increasing values (from ~1e9 to ~1e13; not sure this is relevant).
I am trying to compute the pairwise differences between the two vectors but only for pairs that respect a certain condition, that is the absolute value of the difference between the two values is smaller than t.
I first tried to brute force calculate all the pairwise differences but as you would expect a NxM array cannot be stored on a regular computer; it is around a few hundred of terabytes.
So far I have implemented a rather straightforward solution which works fine with small N and M.
Low = 1e9; % lower bound for values in A & B
Upp = 1e13; % upper bound for values in A & B
N = 1e4; % number of elements in A and B (assuming they are equal, which is not the case)
A = sort((Upp-Low)*rand(N,1)+Low,'ascend');
B = sort((Upp-Low)*rand(N,1)+Low,'ascend');
maxdif = 1000000; % maximum difference
PrWD = cell(1,N);
tic
for i = 1:N
% pairwise difference if |B-A(i)|<=maxdif
PrWD{i} = A(i) - B(abs(B-A(i))<=maxdif)';
end
toc
However, I have A and B arrays which are both very large (N and M is the order of a few tens of millions) such that this operation takes a few hours in practice.
I tried arrayfun or bsxfun on small subsets (N=M=1e5) but it does not speed up the process.
At this point, I understand that this is extremely demanding as calculating all the pairwise differences or even the indices verifying the condition seem to scale with N*M.
Is there any (Matlab) way to speed up this process? Maybe taking into acccount the fact that values are increasing, or that not all differences need to be calculated?

Jan

Jan (view profile)

on 5 Jun 2019
You can improve the abs(B-A(i))<=maxdif part by locating the lower and upper indices iteratively. I will provide some code soon.
Dummy comment just to let me find this thread again: Jans magic message

R2019a

Jan (view profile)

on 8 Jun 2019

Instead of creating the huge index matrix abs(B-A(i))<=maxdif in each iteration, it is cheaper to search the lower and upper limits of the matching elements iteratively:
tic
PrWD2 = cell(1,N);
ini = 1;
fin = ini;
for i = 1:N
% Find initial and final index:
while B(ini) < A(i) - maxdif && ini < N
ini = ini + 1;
end
while B(fin) <= A(i) + maxdif && fin < N
fin = fin + 1;
end
if ini <= fin - 1
PrWD2{i} = A(i) - B(ini:fin - 1).';
end
end
toc
% PrWD2(cellfun('isempty', PrWD2)) = {zeros(1,0)};
% isequal(PrWD, PrWD2)
With your example and N=1e5 this is about 1100 times faster than the original code.
This code replies the empty matrix [], if no element is matching, while the original code creates a [1 x 0] matrix. For a comparison of the results, the empty matrices are adjusted, but this is not needed in the productive code most likely.