How can I rapidly filter a sorted vector for elements separated by a minimum value, processing one-way from the first value?

Question

Daniel on 16 Feb 2017

3
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/325375-how-can-i-rapidly-filter-a-sorted-vector-for-elements-separated-by-a-minimum-value-processing-one-w

Edited: Jan on 20 Feb 2017

Using MATLAB R2012b, I have a (large) sorted vector of elements. Starting from the initial value, I need to discard all elements that are within some particular range of any previous undiscarded element. Simply looking for any set with the given minimum separation will not give the unique solution I require.

(Slow, but obvious) MWE:

function vec2 = testfun(vec,threshold)
  idx = 0.1;
  vec2 = vec;
  while not(isempty(idx));
      testdiff = diff(vec2);
      idx = find(testdiff<threshold,1);
      vec2(idx+1) = [];
  end
end

When I call this on

vec = [1,13,27,30,33,39,53,54,100];
threshold = 25;

I should only retain:

vec2 == [1,27,53,100]

but calling

testfun(vec,10)

should give me

vec2 = [1, 13, 27, 39, 53, 100]

When I call it with a script to randomly generate larger vectors:

numval = 1e5;
threshold = 25;
vec = sort(rand(numval,1).*numval*1e2);
vec2 = testfun(vec,threshold);

the profiler indicates the following breakdown within the function:

vec2(idx+1) = [];             66%
dx = find(vecdiff<thresho...  20%
vecdiff = diff(vec2);         12%

Clearly, the re-indexing and allocation of vec2 after deleting elements is slow. Yet, if I don't take those elements out (e.g., I could set them = 0 instead of []), I have to find a way to ignore negative and zero elements in the diff output, but continue to search for results below the threshold value from the previous retained element. The total runtime in the function appears to behave quadratically, although the idx = find() line begins to take up more of the total time as the length of the vector increases (517s, 46/40/13% for numval = 4e5).

2 Comments
Show NoneHide None

Stephen23 on 17 Feb 2017

+1 for a clearly written question with a MWE. I wish there were more of these!

Jan on 17 Feb 2017

@Stephen: You are right. Code formatting, test data, profiler used already, explained what has been tried already, reaction to asnwers beyond "does not work". +1

Sign in to comment.

Sign in to answer this question.

Answer 1

Jan on 16 Feb 2017

2
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/325375-how-can-i-rapidly-filter-a-sorted-vector-for-elements-separated-by-a-minimum-value-processing-one-w#answer_255066

Edited: Jan on 17 Feb 2017

Open in MATLAB Online

function vec = testfun2(vec, threshold)
idx  = 1;
n    = length(vec);
keep = false(1, n);
while ~isempty(idx)
  keep(idx) = true;
  iThresh   = vec(idx) + threshold;
  idx       = find(vec > iThresh, 1);  % [EDITED]
end
vec = vec(keep);

[EDITED, 2017-02-17 08:24 UTC]:

A version which runs in linear time:

function vec = testfun3(vec, threshold)
n    = length(vec);
keep = false(1, n);
lim  = vec(1) + threshold;
keep(1) = true;
for k = 2:n
   if vec(k) > lim
      keep(k) = true;
      lim     = vec(k) + threshold;
   end
end
vec = vec(keep);

Timings under R2016b/64:

numval = 1e5;
threshold = 25;
vec = sort(rand(numval,1).*numval*1e2);
tic; r1 = testfun(vec,  threshold); toc   % Original
tic; r2 = testfun2(vec, threshold); toc   % FIND(vec > iThresh)
tic; r3 = testfun3(vec, threshold); toc   % Dull loop
isequal(r1, r2, r3)
 Elapsed time is 44.117896 seconds.
 Elapsed time is 10.090938 seconds.
 Elapsed time is 0.003627 seconds.
 1  % Results are equal

Whoooops.

Could you please cross-check this? A speedup of factor 12000 usually means, that something important has been overseen.

5 Comments
Show 3 older commentsHide 3 older comments

Daniel on 17 Feb 2017

Open in MATLAB Online

Double-checking all four techniques (OP, your two, and Walter's) with tic/toc, I get close to your values.

 numval = 5e4;
threshold = 25;
vec = sort(rand(numval,1).*numval*1e2);
tic; r1 = testfun(vec,  threshold); toc   % Original
tic; r2 = testfunJD(vec, threshold); toc   % FIND(vec > iThresh)
tic; r3 = testfunJ3(vec, threshold); toc   % Dull loop
tic; r4 = testfunW(vec, threshold); toc   % Dull loop
isequal(r1, r2, r3, r4)
Elapsed time is 2.433664 seconds.
Elapsed time is 2.977988 seconds.
Elapsed time is 0.001315 seconds.
Elapsed time is 0.004967 seconds.
ans = 1    % same results

Cross-check confirms a very decent speed-up of about 3000 times for numval=1e5. I've been taking all the times from the profiler to date; on that, Walter's method is fastest. Using tic/toc (and I've just read http://au.mathworks.com/matlabcentral/newsreader/view_thread/298325 on that), yours comes out in front.

With tic/toc:

 clear all
numval = 1e5;
threshold = 25;
vec = sort(rand(numval,1).*numval*1e2);
tic; r1 = testfun(vec,  threshold); toc   % Original
tic; r2 = testfunJ2(vec, threshold); toc   % FIND(vec > iThresh)
tic; r3 = testfunJ3(vec, threshold); toc   % Dull loop
tic; r4 = testfunW(vec, threshold); toc   % Walter's suggestion
isequal(r1, r2, r3, r4)
Elapsed time is 11.469466 seconds.
Elapsed time is 11.828806 seconds.
Elapsed time is 0.003649 seconds.
Elapsed time is 0.009974 seconds.
ans =     1 %Results are equal

With profiler:

 clear all
testscript2    % calls functions in same order
Elapsed time is 11.967498 seconds.
Elapsed time is 12.729608 seconds.
Elapsed time is 0.991819 seconds.
Elapsed time is 0.959312 seconds.
ans =
     1

A direct comparison on numval = 1e8 using tic/toc gives:

 testfun3: 3.12 s     % Jan's 
 testfun: 11.31 s     % Walter's

Jan on 17 Feb 2017

The profiler disables the JIT acceleration, because the JIT can Change the order of commands to optimize the code and this would confuse the profiling. This is very sad, because code which profits from the JIT cannot be profiled accurately such that further optimizations require primitive tic/toc's instead.

Jan on 20 Feb 2017

Edited: Jan on 20 Feb 2017

@Daniel: For large data, the line vec = vec(keep) can be accelerated using FEX: CopyMask. When the complete function impelemented in C, the runtime for a 1e8 input decreased from 3.3 to 1.1 sec on my old Core2Duo. Do you have a C-compiler?

Sign in to comment.

Answer 2

Walter Roberson on 17 Feb 2017

1
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/325375-how-can-i-rapidly-filter-a-sorted-vector-for-elements-separated-by-a-minimum-value-processing-one-w#answer_255157

Open in MATLAB Online

You can run in linear time.

function vec2 = testfun(vec, threshold)
  vec2 = vec;                   %duplicate vector
  target = vec(1) + threshold;
  used = 1;
  for K = 2 : length(vec)
    if vec(K) < threshold; next; end
    used = used + 1;
    vec2(used) = vec(K);
    target = vec(K) + threshold;
  end
  vec2(used+1:end) = [];

Here I interpreted "within some particular range" as being exclusive -- that, for example, 5.0 would not be "within" 1.0 of 4.0. If that should be inclusive then change from < to <=

vec2 starts as a duplicate of vec1 so that it has the same data type and size -- and not having to copy over vec(1) is a bonus.

Items are copied into positions in vec2 as needed. Only a portion of vec2 is logically "in use" at any one time. At the end, the unused positions are discarded. This arrangement requires only a single memory allocation at the beginning, and then throwing away the trailing bit of it at the end (I'm not sure if that requires copying it.)

1 Comment
Show -1 older commentsHide -1 older comments

Daniel on 17 Feb 2017

Edited: Daniel on 17 Feb 2017

Open in MATLAB Online

Fantastic, Walter! This works quite well, with one fix:

if vec(K) < target; continue; end

Thank you very much. I'm not worried about the inclusivity, since the chance of it occurring is minuscule - but it was good of you to point out that ambiguity. I can now get through a vector with 1e8 elements in slightly less time than 5e5 took before.

[EDIT] That was using the profiler to time the run. Using tic/toc, I get through in the same time as 1e5 used to take.

Sign in to comment.

How can I rapidly filter a sorted vector for elements separated by a minimum value, processing one-way from the first value?

2 Comments
Show NoneHide None

Accepted Answer

5 Comments
Show 3 older commentsHide 3 older comments

More Answers (1)

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

How can I rapidly filter a sorted vector for elements separated by a minimum value, processing one-way from the first value?

2 Comments Show NoneHide None

Accepted Answer

5 Comments Show 3 older commentsHide 3 older comments

More Answers (1)

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

2 Comments
Show NoneHide None

5 Comments
Show 3 older commentsHide 3 older comments

1 Comment
Show -1 older commentsHide -1 older comments