Efficient searching to find the first element of an array meeting a condition

9 views (last 30 days)
Mohammad Shojaei Arani on 24 Aug 2023
Answered: C. Rithiya on 6 Sep 2023
Hello,
If a vector is given and the task is to find the first element fulfilling a condition then we use the
command 'finb(...,1)'. However, imagine that checking whether any member of my vector meets
the condition is going to be computationally expensive. In such a case the linear search is not a good idea.
An example:
f = @(x) rand(x);
find(f(1:10)>0.5);
Note that in this example it does not take so much time to check the 'condition' (i.e., wether it is bigger than 0.5) but in my actuall problem it is really expensive. So, my question is this: is there an efficient way to find the first element of an array fulfilling a condition?
Babak

Florian Bidaud on 24 Aug 2023
Edited: Florian Bidaud on 24 Aug 2023
EDIT: See @Bruno Luong answer, a simple for loop will actually be faster.
Depending on your data, you can split the search into pieces.
Let's say there the probability of your condition being met is 1/10.
Then there is a rather big probability that it is met in the first 10 values of your dataset, therefore you can split the search into vectors of 10 elements.
With your exemple, we could split the search into vectors of 2 because there is one chance out of two that the condition is met.
Provided that the matrix is already stored (Which I guess is the case for you ?), this could give something like the following code.
How to smartly split the data is the tricky part, it is supposing you already have a feeling about the probability of your condition being met.
a = f(1:12);
tic
find(a>0.5,1)
toc
ans =
1
Elapsed time is 0.425812 seconds.
tic
p = 2; % 1/Probability
i = 0;
while true
b = a(i*p+1:(i+1)*p);
I = find(b>0.5,1);
if ~isempty(I)>0
break
end
i = i+1;
end
value = i*p+I
toc
value =
1
Elapsed time is 0.027613 seconds.
Now if we reduce the probability :
tic
find(a>0.999,1)
toc
ans =
1550
Elapsed time is 0.396268 seconds.
tic
p = 1000; % 1/Probability
i = 0;
while true
b = a(i*p+1:(i+1)*p);
I = find(b>0.999,1);
if ~isempty(I)>0
break
end
i = i+1;
end
value = i*p+I
toc
value =
1550
Elapsed time is 0.026435 seconds.
Or even more:
tic
find(a>0.99999,1)
toc
ans =
232137
Elapsed time is 0.387584 seconds.
tic
p = 100000; % 1/Probability
i = 0;
while true
b = a(i*p+1:(i+1)*p);
I = find(b>0.99999,1);
if ~isempty(I)>0
break
end
i = i+1;
end
value = i*p+I
toc
value =
232137
Elapsed time is 0.026697 seconds.
Mohammad Shojaei Arani on 24 Aug 2023
Hello Florian,
Thanks for your kind response. Well, in my problem I do not know the probability of meeting the
condition, unfortunately. But you answer is helpful for me and hoefully I find a way. First, I thought to do it using a sort of binary search but unfortunately it is not possible.
If I simplify my problem, then I have a vector of 0's and 1's where it is computationally expensive to say which element is 1 or 0. So, the question is to find the first '1' (when the condition is met) in this array. Aparently the find command follows a linear searching strategy which is expensive for my problem.
Thanks again!

Bruno Luong on 24 Aug 2023
Edited: Bruno Luong on 24 Aug 2023
just fo the obvious for loop
for i = 1:length(x)
if expensive_check_for_meet_condition(x(i))
ifind = i;
break
end
end
for-loop the most fundamental statement in any programming language often neglected by MATLAB users
Bruno Luong on 24 Aug 2023
Edited: Bruno Luong on 24 Aug 2023
If you know how to sort the array so that the condition meet becomes stronger from start to the end, meaning the expensive_check_for_meet_condition(x) always returns 0s followed by the 1s, a dichotomy (binary) strategy search can be adopted. You cut in half the array check the middle, depending on the test value, you skip the left or right side until your interval reduces to 1 element.
Mohammad Shojaei Arani on 25 Aug 2023
Unfortunately, it is not often the case that my set comes with 0's first and then 1's apear. So, unfortunately, I am not able to perform a binary serch.

Torsten on 24 Aug 2023
Edited: Torsten on 24 Aug 2023
"find" will use the "first" option by default which means that the command will return the first occurence of the event and will not search further.
Florian Bidaud on 24 Aug 2023
Whislt it's true, the find function will still have to deal with the whole array, consuming unnecessary time if the condition is met early.
Mohammad Shojaei Arani on 24 Aug 2023
Yes Florian,
This is what I was guessing as well. The 'find' command 'NEEDS' to know the set it wants to search within first, unfortunately.

C. Rithiya on 6 Sep 2023
a = f(1:12); tic find(a>0.5,1) toc ans = 1 Elapsed time is 0.425812 seconds. tic p = 2; % 1/Probability i = 0; while true b = a(i*p+1:(i+1)*p); I = find(b>0.5,1); if ~isempty(I)>0 break end i = i+1; end value = i*p+I toc value = 1 Elapsed time is 0.027613 seconds.