Is there a way to obtain desired index without using 'find'?

Dear all,
Suppose there is some array
ar=[102 243 453 768 897 ...]
Is there any way to obtain indices of an element having value e.g. 243 without using find?
For each element I need to find its index, so the computational cost will be O(N^2), which is too much. One possibility is using sparse matrices, e.g.
smatr=sparse(ar,1,1:length(ar))
and then index can be retrieved simply as ind=smatr(243) and so on, reducing computational cost to O(N). However, the problem is that in my computations values of 'ar' might exceed maximum size of matrix. So is there any additional way to relate values of 'ar' to indices so the latter can be obtained in one operation?
Thanks,
Dima

9 Comments

so the computational cost will be O(N^2)
Where did you get that from?
I have N steps ('ar' length is also N), and for each step I obtain some value in the 'ar' and need to find its index. Assuming 'find' cost is O(N), I get O(N^2) for the whole procedure. Or my assumption is incorrect?
One way I thought is to first sort 'ar' (obtain 'sar'), and, since 'ar' has unique elements (forgot to tell), write own searching scheme like: I obtain value 'val' of which I need to find index in 'sar', I compare it with middle value of sar: if it is lower, I search next within a lower half of 'sar', and if higher - in upper half. Recursively doing this procedure, searching cost becomes O(log_2 N).
Anybody knows, does MatLab also do something like this (and what functions use this)? Or it compares all values?
HISTC does this (I elaborate on this in my answer below). Maybe also FMINSEARCH.
Okay, so after some tests I ended out with the recursive search algorithm:
val=243; %value index of which to be found
[uar,ia,ib]=unique(ar);
sind=1; lind=length(uar);
while lind-sind>0,
mind=ceil((sind+lind)/2);
if val>=uar(mind), sind=mind; else, lind=mind-1; end
end
ind=ia(lind);
This routine greatly outperforms all other (in my calculations it was always almost independent of N, as well as faster, while all others linearly grow with N). So, I think, the problem is sorted. Thank you all for your answers, I am really glad I received help here and I greatly appreciate it!
Dima, what does array ar look like? I mean is it a row vector of double values? What is the number of elements in ar?
Yes, it is a row vector of double values, and the number of elements can be different (it is the same as the number of points in time-series).
This routine greatly outperforms all other (in my calculations it was always almost independent of N, as well as faster, while all others linearly grow with N). So, I think, the problem is sorted. Thank you all for your answers, I am really glad I received help here and I greatly appreciate it!
Maybe, but just so you know, there are mex-driven versions of this available on the FEX
See also my 3rd Answer below for how to incorporate it.
Thanks! There is indeed mex-version!

Sign in to comment.

 Accepted Answer

Here's another approach that uses the find_idx function from the FEX
It seems to overcome the O(N) overhead of HISTC and is virtually independent of N in speed,
ar=[102 243 453 768 897 102];
[sar,i,j]=unique(ar);
ind=j(floor(find_idx(768,sar))), %search for 768 for example

More Answers (7)

ar=[102 243 453 768 897 243 653 23];
KEY = 1:length(ar);
KEY(ar==243)

2 Comments

Each of the == operations in your code will take N comparisons. Dmytro also wants to do this for each of the N elements, for a total of N*N = O(N^2) operations. Dmytro hopes to do it more efficiently.
This is the quickest way to do it in MATLAB, that I know of anyway! Not, of course, counting a customized MEX function.

Sign in to comment.

Here's a modification of the sparse matrix approach that might ease the difficulty with maximum matrix sizes limits,
armax=max(ar);
armin=min(ar);
arc=ar-armin+1;
arcmax=armax-armin+1;
N=ceil(sqrt(arcmax));
[I,J]=ind2sub([N,N], arc);
smatr=sparse(I,J,1:length(arc),N,N);
Now, whenever you need to look up an index, e.g. 243, you would do
[i,j]=ind2sub([N,N],243-armin+1);
ind=smatr(i,j);
The difference is that now the matrix dimensions of smatr are roughly the square root of what they were in your approach. Even less, actually, since we offset the ar values by armin in its construction.

2 Comments

Thank you, that's a good idea! Unfortunately, values might be larger than 2^96 (square of maximum array size) for large time-series. So for now probably the best approach will be to write and use recursive O(log N)-cost search at each step, as I wrote in the comment under the question.
2^96 cannot be handled as array indexes for MATLAB. The limit is 2^48 if I recall correctly. 2^96 exceeds 64 bit representation.

Sign in to comment.

Here is an implementation of your O(log_2(N)) idea using HISTC
ar=[102 243 453 768 897 102];
[sar,i,j]=unique(ar);
[~,bin]=histc(768,sar); %search for 768 for example
ind=j(bin);

5 Comments

Great idea, thank you! It works quite fast. However, for some reason it seems that this algorithm has O(N) cost, at least according to my simulations with different N. I will write direct recursive search and compare afterwards.
The algorithmic order of histc() is not documented, and might depend upon the length of the input.
I can't believe HISTC would be doing an O(N) search. Why would it do so, when there are well-known O(logN) alternatives???
Dima, are you sure you're measuring only HISTC and not the overhead for the call to UNIQUE?
Yes, I am sure, the code is
css=zeros(500,1); NN=1000*(1:500);
for cn=1:500
RM=unique(randi(NN,NN,1));
aaa=cputime;
for kn=1:10000, nn=randi(NN); [~,bb]=histc(nn,RM); end
css(cn)=cputime-aaa;
end
plot(NN,css,'o');
OK, through the Newsgroup, I think it's been identified why HISTC is O(N). It's because it does a pre-check the monotonicity of the RM values in
[~,bb]=histc(nn,RM);
The search itself is O(log(N)). See also

Sign in to comment.

If each element is present only once, consider using the three-output form of unique()

5 Comments

Thank you for your answer. However, I did not understand what do you mean. Suppose I have used unique:
[uqar,ia,ib]=unique(ar);
Then I receive at some stage of my algorithm that I need index of element with value say 243. How can I obtain it then using (uqar,ia,ib)?
If each element is present only once, this seems equivalent to just doing
ib=1:length(ar);
ib(ar==243)
and will be O(N^2) again if this needs to be done for every element value.
I think it is not equivalent, because UNIQUE sorts first for ar of any real length (>1000). Time-wise this all adds up to more of it!
Sure. I only meant that the output is the same and that unique() doesn't really accomplish anything here, seeing as how the elements are already unique.

Sign in to comment.

you can use the following function :
function posX = findPosition(x,y)
posX=[];
for i =1:length(x)
p=x-y;
isequal(p(i),0)
if ans==1
posX=[posX,i]
end
end

2 Comments

That would print out the result of each isequal() . Better would be to use
function posX = findPosition(x,y)
posX=[];
for i = 1:length(x)
p = x - y;
if isequal(p(i),0)
posX=[posX,i]
end
end
Which optimizes to
function posX = findPosition(x,y)
posX=[];
p = x - y;
for i = 1:length(x)
if isequal(p(i),0)
posX=[posX,i]
end
end
But then that could be simplified further as
function posX = findPosition(x,y)
posX=[];
for i = 1:length(x)
if x(i) == y
posX=[posX,i]
end
end
However, this requires continually expanding posX, which could be expensive. Less expensive would be:
function posX = findPosition(x,y)
posX = 1 : length(x);
posX = posX( x == y );
On the other hand, this is an order N calculation anyhow, and since the indices have to be found for each entry in the array, using this repeatedly would still end up being O(N^2) which the poster was trying to avoid.
yeah. u are awesome. I didnt go thriugh the optimization part. thnk you

Sign in to comment.

myMap = containers.Map( ar, 1:length(ar) )
This uses a hash structure. I am not sure which one it uses, so I do not know the creation time costs; certainly no less than O(N) as the 1:length(ar) is O(N). O(N*log(N)) I suspect.
Once hashed, each lookup is nominally constant time (I do not know if the hash structure it uses can have collisions that might need to be resolved.)
for k = 1:length(ar)
if ar(k)==243 % Using your example of the 243 element
posX=k;
break
else
continue
end
end

Asked:

on 1 Oct 2012

Edited:

on 12 Nov 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!