How to use isoutlier based on part of the data?

Good morning everybody,
I have a data vector like this:
a = [0; 0.0028; 0.0002; 0.0039; 0.0061];
As you can see, from the 4th element onward the values keep growing until the end.
I was trying to determine a threshold that marks the 4th and 5th elements as outliers using MATLAB's isoutlier function. I got it to work, but only by setting a fixed 'ThresholdFactor' value for one of the methods the function offers.
I would like the 4th and 5th values to be identified as outliers not relative to the whole vector, but because they are larger than the 1st, 2nd and 3rd elements. In other words, I would like to find the outliers based on the preceding data [0; 0.0028; 0.0002].
The vector I posted is only an example. The solution must work for any size.
Can you help me?
P.S. (updated): As I said, depending on the input data, my vectors will have different sizes. In all cases, though, the phenomenon they represent makes the values larger toward the end.
I can't find a way to define when the data become outliers, since the vector will not always be the same. I need to generalize. So what I really need is to identify where the values start growing and keep growing until the end. In my example, that happens from the 4th position.
I hope this explains it better.
  5 Comments
Mariana on 7 Mar 2023
Thank you, Antonios
I'm going to try it and come back here to report how it goes.
Mariana on 7 Mar 2023
Edited: Mariana on 7 Mar 2023
The method does not work for the following vector
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113].
But it helped me to solve other problems.
Thanks a lot.


Accepted Answer

Mathieu NOE on 7 Mar 2023
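Hello,
Why not use islocalmin? It seems to me that what you want is to keep the first 3 points (up to and including the local minimum).
a = [0;0.0028;0.0002;0.0039;0.0061];
id = find(islocalmin(a)); % position of the local minimum (3 for this vector)
a_keep = a(1:id) % keep everything up to and including the local minimum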
a_keep = 3×1
0 0.0028 0.0002
plot(a)
hold on
plot(a_keep,'dr')
  2 Comments
Mariana on 7 Mar 2023
Edited: Mariana on 7 Mar 2023
This worked.
I tried with another vector,
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=islocalmin(c);
resul=[c d] % Shows the local minima next to the data. I'm interested in the last local minimum.
d=find(d); % Positions of the local minima
d=d(end); % Position of the last local minimum
threshold=max(c(1:d)); % This is the threshold I was looking for, in a general way
Thank you very much, all of you.
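The steps above can be collected into one short script. This is only a minimal sketch: the names k and firstAbove are illustrative, and the fallback for a vector that has no local minimum is an assumption that was not discussed in this thread.

```matlab
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

k = find(islocalmin(c), 1, 'last');   % position of the last local minimum
if isempty(k)
    k = 1;                            % assumption: with no local minimum, use the first sample as the baseline
end
threshold  = max(c(1:k));             % largest value up to the last local minimum
firstAbove = find(c > threshold, 1);  % first sample that exceeds the threshold
```

For this vector the script gives k = 13, threshold = 0.0047 and firstAbove = 14.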


More Answers (4)

Antonios Dougalis on 7 Mar 2023
Hi,
I am not sure if I got it right. You can simply index into the region of your array 'a' that you are interested in when using isoutlier:
A = [1:100] % make example array A
A(5) = 1000; % put the value 1000 at index 5
A(50) = 1000; % put the value 1000 at index 50
B = isoutlier(A); % logical array that flags both outliers, at positions 5 and 50
C = isoutlier(A(1:10)) % flags only the first outlier, at position 5
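One detail to watch when the region does not start at index 1: the positions returned for the slice are relative to the slice, not to the full array. The sketch below shows the offset correction; the bounds lo and hi are hypothetical values chosen only for illustration.

```matlab
A = 1:100;
A(5)  = 1000;
A(50) = 1000;

lo = 20;  hi = 60;                        % hypothetical region of interest
localIdx  = find(isoutlier(A(lo:hi)));    % positions relative to the slice A(lo:hi)
globalIdx = localIdx + lo - 1;            % positions relative to the full array A
```

Here the outlier sits at position 31 of the slice, which corresponds to position 50 of A.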
  1 Comment
Mariana on 7 Mar 2023
Hi Antonios,
Thank you for your answer.
I just explained the problem better in the first comment above.



Fifteen12 on 7 Mar 2023
Your question is a little complex, because "outlier" does not have a single precise definition. For instance, in your vector, the second element a(2) is more than 10x larger than the following element a(3). Is it an outlier? Only you can really tell. To decide whether any given element of a vector is an outlier, you need to establish a clear definition of what you consider an outlier. By default, isoutlier flags an element that is more than three scaled median absolute deviations (MAD) from the median of the set, but you can change this definition with the method argument.
It's relatively simple to reproduce what isoutlier does, which might help you customize your outlier approach.
a = [0;0.0028;0.0002;0.0039;0.0061]; % Sample vector
med = median(a); % Find the median
MAD = median(abs(a - med)); % Median absolute deviation: https://www.mathworks.com/help/matlab/ref/filloutliers.html#bvml247
sMAD = -1/(sqrt(2)*erfcinv(3/2)) * MAD; % Scaled MAD (the factor is about 1.4826)
dist = abs(a - med); % Distance of each element of a from the median
outliers = dist > 3*sMAD; % Logical array where 1 marks an element more than 3 scaled MADs from the median
Using this method, none of the elements are outliers. But you can adjust the cutoff for an outlier to make it more sensitive. Hope this helps!
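As a rough sketch of how the same idea could be restricted to part of the data, the median and scaled MAD can be computed from a reference segment only and then applied to every element. The choice nBase = 3 is an assumption taken from the example in the question, and the variable names are illustrative.

```matlab
a = [0; 0.0028; 0.0002; 0.0039; 0.0061];
nBase = 3;                              % assumption: the first 3 samples are the reference data
base  = a(1:nBase);

scale = -1/(sqrt(2)*erfcinv(3/2));      % MAD scaling constant, about 1.4826
med   = median(base);                   % median of the reference part only
sMAD  = scale * median(abs(base - med)); % scaled MAD of the reference part only
flags = abs(a - med) > 3*sMAD;          % compare every element with the reference statistics
```

With these numbers a(4) and a(5) are flagged, but so is a(2), which shows again that the result depends on how you define an outlier.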
  1 Comment
Mariana on 7 Mar 2023
Jhon,
Thank you very much for your help. I had already read the definition of the 'isoutlier' function, and my problem is that my vector changes as I change the system I'm studying. But this generic vector a always increases in value at the end. So I can't define a single threshold; I need the threshold to change as the input data change.
I explained this in more detail in the first comment above.



Les Beckham on 7 Mar 2023
Edited: Les Beckham on 7 Mar 2023
It seems like what you want to do is chop off the "increase at the end".
Here is one way to do that, by searching backwards through a to find where it starts increasing.
a = [0; 0.0028; 0.0002; 0.0039; 0.0061; 0.0062]; % added an extra point to verify logic
last_index = 1 + numel(a) - find(diff(flip(a)) > 0, 1) % find where the final increase begins (working backwards); the 1 keeps only the first hit so the result is a scalar
last_index = 3
plot(a)
hold on
plot(a(1:last_index),'r*')
grid on
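The same index can also be found with a forward search for the last decreasing step. This is a minimal sketch; the name lastDrop and the handling of a vector that never decreases are assumptions added for illustration.

```matlab
a = [0; 0.0028; 0.0002; 0.0039; 0.0061; 0.0062];

lastDrop = find(diff(a) < 0, 1, 'last');   % last step where the data decreases
if isempty(lastDrop)
    last_index = 1;                        % assumption: no decrease, so the final rise starts at the first sample
else
    last_index = lastDrop + 1;             % the final rise starts at this sample
end
```

For this vector lastDrop is 2, so last_index is 3, matching the result of the backward search.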
  2 Comments
Mariana on 7 Mar 2023
Hi Les,
I think this could solve my problem too. I'm going to try it and report back here!
Thank you so much.
Mariana on 7 Mar 2023
Les,
It does not work with the following vector,
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];



Bruno Luong on 8 Mar 2023
Edited: Bruno Luong on 8 Mar 2023
I'm not sure I understand; the description of what you want is not very clear. What you call an outlier seems to be a point that violates the increasing trend:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=diff(c); % step between consecutive samples
i=find(d<0); % steps where the value decreases
close all
plot(c); hold on; plot(i,c(i),'or',i+1,c(i+1),'*r') % mark both ends of each decreasing step
