How can I replace outliers with 2 standard deviations from the mean
How can I replace outliers with the value 2 standard deviations from the mean?
This script replaces outliers with the mean, not with the value 2 std from the mean. Could anyone help modify it?
A=filloutliers(A,'center','mean','ThresholdFactor', 2);% replace 2 std away from mean with mean
Accepted Answer
filloutliers doesn't, for some reason, have the facility to use a function handle as a fill option, so a custom replacement value can't be computed inside the call itself...
One way amongst many but one that keeps filloutliers in the code--
A=filloutliers(A,nan,'mean','ThresholdFactor',2); % step 1: replace values more than 2 std from the mean with NaN
A(isnan(A))=2*std(A,'omitnan'); % step 2: replace NaN w/ 2*std(); 'omitnan' keeps the NaNs out of the std
Does seem like a reasonable enhancement request to be able to use a function handle in filloutliers
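One possible shortcut worth checking: the filloutliers documentation also lists a 'clip' fill method, which fills each outlier with the lower or upper threshold it crossed. Assuming that behavior, combining it with the 'mean' detection method and a ThresholdFactor of 2 should give the mean +/- 2*std replacement in a single call. The sketch below is only an illustration; the sample data and the names B, lo and hi are made up.

```matlab
% Hypothetical sample data: 98 standard-normal values plus two planted outliers
rng(0);                                % fix the seed so the run is repeatable
A = [randn(98,1); 10; -10];

% 'clip' fills each outlier with the threshold it exceeded; with 'mean' and
% ThresholdFactor 2 the thresholds are mean(A) - 2*std(A) and mean(A) + 2*std(A)
B = filloutliers(A,'clip','mean','ThresholdFactor',2);

lo = mean(A) - 2*std(A);               % lower threshold, for comparison
hi = mean(A) + 2*std(A);               % upper threshold, for comparison
```

If the assumption holds, max(B) equals hi and min(B) equals lo, and the sign of each excursion is preserved. For a matrix, filloutliers works down each column by default, so the same call should also cover the column-wise case.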
Alternatively,
mnA=mean(A); sdA=std(A);
Z=(A-mnA)/sdA;
isOut=(abs(Z)>=2);
A(isOut)=mnA+sign(Z(isOut))*2*sdA;
ADDENDUM:
To consolidate in one place, with the added caveat that the data are 2D with one variable per column, the above must be extended as follows:
Since mnA and sdA are now row vectors of column statistics, Z needs the "dot" division operator to be element-wise:
Z=(A-mnA)./sdA;
isOut=(abs(Z)>=2);
and the outlier replacement has to be applied by column as well. It's probably just as quick here to write the explicit loop as:
for i=1:size(A,2)
A(isOut(:,i),i)=mnA(i)+sign(Z(isOut(:,i),i))*2*sdA(i);
end
Note the column index i on both A and Z; without it the logical vector would address only the first column. This way each column is a vector, so the number of elements selected by the logical index matches on both sides, and the mean and std dev are scalars for the column instead of arrays.
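As a loop-free variation on the column-wise idea, the replacement can also be written as a clamp. This is a sketch only, assuming R2016b or later so that implicit expansion stretches the row vectors of column statistics down the rows; the test matrix and the names lo, hi and B are hypothetical.

```matlab
% Hypothetical 8x155 test matrix, one variable per column
rng(1);                                % fix the seed so the run is repeatable
A = randn(8,155);
A(3,10) = 25;                          % planted high outlier in column 10
A(5,42) = -25;                         % planted low outlier in column 42

mnA = mean(A);                         % 1x155 row vector of column means
sdA = std(A);                          % 1x155 row vector of column std devs

lo = mnA - 2*sdA;                      % per-column lower limit
hi = mnA + 2*sdA;                      % per-column upper limit
B  = min(max(A,lo),hi);                % clamp each element into its own column's [lo, hi]
```

Values inside the limits pass through unchanged, values above hi become hi, and values below lo become lo, which is the same result as mnA+sign(Z)*2*sdA applied to the flagged elements.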
9 Comments
AN NING
on 6 Feb 2021
Thank you!
dpb
on 7 Feb 2021
The first solution above is lacking -- it doesn't preserve the sign of the excursion; you'll definitely want to add that nicety.
AN NING
on 7 Feb 2021
Oh really, so should I use it or not?
AN NING
on 7 Feb 2021
I used the second solution, but it returned an error:
Unable to perform assignment because the left and right sides have a different number of elements.
Error in Test (line 25)
A(isOut)=mnA+sign(Z(isOut))*2*sdA;
Must've done something wrong getting there -- works here for a sample set of data...show complete work.
>> A=randn(1,100);
>> mnA=mean(A);sdA=std(A);Z=(A-mnA)/sdA;
>> isOut=(abs(Z)>2);
>> sum(isOut)
ans =
4
>> A(isOut)=mnA+sign(Z(isOut))*2*sdA;
The logical vector isOut selects the same number of True elements on both sides of the assignment, so the number of items assigned on the LHS matches the number of items selected on the RHS.
It doesn't seem that this can fail unless you didn't keep the Z vector or something else is mismatched, for example if A were a 2D array instead of a vector; then mnA and sdA would be vectors of the statistics calculated by column instead of scalars.
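A quick way to check for that situation is to look at the sizes involved. The sketch below uses a made-up 8x155 matrix (the variable names szMean, szStd and isVec are hypothetical); with a matrix input, mean and std work down each column and return row vectors, so the right-hand side of the assignment no longer supplies one value per selected element.

```matlab
A = randn(8,155);                      % hypothetical 2D data, one variable per column
mnA = mean(A);                         % column means
sdA = std(A);                          % column standard deviations
szMean = size(mnA);                    % 1-by-155 row vector, not a scalar
szStd  = size(sdA);                    % 1-by-155 row vector, not a scalar
isVec  = isvector(A);                  % false here, so the vector-only code does not apply as written
```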
AN NING
on 8 Feb 2021
My A is an 8x155 matrix. Does it work for this type of data?
dpb
on 8 Feb 2021
As you discovered, exactly as written, "no".
Are the statistics of the data to be considered over the whole array? Or does the array represent differing subjects/tests/whatevers by column or row?
AN NING
on 8 Feb 2021
Thank you for replying.
The matrix A represents different variables by column, so I need to replace the outliers in each column with the value 2 standard deviations from the mean of that column. How can I solve this problem then?
dpb
on 8 Feb 2021
Shoulda' told us that going in... :)
As written above, it then needs two changes. Since mnA and sdA are now row vectors of column statistics, Z needs the "dot" division operator to be element-wise:
Z=(A-mnA)./sdA;
isOut=(abs(Z)>=2);
and the outlier replacement has to be applied by column as well. It's probably just as quick here to write the explicit loop as:
for i=1:size(A,2)
A(isOut(:,i),i)=mnA(i)+sign(Z(isOut(:,i),i))*2*sdA(i);
end
Note the column index i on both A and Z; without it the logical vector would address only the first column. This way each column is a vector, so the number of elements selected by the logical index matches on both sides, and the mean and std dev are scalars for the column instead of arrays.