How to replace values above a given percentile by nans

Question

Blue on 14 Aug 2019

0
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/476130-how-to-replace-values-above-a-given-percentile-by-nans

Commented: Blue on 16 Aug 2019

Hi,

I am trying to replace values above the 99th percentile in a table by nans by using Star Strider excellent little function (https://www.mathworks.com/matlabcentral/answers/127944-how-to-pick-the-j-th-percentile-of-a-vector). An example of my code looks like so:

% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
      t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
      t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;

The problem of course is that there are nan values scattered across multiple variables in the table and I therefore receive the following error message: Warning: Columns of data containing NaN values have been ignored during interpolation.

Does anyone has an idea as to how I could replace values above the 99th percentile in a table by nans in a table where there are already multiple nans ? Please note that I do not have the Statistic toolbox.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Adam Danz on 15 Aug 2019

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/476130-how-to-replace-values-above-a-given-percentile-by-nans#answer_387661

Edited: Adam Danz on 15 Aug 2019

Open in MATLAB Online

Here is an alternative function that ignores NaN values. It's not a nice one-liner like Star Striders but it does produce the same output as the prctile function in the stats toolbox.

function pctl = percentile(v,p)
% v is a vector of data; example v = [8 9 nan 13 15 11 nan 3 5 7];
% p is a percentile; example p = 75;
% pctl is the exact p_th percentile of v.
vsort = reshape(sort(v(~isnan(v))),1,[]); %row vector
pd = p/100 * numel(vsort);
idx = floor(pd + 0.5) + [0,1];
md = pd - idx(1);
idx(idx<1) = 1; 
idx(idx>numel(vsort)) = numel(vsort); 
pctl = sum(vsort(idx) .* (0.5+[-md,md]));
end  % <-- only needed if this function is within a script

Comparison

v = [8 9 nan 13 15 11 nan 3 5 7];
p = 82; 
percentile(v,p)      %  = 13.120000   this function
prctile(v,p)         %  = 13.120000   stat's toolbox
pct(v(~isnan(v)),p)  %  = 13.115995   the one-liner approximation

To apply it to your data,

t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
t.Var1(t.Var1 > percentile(t.Var1, 99)) = nan;
t.Var2(t.Var2 > percentile(t.Var2, 99)) = nan;

1 Comment
Show -1 older commentsHide -1 older comments

Blue on 16 Aug 2019

Thank you Adam

Sign in to comment.

Answer 2

per isakson on 15 Aug 2019

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/476130-how-to-replace-values-above-a-given-percentile-by-nans#answer_387660

Edited: per isakson on 15 Aug 2019

Open in MATLAB Online

Ignore the NaNs explicitely when calculating the percentile. Try

%%
% Table with nan
t = array2table(vertcat(horzcat([1:1000]', [1001:2000]'), NaN(10, 2)));
% Replace values above 99th percentile by nan
t.Var1(t.Var1 > pct(t.Var1, 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2, 99)) = nan;
function   z = pct(v,p)
    v( isnan(v) ) = [];
    z = interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))' ...
            ,   sort(v), p*0.01, 'spline');
end

1 Comment
Show -1 older commentsHide -1 older comments

Adam Danz on 15 Aug 2019

Edited: Adam Danz on 15 Aug 2019

Open in MATLAB Online

If you're going to use the one-liner appoximation, you can keep the function handle and explicitly ignore the NaNs from within the call to the function.

t = array2table(vertcat(horzcat((1:1000)', (1001:2000)'), NaN(10, 2)));
pct = @(v,p) interp1(linspace(0.5/length(v), 1-0.5/length(v), length(v))', sort(v), p*0.01, 'spline');
t.Var1(t.Var1 > pct(t.Var1(~isnan(t.Var1)), 99)) = nan;
t.Var2(t.Var2 > pct(t.Var2(~isnan(t.Var2)), 99)) = nan;
%                         {_____here____}

Sign in to comment.

How to replace values above a given percentile by nans

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment
Show -1 older commentsHide -1 older comments

More Answers (1)

1 Comment
Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

How to replace values above a given percentile by nans

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment Show -1 older commentsHide -1 older comments

More Answers (1)

1 Comment Show -1 older commentsHide -1 older comments

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

1 Comment
Show -1 older commentsHide -1 older comments

1 Comment
Show -1 older commentsHide -1 older comments