Replacing certain values in table with NaN based on different conditions

195 views (last 30 days)
Hello everyone,
I'm basically new to Matlab and seem to be at a loss with my current problem - so any help would be greatly appreciated.
I have some data stored in a table called T that is 724 rows x 365 variables in size.
Now for some variables, I need to exclude extreme values based on different conditions before continuing on with my statistical analysis. But since these don't warrant excluding the whole dataset (row), I would like to replace those values with NaN in my table T.
As of now, this concerns only the variable in column 129, were values should be replaced with NaN if they are equal to or less than 3.1006 as well as column 131 were values should be replaced with NaN if they are equal to or less than 4.8673.
I have tried to get this to work for hours with several variations of logical indexing and am now stuck at this:
extremevalues = (T{:,129} <= 3.1006) | (T{:,131} <= 4.8673);
T(extremevalues) = NaN;
This does not give me any errors - which is the farthest I've come so far - however, it does not replace those extreme value in the respective columns with NaNs (there are no NaNs to be seen in either one of the specified columns in T, extremevalues however gives me several "1"s when quickly scrolling through it).
I've also tried substituting T{:, variablenr} with T.variablename - that didn't change anything either though.
I feel like the solution should be pretty easy, but I can't seem to find it (or am simply not experienced enough to find it).
Thanks very much in advance!

Answers (2)

darova
darova on 16 Aug 2021
Maybe you need to convert your data into matrix first
A = randi(10,5,4)
A = 5×4 table
Var1 Var2 Var3 Var4 ____ ____ ____ ____ 7 8 8 2 1 2 1 5 3 7 1 7 1 5 5 7 2 10 8 4
ind = A(:,1) < 5; % have this error when using 'table'
Operator '<' is not supported for operands of type 'table'.
A(ind,:) = nan
  1 Comment
Liza M
Liza M on 17 Aug 2021
Edited: Liza M on 17 Aug 2021
I think that would've worked if my table weren't of a mixed data type (there are other variables besides numeric ones in there as well).
Anyways, I just discovered the rmoutliers function, which does the thing I need with the specified variables..in just a lot less complicated way than what I've tried.
Thanks again regardless for your quick help!

Sign in to comment.


Ive J
Ive J on 17 Aug 2021
For table, you need to directly access the variable names/column indices. so, T(extremevalues) = NaN; won't work.
This may help:
t = splitvars(table(reshape(1:20, 5, 4)))
t = 5×4 table
Var1_1 Var1_2 Var1_3 Var1_4 ______ ______ ______ ______ 1 6 11 16 2 7 12 17 3 8 13 18 4 9 14 19 5 10 15 20
% set values below 8 in column 2 to nan
colN = 2;
cutoff = 8;
t.(colN)(t.(colN) < cutoff) = nan
t = 5×4 table
Var1_1 Var1_2 Var1_3 Var1_4 ______ ______ ______ ______ 1 NaN 11 16 2 NaN 12 17 3 8 13 18 4 9 14 19 5 10 15 20
  1 Comment
Liza M
Liza M on 17 Aug 2021
Hi,
so rmoutliers solved my problem this time around, but this is a good tip for the future, thanks for that!

Sign in to comment.

Tags

Products


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!