Removing row from a matrix if value in row < previous value

I have data in which, once sorted by column 1, column 6 should be in ascending order, so each value should be greater than the one before it (n > n-1). However, the program that outputs this data produces incorrect values in column 6 that are much lower than they should be for a few data points, after which the values return to normal. I want to remove these values (or set n equal to n-1 whenever n < n-1).
At the moment I'm doing this with an IF formula in Excel after sorting the file, but with hundreds of files this is incredibly tedious.
I've tried the code below, which picks a threshold that the "incorrect" data falls below; however, the threshold changes between files, so this is no use.
datas = sortrows(data,-1);          % sort rows by column 1 (distance), descending
rowremove = datas(:,6) <= athres;   % flag rows whose area is <= threshold ("bad" area data)
datas(rowremove,:) = [];            % delete the flagged rows
Is it possible to do this in MATLAB without a fixed threshold?
  2 Comments
Jan on 16 Nov 2016
Edited: Jan on 16 Nov 2016
Is what possible?
How do you identify the bad data reliably? If a threshold is not sufficient, the n>n-1 test might be. But what happens if several bad values are adjacent? Is the n>n-1 property guaranteed even then?
Alexander Seal on 16 Nov 2016
Yes, that is an issue unfortunately; there is some neighboring bad data. The Excel IF formula works by appending the "good" data to a new column and checking each value in the original column against the preceding value in the new column.
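The comparison against the last "good" value described above can be sketched in MATLAB with a running maximum. This is only a minimal sketch: the sample matrix is made up for illustration, and it assumes the bad values are always too low (a single bad value that is too high would cause the good rows after it to be dropped).

```matlab
% Hypothetical sample: column 1 descending, column 6 should be ascending.
x = zeros(8,6);
x(:,1) = (8:-1:1).';
x(:,6) = [1 2 3 0.2 0.3 4 5 0.1].';

y = x(:,6);
prevMax = cummax(y);                         % running maximum up to and including each row
keep = [true; y(2:end) > prevMax(1:end-1)];  % a row must exceed every value before it
xClean = x(keep,:);                          % column 6 of xClean is strictly increasing
```

Because every dropped value lies below the last kept one, the running maximum equals the last "good" value, so runs of neighboring bad rows are removed together. If holding the previous value is preferred over deleting the row, x(:,6) = cummax(x(:,6)) does that instead.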


Answers (1)

dpb on 16 Nov 2016
Jan's point is valid; if the bad data are excessively corrupted, removing the offending rows may only create a new set of offending values. If, on the other hand, the overall slope is large enough and the erroneous values are not too far off, then perhaps
ix=[true; diff(x(:,6))>0];
x=x(ix,:);
may work. If applying this simply exposes a new set of offenders, then you'll likely have to use the same test to
  1. locate the first of each offending section
  2. search from that point to the next
  3. replace/remove those sections before processing next
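One possible reading of the steps above, written as an explicit loop, is sketched below. The sample vector v is hypothetical, and the choice to end each offending section at the first value that exceeds the last good one is an assumption, not something confirmed in this thread.

```matlab
% Hypothetical sample column; only the column being checked is needed here.
v = [1 2 3 0.2 0.3 4 5 0.1 6].';
keep = true(size(v));
i = 2;
while i <= numel(v)
    if v(i) <= v(i-1)                       % 1. first row of an offending section
        lastGood = v(i-1);
        j = find(v(i:end) > lastGood, 1);   % 2. offset of the next acceptable row
        if isempty(j)
            keep(i:end) = false;            % section runs to the end of the data
            break
        end
        keep(i:i+j-2) = false;              % 3. drop the whole section
        i = i + j - 1;                      % resume at the row that recovered
    end
    i = i + 1;
end
vClean = v(keep);
```

With the full matrix, v would be x(:,6) and the rows would be removed with x = x(keep,:).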
A relatively short sample dataset showing a typical result would probably be helpful. (Only the two columns are needed; the others are immaterial to the problem of selection/retention/disposal.)
  1 Comment
Alexander Seal on 16 Nov 2016
Edited: Alexander Seal on 16 Nov 2016
Thank you for the reply. Attached is a sample from one of the "worst offending" data sets, with multiple neighboring "bad" data points.
Columns A and F are relevant. The data is sorted by column A descending, which in theory should leave column F ascending (the large steps in the data are expected). Note the sudden drops of an order of magnitude (see rows 270-274). Column G is the "fixed" data, computed with '=IF(F3<G2, G2+(G2-G1), F3)', i.e. if F3 is less than G2, continue the previous step in G; otherwise take F3.
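The column G formula could be reproduced in MATLAB roughly as follows. This is a sketch under assumptions: the vector f is made-up sample data standing in for column F, and the first two rows are taken to be good, since the formula as quoted starts at row 3.

```matlab
% Hypothetical raw values standing in for column F.
f = [1 2 3 0.2 0.3 6 7].';
g = f;                                       % column G; rows 1-2 are assumed good
for k = 3:numel(f)
    if f(k) < g(k-1)
        g(k) = g(k-1) + (g(k-1) - g(k-2));   % bad value: repeat the previous step in G
    else
        g(k) = f(k);                         % good value: copy it across
    end
end
```

Unlike deleting rows, this keeps the row count unchanged and replaces each bad value with a linear extrapolation from the two preceding fixed values.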
