Find a row of repeated values?

Jacqueline on 3 Jul 2013
Hi, first off I am very new to Matlab. I asked a similar question yesterday but I don't think I was specific enough so I'm going to try again...
I am analyzing data that comes from a truck, such as engine speed, vehicle speed, etc. The data is collected randomly throughout the day for about 3 hours (10,000 seconds). I am trying to write a script that will detect any repetition in the data that lasts for 60 seconds or more, because if that happens, there is something wrong with the sensors in the truck.
For example, if for engine speed I had 10,000 points ranging from 0-2100, and for 60 seconds in a row the data was stuck on 1,000, how would I write a script to detect this and say there is an error?
I appreciate any help I can get! Thanks

Accepted Answer

Evan on 3 Jul 2013
Edited: Evan on 3 Jul 2013
Hi. Sorry for not getting back to your comment on my answer yesterday. Here is how I would do it:
First, some random data for my example:
data = 2100*rand(1,10000); %random dataset
Next, I'll make a few sections of data repetition:
data(1,50:120) = 79.356; %set some data to constant value
data(1,200:210) = 81.220; %set some data to constant value
data(1,400:520) = 1445.201; %set some data to constant value
data(1,900:948) = 0.113; %set some data to constant value
Now do the differencing. diff returns the difference between each pair of neighboring elements, so runs of zeros are potential problem areas. The ~ (logical NOT) operator turns that into logical data: where diff returned zero (no change) we get "true," and everywhere else we get "false." Since diff returns one element fewer than its input, we now have a 9,999-element logical vector with sections of ones and zeros, and the ones are repetitions.
datarep = ~diff(data);
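To make the differencing step concrete, here is a minimal sketch on a six-element vector. The vector x and the names d and rep are made up for illustration; they are not part of the truck data.

```matlab
x   = [5 5 5 7 7 9];   % six hypothetical samples
d   = diff(x);         % [0 0 2 0 2], one element fewer than x
rep = ~d;              % logical [1 1 0 1 0]
% rep(k) is true when x(k+1) equals x(k)
```

Note that this relies on exact equality, which suits a stuck sensor that keeps reporting the identical value.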
Now here is where I search for the runs of ones. Like I said, there are definitely other ways of doing this, including using a for loop, but I find this to be the most compact and simple way I've come across. I'll split it up into steps instead of jamming it all together like I did yesterday.
First, turn your differenced vector into a string:
datarepstr = num2str(datarep); %convert to string (semicolon keeps the long string from printing)
Turning a vector into a string puts spaces between each number, so we'll use a "regular expression replace" function to get rid of them and leave us just the ones and zeros. The function finds all points of ' ' in our string and replaces them with ''.
s = regexprep(datarepstr,' ',''); %remove spaces
Now we want to find where all the ones are in the string, as well as how long each section of ones is. regexp searches our string for all cases where there are one or more ones in a row, or '1+'. Our expression should find four different sections of ones (because that's how many runs of repetition I added). "ids" is the start of each section and "runs" is the section pulled out from the string.
[ids, runs] = regexp(s,'1+','start','match'); %find all runs and the point where they start
These values are returned in cell arrays. cellfun is a function that performs another function (in this case, length) on each cell of an array. It's like looping over each element but more compact. l should have four elements telling how long each run is.
l = cellfun('length',runs); %find the length of each run
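Here is the same chain of steps on a short vector, as a rough sketch so the intermediate values are easy to follow. The flag vector rep is hypothetical; the function calls are the ones used above.

```matlab
rep = logical([1 1 0 1 0 0 1 1 1]);               % hypothetical repetition flags
str = num2str(rep);                               % digits separated by spaces
s   = regexprep(str, ' ', '');                    % '110100111'
[ids, runs] = regexp(s, '1+', 'start', 'match');  % run starts and matched text
l   = cellfun('length', runs);                    % length of each run
% ids is [1 4 7], runs is {'11','1','111'}, l is [2 1 3]
```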
Now we have everything we need to check whether any of our potential problem runs cross the line. The threshold depends on your sampling rate. If it's one data point every second, we'll see if any of our lengths are greater than sixty. If it's every half second, we'll look for >120. And so on. Keep in mind that a run of n ones means n+1 identical samples in a row, since each one marks a sample that equals the one before it.
if any(l > 60) %if any run is longer than 60, display message
disp('Error')
end
Of course, you may want more info than that in your message. You may also want to stop execution of your program, in which case calling error instead of disp would be needed. You may want to tell which elements are the problematic repetitions, and you can do that, because you have the lengths of the runs in l and the indices of where each run starts in ids.
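As a sketch of what a more informative message could look like, the loop below prints where each offending run sits in the data. The names minRun, bad, first and last are my own, and the threshold of 60 assumes one sample per second.

```matlab
data = 2100*rand(1,10000);     % synthetic data, as in the example above
data(50:120)  = 79.356;        % 71 identical samples, i.e. a run of 70 ones
data(200:210) = 81.220;        % short run, should not be reported

[ids, runs] = regexp(regexprep(num2str(~diff(data)),' ',''),'1+','start','match');
l = cellfun('length', runs);

minRun = 60;                   % assumed threshold, in samples
bad = find(l > minRun);        % positions within l of the offending runs
for k = bad
    first = ids(k);            % first repeated sample in data
    last  = ids(k) + l(k);     % last repeated sample in data
    fprintf('Value %g repeats from sample %d to %d (%d samples)\n', ...
            data(first), first, last, l(k) + 1);
end
```

Swapping fprintf for error would stop execution at the first offending run instead of listing them all.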
Finally, here's the whole thing in its entirety, now in a very compact form:
[ids runs] = regexp(regexprep(num2str(~diff(data)),' ',''),'1+','start','match');
l = cellfun('length',runs);
if any(l > 60)
disp('Error')
end
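For comparison, one of the other ways of doing this skips the string conversion and finds the run boundaries with diff and find. This is only a sketch of that alternative, not the approach used above; edges, starts, stops and len are names chosen for illustration.

```matlab
data   = [3 3 3 3 8 2 2 9 9 9];    % small hypothetical series
rep    = ~diff(data(:).');         % row of flags, for row or column input
edges  = diff([0 rep 0]);          % +1 where a run starts, -1 just after it ends
starts = find(edges == 1);         % index of the first flag in each run
stops  = find(edges == -1) - 1;    % index of the last flag in each run
len    = stops - starts + 1;       % run lengths, same meaning as l above
% starts is [1 6 8], len is [3 1 2]
```

Padding rep with a zero on each side means runs that touch the first or last sample are still detected.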

More Answers (1)

Jacqueline on 8 Jul 2013
I'm trying your method with my variable Engine Speed, which has 62,762 values ranging from 0-2100. So, the data is in a 62762x1 array. When I set data = EngineSpeed and did all the steps up to s = regexprep(datarepstr,' ',''), I got this error:
Error using regexprep
The first argument (STRING) must be a one-dimensional array of char or cell arrays of strings.
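This error is consistent with the data being a column vector, as the 62762x1 size suggests: num2str then returns a char matrix with one row per element instead of a single string. A minimal sketch of one possible workaround, reshaping to a row before converting (the short EngineSpeed vector here is made up):

```matlab
EngineSpeed = [800; 800; 800; 1200; 1500];    % hypothetical column vector
colStr = num2str(~diff(EngineSpeed));         % char matrix, one row per element
rowStr = num2str(~diff(EngineSpeed(:).'));    % single-row char vector
s = regexprep(rowStr, ' ', '');               % '1100'
```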
  7 Comments
Jacqueline on 8 Jul 2013
That works perfectly! Thanks Evan, you're amazing
Jacqueline on 11 Jul 2013
Sorry Evan, another question...
How would I find the starting locations of where the errors occur? I mean, where the chunks of data that repeat for more than 60 seconds begin? I tried find(l>300) but that didn't work right
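One possible reading of why find(l>300) looks wrong: it returns positions within l (which run it is), not positions within the data. Indexing ids with the same condition gives the sample where each long run starts. A small sketch with made-up values for ids and l:

```matlab
ids = [50 400 900];        % hypothetical run starts, as returned by regexp
l   = [70 320 48];         % hypothetical run lengths, as returned by cellfun
pos    = find(l > 300);    % 2, the position of the long run within l
starts = ids(l > 300);     % 400, the sample index where that run begins
```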
