Cut timetable data following certain conditions

I have created a timetable with one-minute timesteps. I would like to take this timetable and break it into chunks (most likely stored in a cell array) following the conditions listed below. If there is a better way to get this data, I'd love to hear suggestions. Thanks!
Criteria:
-to keep data
  1. At least 5 min of continuous data (positive, non-zero values).
  2. A maximum of 5 minutes of zero values between times when data is present.
-to break up data into chunks
  1. No data for at least 60 minutes.
So, if there are five or more consecutive records with data, that run should be kept until the continuous records stop, allowing gaps of fewer than 6 zero-value records within it. Then, after finding all these groups of data, I'd like to separate the groups that have 60 minutes or more of no data between them.
Here's some sample data (timetable format):
'06-Feb-2019 03:40:00' 0 0
'06-Feb-2019 03:41:00' 0 0
'06-Feb-2019 03:42:00' 0 0
'06-Feb-2019 03:43:00' 0 0
'06-Feb-2019 03:44:00' 0 0
'06-Feb-2019 03:45:00' 0 0
'06-Feb-2019 03:46:00' 0 0
'06-Feb-2019 03:47:00' 0 0
'06-Feb-2019 03:48:00' 0 0
'06-Feb-2019 03:49:00' 0 0
'06-Feb-2019 03:50:00' 0.000900000000000000 0.00266666666666667
'06-Feb-2019 03:51:00' 0.00218333333333333 0.0195000000000000
'06-Feb-2019 03:52:00' 0 0
'06-Feb-2019 03:53:00' 0.00213333333333333 0.0118333333333333
'06-Feb-2019 03:54:00' 0.00286666666666667 0.0161666666666667
'06-Feb-2019 03:55:00' 0.00311666666666667 0.0213333333333333
'06-Feb-2019 03:56:00' 0.00161666666666667 0.00833333333333333
'06-Feb-2019 03:57:00' 0.00313333333333333 0.0173333333333333
'06-Feb-2019 03:58:00' 0.000633333333333333 0.00200000000000000
'06-Feb-2019 03:59:00' 0.00451666666666667 0.0521666666666667
'06-Feb-2019 04:00:00' 0 0
'06-Feb-2019 04:01:00' 0.000616666666666667 0.00183333333333333
'06-Feb-2019 04:02:00' 0 0
'06-Feb-2019 04:03:00' 0.000466666666666667 0.00133333333333333
'06-Feb-2019 04:04:00' 0.000500000000000000 0.00116666666666667
'06-Feb-2019 04:05:00' 0.00281666666666667 0.0141666666666667
'06-Feb-2019 04:06:00' 0.00376666666666667 0.0195000000000000
'06-Feb-2019 04:07:00' 0.00371666666666667 0.0150000000000000
'06-Feb-2019 04:08:00' 0.00235000000000000 0.00800000000000000
'06-Feb-2019 04:09:00' 0.00370000000000000 0.0151666666666667
'06-Feb-2019 04:10:00' 0.00341666666666667 0.0245000000000000
'06-Feb-2019 04:11:00' 0.00320000000000000 0.0170000000000000
'06-Feb-2019 04:12:00' 0.00410000000000000 0.0181666666666667
'06-Feb-2019 04:13:00' 0.00535000000000000 0.0396666666666667
'06-Feb-2019 04:14:00' 0.0116666666666667 0.0598333333333333
'06-Feb-2019 04:15:00' 0.00908333333333333 0.0375000000000000
'06-Feb-2019 04:16:00' 0.00708333333333333 0.0265000000000000
'06-Feb-2019 04:17:00' 0.00408333333333333 0.0126666666666667
'06-Feb-2019 04:18:00' 0.00598333333333333 0.0451666666666667
'06-Feb-2019 04:19:00' 0.0410166666666667 0.397333333333333
'06-Feb-2019 04:20:00' 0.0596166666666667 0.680333333333333
'06-Feb-2019 04:21:00' 0.0482500000000000 0.554166666666667
'06-Feb-2019 04:22:00' 0.0347333333333333 0.447500000000000
'06-Feb-2019 04:23:00' 0.00551666666666667 0.0335000000000000
'06-Feb-2019 04:24:00' 0.00425000000000000 0.0196666666666667
'06-Feb-2019 04:25:00' 0.00553333333333333 0.0450000000000000
'06-Feb-2019 04:26:00' 0.00823333333333333 0.0691666666666667
'06-Feb-2019 04:27:00' 0.00661666666666667 0.0560000000000000
'06-Feb-2019 04:28:00' 0.00968333333333333 0.0676666666666667
'06-Feb-2019 04:29:00' 0.00733333333333333 0.0455000000000000
'06-Feb-2019 04:30:00' 0.00471666666666667 0.0340000000000000
'06-Feb-2019 04:31:00' 0.00295000000000000 0.0191666666666667
'06-Feb-2019 04:32:00' 0.00441666666666667 0.0326666666666667
'06-Feb-2019 04:33:00' 0.00618333333333333 0.0575000000000000
'06-Feb-2019 04:34:00' 0.00696666666666667 0.0793333333333333
'06-Feb-2019 04:35:00' 0 0
'06-Feb-2019 04:36:00' 0 0
'06-Feb-2019 04:37:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:38:00' 0 0
'06-Feb-2019 04:39:00' 0 0
'06-Feb-2019 04:40:00' 0 0
The data above should all be grouped together because the breaks from 3:50 to 4:37 are less than 5 minutes. Let's assume that before the first timestep and after the ending timestep there were 60 consecutive zero values. I should mention that in all cases, if one variable is 0, the second one will be as well.

8 Comments

Unfortunately there's no easy way to copy-paste those values into a timetable that we can work with. Could you attach a mat file containing the timetable and any other related variables?
Off the bat I'll recommend not breaking the table apart. Instead, define a variable that merely indexes which rows should be selected and then, whenever you need those rows, use TT(rows,:), where TT is your timetable and rows is a logical index selecting the rows to analyze.
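A minimal sketch of that indexing pattern (TT here is any timetable with numeric variables):

```matlab
% Minimal sketch of logical row-indexing on a timetable.
rows = any(TT{:,:} ~= 0, 2);  % flag rows where any variable is non-zero
subset = TT(rows, :);         % parentheses preserve the timetable class
```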
I tried to upload the file. Doesn't seem to want to attach. Any ideas on this?
Can you expand on the alternative method? How can I define a variable that is temporal and apply it to a timetable? Let's start with just indexing groups of data where a minimum of 5 minutes (records) is kept, with any adjacent rows of data (before or after) included if separated by 5 or fewer rows of no data.
Or, maybe this is easier to tackle by removing rows with 60 or more consecutive no-data records?
"I tried to upload the file. Doesn't seem to want to attach. Any ideas on this?"
Were you trying to attach it using the paperclip icon? Was the file too large? If so, we really only need enough rows to cover the pattern you described.
"Can you expand on the alternative method?"
Whether you use that method or not, you'll need to create a variable that identifies the rows you want to select. My proposal is to use that variable to isolate the rows as you're working with the table rather than pulling the selected rows out of the table.
"Let's start with just indexing groups of data where a min of 5 minutes (records) are kept, with any rows of data adjacent (before or after) included if separated by 5 or less rows of no-data."
I think I understand this but an example would really make it solid. Could you provide a short example of a datetime vector (or whatever format your data are in) and which rows should be selected from that vector? Attaching the data would be even better but I'd still like to see an expected output for at least a section of the table.
Note that the data in your question all differ by 1 minute so we cannot see the pattern you're describing.
The above data is a small sample of the total data in the timetable series, which runs through the current year. In the case above, I want the sum of all values in each variable column for these rows with data. This is because there is a section of more than 5 consecutive records (4:03 to 4:34). After finding that grouping, the method should look +5 and -5 timesteps. If there is no data on either side, the data is cut at that location. If data is present in the variable columns within fewer than 6 records on either side (in this case, timestep 4:37 is separated by only 2 rows of no data), then this data is considered part of the original chunk and kept.
In this example, the data shown would be cut at 3:49 and 4:38 because there is more than 5 minutes of no data.
This method would slim down the total record to include new groupings of time periods with 5 consecutive records and neighboring records if separated by less than 6 minutes of no data.
Then, these new groups would again be parsed to only keep groups with more than 60 minutes of no records between them. For example, if this data continued for 40 minutes with 0-values and then had a record, we would not keep this data. If there were 61 minutes of 0-values from the end of this set to another record of positive values, then we would keep the group above.
Each group will then have a summation performed on each variable in order to plot. In the end, I'd like to know how many of these groups exist, determine the sum of each variable for each group, and plot each as a single point. The number of points in the resulting figure equals the number of unique groups.
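A minimal sketch of this final step, assuming a Group column like the one built in the accepted answer below (0 marking ungrouped rows):

```matlab
% Sum each variable per group and plot one point per group.
% Assumes TT has Var1, Var2, and an integer Group column (0 = ungrouped).
keep = TT.Group > 0;
grp = findgroups(TT.Group(keep));              % renumber groups 1..n
sumVar1 = splitapply(@sum, TT.Var1(keep), grp);
sumVar2 = splitapply(@sum, TT.Var2(keep), grp);
nGroups = numel(sumVar1);                      % number of unique groups
scatter(sumVar1, sumVar2, 'filled')            % one marker per group
```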
@Eric Escoto, have you had a chance to try out the answer?
@Adam Danz, Hi, not yet. I'll try it out soon. Finals week! Had to push this back a few days. Soon.
No worries, good luck with finals!


 Accepted Answer

Well, that took a little more time than I anticipated!
Here's a demo with inline comments explaining the details.
I added a few lines to the demo data from your question so that some data at the end would be excluded. Here I just convert the data into a timetable.
% Create demo data (Column vector)
t = {'06-Feb-2019 03:40:00' 0 0
'06-Feb-2019 03:41:00' 0 0
'06-Feb-2019 03:42:00' 0 0
'06-Feb-2019 03:43:00' 0 0
'06-Feb-2019 03:44:00' 0 0
'06-Feb-2019 03:45:00' 0 0
'06-Feb-2019 03:46:00' 0 0
'06-Feb-2019 03:47:00' 0 0
'06-Feb-2019 03:48:00' 0 0
'06-Feb-2019 03:49:00' 0 0
'06-Feb-2019 03:50:00' 0.000900000000000000 0.00266666666666667
'06-Feb-2019 03:51:00' 0.00218333333333333 0.0195000000000000
'06-Feb-2019 03:52:00' 0 0
'06-Feb-2019 03:53:00' 0.00213333333333333 0.0118333333333333
'06-Feb-2019 03:54:00' 0.00286666666666667 0.0161666666666667
'06-Feb-2019 03:55:00' 0.00311666666666667 0.0213333333333333
'06-Feb-2019 03:56:00' 0.00161666666666667 0.00833333333333333
'06-Feb-2019 03:57:00' 0.00313333333333333 0.0173333333333333
'06-Feb-2019 03:58:00' 0.000633333333333333 0.00200000000000000
'06-Feb-2019 03:59:00' 0.00451666666666667 0.0521666666666667
'06-Feb-2019 04:00:00' 0 0
'06-Feb-2019 04:01:00' 0.000616666666666667 0.00183333333333333
'06-Feb-2019 04:02:00' 0 0
'06-Feb-2019 04:03:00' 0.000466666666666667 0.00133333333333333
'06-Feb-2019 04:04:00' 0.000500000000000000 0.00116666666666667
'06-Feb-2019 04:05:00' 0.00281666666666667 0.0141666666666667
'06-Feb-2019 04:06:00' 0.00376666666666667 0.0195000000000000
'06-Feb-2019 04:07:00' 0.00371666666666667 0.0150000000000000
'06-Feb-2019 04:08:00' 0.00235000000000000 0.00800000000000000
'06-Feb-2019 04:09:00' 0.00370000000000000 0.0151666666666667
'06-Feb-2019 04:10:00' 0.00341666666666667 0.0245000000000000
'06-Feb-2019 04:11:00' 0.00320000000000000 0.0170000000000000
'06-Feb-2019 04:12:00' 0.00410000000000000 0.0181666666666667
'06-Feb-2019 04:13:00' 0.00535000000000000 0.0396666666666667
'06-Feb-2019 04:14:00' 0.0116666666666667 0.0598333333333333
'06-Feb-2019 04:15:00' 0.00908333333333333 0.0375000000000000
'06-Feb-2019 04:16:00' 0.00708333333333333 0.0265000000000000
'06-Feb-2019 04:17:00' 0.00408333333333333 0.0126666666666667
'06-Feb-2019 04:18:00' 0.00598333333333333 0.0451666666666667
'06-Feb-2019 04:19:00' 0.0410166666666667 0.397333333333333
'06-Feb-2019 04:20:00' 0.0596166666666667 0.680333333333333
'06-Feb-2019 04:21:00' 0.0482500000000000 0.554166666666667
'06-Feb-2019 04:22:00' 0.0347333333333333 0.447500000000000
'06-Feb-2019 04:23:00' 0.00551666666666667 0.0335000000000000
'06-Feb-2019 04:24:00' 0.00425000000000000 0.0196666666666667
'06-Feb-2019 04:25:00' 0.00553333333333333 0.0450000000000000
'06-Feb-2019 04:26:00' 0.00823333333333333 0.0691666666666667
'06-Feb-2019 04:27:00' 0.00661666666666667 0.0560000000000000
'06-Feb-2019 04:28:00' 0.00968333333333333 0.0676666666666667
'06-Feb-2019 04:29:00' 0.00733333333333333 0.0455000000000000
'06-Feb-2019 04:30:00' 0.00471666666666667 0.0340000000000000
'06-Feb-2019 04:31:00' 0.00295000000000000 0.0191666666666667
'06-Feb-2019 04:32:00' 0.00441666666666667 0.0326666666666667
'06-Feb-2019 04:33:00' 0.00618333333333333 0.0575000000000000
'06-Feb-2019 04:34:00' 0.00696666666666667 0.0793333333333333
'06-Feb-2019 04:35:00' 0 0
'06-Feb-2019 04:36:00' 0 0
'06-Feb-2019 04:37:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:38:00' 0 0
'06-Feb-2019 04:39:00' 0 0
'06-Feb-2019 04:40:00' 0 0
'06-Feb-2019 04:41:00' 0 0
'06-Feb-2019 04:42:00' 0 0
'06-Feb-2019 04:43:00' 0 0
'06-Feb-2019 04:44:00' 0 0
'06-Feb-2019 04:45:00' 0 0
'06-Feb-2019 04:46:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:47:00' 0.000966666666666667 0.00616666666666667
'06-Feb-2019 04:48:00' 0 0};
dates = datetime(t(:,1),'InputFormat','dd-MMM-yyyy HH:mm:ss'); % datetime vector
TT = array2timetable(cell2mat(t(:,2:end)),'RowTimes', dates); % timetable
From here forward, the only input is TT, the timetable produced above. This adds two new columns to the timetable:
  • Safe, a logical vector identifying the groups of consecutive data +/- 5 rows.
  • Group, which identifies the group number of each consecutive block (this example data has only 1 block).
% Identify rows with missing data in any column. Missing data
% are considered to be values of 0.
missingIdx = any(TT{:,:}==0,2);
% Identify groups of consecutive data
consecGroups = bwlabel(~missingIdx); %req image processing toolbox
consecGroups(consecGroups==0) = nan;
groupDuration = splitapply(@range, TT.Properties.RowTimes, consecGroups);
% mark grouped data that spans time greater than minTime as safe
minTime = minutes(5);
safe = false(size(consecGroups));
safe(ismember(consecGroups,find(groupDuration > minTime))) = true;
% Locate not-safe data that are within maxRows number of rows from safe data
maxRows = 5;
notSafeRows = find(~safe);
select = min(abs(find(safe)-notSafeRows.')) < maxRows; % implicit expansion requires >=r2016b; 'safe' is col vec
safe(notSafeRows(select)) = true;
% Assign safe vector to timetable and group number
TT.Safe = safe;
TT.Group = bwlabel(safe); %requires image proc toolbox
Now if you want to perform statistics on a certain group,
sum(TT.Var1(TT.Group==1))

21 Comments

This is nice.
Notice that Adam has not split up the timetable. Instead, he's created a new variable that identifies the different blocks of "consecutive" data, and then used that "grouping variable" to compute something on one group.
To take this to its logical conclusion:
blockSums = splitapply(@sum,TT.Var1,TT.Group)
computes the sum for all the blocks. (Can't run that right now to make sure it works, but I'm pretty sure it does).
It should also be pretty straightforward to create a new timetable with one row for each block, containing the (starting?) timestamp, the sum, and whatever other summary stats are needed. rowfun should be able to do that.
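A hedged sketch of that per-group summary (variable names Var1, Var2, and Group assumed from the answer above; varfun is used here rather than rowfun):

```matlab
% Hedged sketch: one summary row per group using varfun.
% Assumes TT has numeric Var1, Var2 and an integer Group column (0 = ungrouped).
T = timetable2table(TT(TT.Group > 0, :));    % varfun grouping works on tables
groupSummary = varfun(@sum, T, ...
    'InputVariables', {'Var1','Var2'}, ...
    'GroupingVariables', 'Group');           % Group, GroupCount, sum_Var1, sum_Var2
```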
Thanks, Peter.
grpstats() is another option to explore.
Something like
statarray = grpstats(TT,'Group','sum') % not tested
Thanks Adam and Peter,
Looks like almost everything works. There's one issue with grouping. The period between groups looks to be limited by the 'minTime' in this solution, which is related to the 'safe' values. So, groups are split more frequently than they should be.
I suppose it's more difficult with just this sample data we have here. I'm looking at this now; any suggestions?
"...groups are split more frequently than they should be"
I'm not sure I follow. Could you provide a small input example where the groups are split but should not be split?
Attached is a sample where two groups are created, but only one should exist because fewer than 60 minutes fall between these groups.
I just re-read your earlier description under your question (here).
The description states,
"After finding that grouping the method should look + 5 and - 5 timesteps. If there is no-data on either side, the data is cut at that location."
That tells me that at most there will be 5 consecutive minutes of empty data within each group. But then the description states,
"Then, these new groups would again be parsed to only keep groups with greater than 60 minutes of no records."
1) But how could there be 60 minutes of no records within a group when we just defined groups as having no more than 5 consecutive minutes of empty data?
2) Also, the entire timetable you shared spans only 60 minutes, so how could it serve as an example with 60 minutes of no records?
Slightly confusing, I agree.
There are a total of 3 grouping criteria, although 1, 2 fit together.
Initially, we look at each record individually, then +4 and -4 records from there. This first step creates a minimum row count of 5. The head/tail of this minimum row count (or second step, though for simplicity it can stay part of the first step) continues while there are non-zero values, and the group terminates when the row before or after the group hits a zero value (a single zero value is enough to terminate the grouping). The range of values for this first step is rows >= 5.
The next criterion continues up/down through the zero-value rows that sandwich the group. We should count these rows with zero values. If there are 60 or more zero rows before/after the rows with non-zero values (minimum of 5), then all rows with non-zero values are considered the final group.
In the data I sent you, this is an example of data that should be one group. For simplicity, assume that before and after this dataset, there are 61 zero-value rows. We can add these in if it helps with the parsing.
Perhaps the first step should be to cut at locations where >60 rows of zero values are counted? I'm working on this now.
I'll need to see an example that has more than 1 group. If you can create an example that follows these rules and demonstrates each rule, I might be able to see the pattern. Instead of using 60, use something like 10 to reduce the work.
Here's an expanded dataset.
Is the Safe column the results of our current algorithm or your desired results?
If it's not the latter, I need to see the desired results of the Safe column (and Group column).
Here's the modified dataset. There are two final groups.
Note I also took out the +/- 4 rows in the safe column which don't have non-zero values.
In group 113, the zero values could also be included. This would provide a total duration of one group if all zeros in a group were saved.
@Eric Escoto, just a heads-up: my time is thin for the next ~24 hr, but I will follow up on this soonish. (I'll replace this comment with my feedback.)
No worries! Thanks for the help so far, I'll keep chugging away.
Here's a thought: using the existing algorithm, I'm looking for a way to set up criteria for a new column that says, "if there are >=60 zero values between entries in the Group column, then split and assign another group number in a new column."
Hmm. Still can't seem to easily do this. I've started to just use the group number column to manually cut and paste each group into a new spreadsheet. But I'm noticing that the first and last four cells in each group are included even though they have 0-values in them. Any ideas for a better solution?
Hi Eric, I'm trying to recall this problem from 1/2 a month ago.
In the TTtest_2_desired.mat dataset, the problem was that the last two groups should be one group. They are marked as 2 groups because there is a span of 8 minutes between them. The maxRows variable allows for a total of 5 minutes of empty values (0) between groups in order to combine those groups. If I increase maxRows to 9, those last two groups are combined. But it's unclear whether this is what you want to do or whether a rule needs to change.
For all cases we want to keep that maxRows variable at 5 min, so that looks to be OK. The issue is the preceding and following +4 rows from this action. They include 0-values in the Var1 and Var2 columns. Perhaps a criterion could be set so that if the Group column equals 0...inf and Var1 and Var2 equal 0, we remove those rows?
The idea is to leave only a continuous data set with increasing group numbers that have values in the Var1 and Var2 columns.
It looks like this almost works for me. I've used the line below to remove rows where all four variables (Var1, Var2, Safe, Group) equal zero, but this produced a double array. Is there a way to write this so that I can keep the timetable format?
% Remove rows with all zeros to leave only the unique groups of rain.
TT2_events = TT2_P2_sub{:,:}(any(TT2_P2_sub{:,:},2),:);
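One way to keep the timetable class (a sketch, assuming TT2_P2_sub is the timetable above) is to index with parentheses instead of extracting with braces:

```matlab
% Braces {:,:} extract a plain double array; parentheses () preserve the timetable.
keep = any(TT2_P2_sub{:,:} ~= 0, 2);   % rows with any non-zero value
TT2_events = TT2_P2_sub(keep, :);      % result is still a timetable
```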
This line below will remove all rows where the group is 0.
TT(TT.Group==0,:) = [];
Am I correct in understanding that you still want to remove the leading and trailing 4 rows from each group where the Var1 and Var2 values are 0?
Hi Adam,
Yes, that's correct. I still want to be able to preserve the rows inside the groups that are zeros in Var1 and Var2, but the leading and trailing four rows of each group should be removed.
Ok.... I think this should do the trick.
Replace this section
% Locate not-safe data that are within maxRows number of rows from safe data
maxRows = 5;
notSafeRows = find(~safe);
select = min(abs(find(safe)-notSafeRows.')) < maxRows; % implicit expansion requires >=r2016b; 'safe' is col vec
safe(notSafeRows(select)) = true;
with this section
% Sliding window method
maxRows = 5;
for i = 1:numel(safe)-maxRows+1   % +1 so the final window is checked
    tempWindow = safe(i:i+maxRows-1);
    if tempWindow(1) && tempWindow(end)
        safe(i:i+maxRows-1) = true;
    end
end
The sliding window is a window whose length is defined by maxRows. It starts at the top of the safe vector and slides through the vector, marking everything within the window as TRUE whenever the first and last elements of the window are TRUE.
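A toy run of that window (length 5, so gaps of up to 3 false values between TRUE endpoints get filled):

```matlab
% Toy illustration of the sliding-window fill with window length 5.
safe = logical([1 1 0 0 1 0 0 0 0 0 1 1])';
maxRows = 5;
for i = 1:numel(safe)-maxRows+1
    tempWindow = safe(i:i+maxRows-1);
    if tempWindow(1) && tempWindow(end)
        safe(i:i+maxRows-1) = true;
    end
end
% The 2-row gap (rows 3:4) is filled; the 5-row gap (rows 6:10) is not.
```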
I've been playing around with these lines and still don't get what I'm after. I've attached a better example of data to illustrate the desired outcome.
In the attached data, we have a timetable with 4420 rows. Using the first set of code, it parses out into three groups, as can be seen in the fourth column. In row 4413 there is data in columns 1 and 2, but it's more than 60 rows away from the previous data and is not sandwiched by four other consecutive rows, so it will be ignored as a group.
The first three groups should be only one because there is less than 60 rows of no data between groups.
Data locations.
Group 1 in rows 535:550
Group 2 in rows 555:654
Group 3 rows in 692:741
Single record at row 4413
In this case, the safe vector would also be TRUE for rows where data is sandwiched by fewer than 60 rows of no data. For the attached dataset, the outcome should be one grouping from rows 538:740.
It may wind up that the first four rows of each group would be included but I'm thinking that I could then just manually remove them from each group.

