How to split a sequence based on values from one variable

Good evening,
I can't figure out how to solve the following problem.
Assuming that I have a dataset as in the picture, I would like to divide it into many smaller datasets using the variable "State" and keeping the sequence. Actually the real dataset has more than 200000 observations so I can't know when the variable State changes from NORMAL to RECOVERY and vice versa, but I would like to split the dataset into many mini sequences where each one has the same State variable for all the observations.
Then, I would need to divide the variables into a Predictors set (variables Sensor 1, Sensor 2, Sensor 3) and a Response set (variable State).
If we take, as an example, the image, at the end of the problem I would like to have for the Predictors a cell array of size Nx1 (N equal to the number of mini sequences) with the first cell of size 3x2 (the three features and the first two observations), the second cell of size 3x2, the third cell of size 3x1 and so on. Correspondingly, for the Response I would like to have an Nx1 cell array where the first cell is of dimension 1x2, the second is 1x2, the third is 1x1 and so on.
The problem is that with a dataset of 200000 observations I don't know what kind of loop to use and how to use it.
Thank you!

Accepted Answer

Ameer Hamza on 3 May 2020
See the following example.
First, create an example table:
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
1, 2, 3, 'norm'; 2, 3, 4, 'rec';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', ...
{'sen1', 'sen2', 'sen3', 'state'}); % an example table
t =
  8×4 table
    sen1    sen2    sen3     state
    ____    ____    ____    ________
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'norm'}
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'rec' }
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
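Then run the following code to split the data:
idx = findgroups(t.state); % numeric label for each distinct state
% first row of every run, plus one past the last row as the closing edge
partition_idx = [1; find(diff(idx)~=0)+1; size(t,1)+1];
% map each row number to the index of the run it falls in
partition_idx = discretize(1:size(t,1), partition_idx);
sensor_val = splitapply(@(x) {x}, table2cell(t(:,1:3)), partition_idx.');
state_val = splitapply(@(x) {x}, table2cell(t(:,4)), partition_idx.');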
sensor_val and state_val are cell arrays containing the required values, with one cell per run of consecutive rows that share the same state.
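The question asks for each Predictors cell to be laid out as features-by-observations (3x2, 3x2, 3x1, ...) and each Response cell as 1-by-observations. The following is a minimal sketch of one possible way to get that layout. It is not part of the accepted answer. It rebuilds the same example table, and it assumes that numeric matrices for the predictors and a categorical row vector for the responses are acceptable. The names isNewRun, starts, stops, predictors and responses are made up for illustration.

```matlab
% Rebuild the example table used in the answer
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm'; ...
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec'; ...
        1, 2, 3, 'norm'; 2, 3, 4, 'rec'; ...
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', {'sen1', 'sen2', 'sen3', 'state'});

% A run starts at row 1 and at every row whose state differs from the previous row
isNewRun = [true; ~strcmp(t.state(1:end-1), t.state(2:end))];
starts   = find(isNewRun);                  % first row of each run
stops    = [starts(2:end) - 1; height(t)];  % last row of each run

% Transpose so that rows are features and columns are observations
X = t{:, {'sen1', 'sen2', 'sen3'}}.';       % 3-by-N numeric matrix
Y = categorical(t.state).';                 % 1-by-N categorical (assumed response type)

% One cell per run, in the original order
predictors = arrayfun(@(a, b) X(:, a:b), starts, stops, 'UniformOutput', false);
responses  = arrayfun(@(a, b) Y(a:b),    starts, stops, 'UniformOutput', false);
```

For the example table this gives 4x1 cell arrays: the predictors cells are 3x2, 3x2, 3x1 and 3x3, and the matching responses cells are 1x2, 1x2, 1x1 and 1x3. The run boundaries come from vectorized comparisons, so no explicit loop over the 200000 rows is needed.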
