Splitting a table into smaller ones based on two columns
3 views (last 30 days)
Show older comments
Hello,
The code I use produces a master table with 8 columns and around 800 rows. I would like to sort the data in the following steps:
- Take first line of data from master table and look at the values in column six and eight.
- Find all other rows in the table with a column six value within 0.5 of the line one column six value and a column eight value within 2 of the line one column eight value.
- Create new table with this data.
- Delete sorted data from master table.
- Repeat 1-> 4 until no data left.
- Be left with multiple tables.
Is there a way to do this?
Thank you for your help.
0 Comments
Accepted Answer
Adam Danz
on 2 Oct 2019
Edited: Adam Danz
on 3 Oct 2019
That can easily be done in a loop but it doesn't sound like a good idea. Breaking apart well-organized tabular data into sub-tables is like moving into a new house by unpacking each box in the driveway and carrying in each item from the box individually rather than just carying in the box. Keep the data together whenever possible.
Instead, each row of the table can be assigned a subgroup number and then you can use those row numbers to pull out data as needed.
Here's a functional demo with comments to illustrate this method. 'rowGroup' is used to identify subtable rows.
% Create demo data
T = array2table(rand(20,8).*2);
T{:,8} = T{:,8} * 5;
% Identify the group number of each row based on
% col 6 & 8 values and their given tolerance levels.
rowGroup = zeros(size(T,1),1); % This will store the group number for each row
while any(rowGroup==0)
% find next unassigned row, starting at the top
rowNum = find(rowGroup==0,1,'first');
% find all rows in col 6 that are within tolerance
group1idx = abs(T{rowNum,6} - T{:,6}) <= 0.5;
% find all rows in col 8 that are within tolerance
group2idx = abs(T{rowNum,8} - T{:,8}) <= 2.0;
% identify the rows that fit into this group
rowGroup(group1idx & group2idx & rowGroup==0) = max(rowGroup)+1;
end
% rowGroup is a column vector of row numbers that identify the subgroups.
% Your values will differ due to using random data
% >> rowGroup(1:5)
% ans =
% 1
% 2
% 3
% 4
% 4
% now you can access sub-groups of data like this
T(rowGroup==1,:) % for group 1
To see the number of subgroups and the number of rows within each subgroup,
subgroupSummary = table((min(rowGroup):max(rowGroup))', ...
histcounts(rowGroup,min(rowGroup):max(rowGroup)+1)', ...
'VariableNames', {'Group', 'nRows'})
2 Comments
More Answers (1)
David K.
on 2 Oct 2019
Edited: David K.
on 2 Oct 2019
I would do it as such:
n = 1;
a = your table% I used table([1;2;3;4;5;],[1;5;1;3;1]) to sort of test
while ~isempty(a) % while your table is still empty
% here is finding the index of the logic you did - the second & might be an or (|) if
% you want all the rows that satisfy the col 7 or 8 requirements instead of col 7 AND 8 reqs
% these indices can also more cleanly be done the way adam does.
toRem = find(a.col7<a.col7(1)+0.5 & a.col7>a.col7(1)-0.5 & a.col8<a.col8(1)+2 & a.col8>a.col8(1)-2);
newTables{n} = a(toRem,:); % Put the found values into a new table inside a cell array
% putting them into a cell array is the easiest way to have different sized variables created in a loop
a(toRem,:)=[]; % remove from original table
n=n+1; % increment n
end
All of your resultant tables will now be in newTables.
I agree with Adam that this would not be a great idea in theory, but here is how you could do it if you still really want to sort them this way.
See Also
Categories
Find more on Tables in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!