Saving Each Table Within a Structure as its Own Entity

48 views (last 30 days)
Hello,
I have a large table (over 10,000 rows), and each row's first column's cell has an identifier, with N identifiers total.
Eventually, within each of these N tables, I want to sum the columns from rows 3 through i.
The value for N will vary based on the imported Excel data, and the number of columns will never include columns 1 and 2 (always begin with column 3), but the value of i will also vary with the imported data.
So far, I was able to get my N tables split from the one large table.
However, these tables are stored in either a structure or a cell (I did it two different ways).
The Structure Method looks like this:
for i=1:N
Count = strcat('v',num2str(i));
Table.(Count) = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
where the Large_Table.Identifier is the column of the table with the identifying information, and ID_String is a string that lists all the identifier's names.
This method saves everything to a structure, with N fields named v1, v2, ... vN., and each field is a table with the correct data, size, etc.
From here, I would like to add up each element in column 3 in table v1, each element in column 4 in table v1, all the way to each element in column i in table v1.
Then do the same for table v2, table v3, all the way to table vN.
What I need to do is be able to call each table, then sum each column up and repeat.
But what I'm mostly unsure about is how I will set up this for loop, specifically with referencing the variable name in the Boolean statement after the "for", and then having to re-define the number of columns of each of the tables (v1, v2, ..., vN).
I guess when it comes down to it, I am unfamiliar with naming and utlizing variables within a loop.
If this is confusing, please let me know and I can explain further.
For what it's worth, the Cell Method looks like this:
for i=1:N
Test{i} = Large_Table(Large_Table.Identifier==ID_String(i),:);
end
which returns a cell (1xN), with each item within the cell holding a table.
From here, the same issue I have above applies here.
Thank you.
Edit:
In the attached Excel sheet, you'll see four columns.
Column A would be the identifier. The first part of my code removed the '.' and everything to the right of it.
Column B would be the 2nd column I referenced that is not important, as far as my final solution goes.
Columns C and D are the important columns that will eventually contribute to my final answers.
In this example, N=4 (J, K, L, and M) and i=4 (for the four columns).
I want to split this table into N=4 smaller tables: One with all the J's, one with the K's, the K's and the M's.
Then, with these four tables, I want to add up everything together in Column C for the J Table, and then add up everything together in Column D for the J table.
Then I want to repeat this for the K table.
Then again for the L table and M table.
So at the end of the day, I want essentially 8 values that represent the sum of every column C and column D input for the J, K, L, and M table.
  4 Comments
dpb
dpb on 3 Aug 2022
If what a gather you're after is so, you've making a very typical error in approaching calculations by grouping variables by trying to physically segregate the original into multiple other data variables. As you've discovered, this generally leads to just adding complexity rather than simplifying -- use grouping variables instead with the splitapply workflow -- or rowfun can often deal with the issue directly or even groupsummary may do it all almost automagically depending upon just what it is you're actually after.
Jon
Jon on 3 Aug 2022
I tried to recreate the code in a simpler manner but couldn't recreate my issue, and I can't post the original script due to confidentiality in my work.
What I will do which I think will help explain my ultimate goal is post a sample Excel sheet that will show what I want to do.
I'll add an edit in the original post that explains it more.
Thank you for your reply.

Sign in to comment.

Accepted Answer

dpb
dpb on 3 Aug 2022
Edited: dpb on 3 Aug 2022
"I want to split this table into N=4 smaller tables:..."
Again, that's the wrongheaded approach -- it can (and should) be done in two or three lines of code...
tT=readtable('Large_Table.xlsx'); % read the raw data
tT.Cat=categorical(extractBefore(tT.Var1,'.')); % create the grouping variable
tS=groupsummary(tT,'Cat',"sum",{'Var2','Var3','Var4'});
The above returns
>> tS
tS =
4×5 table
Cat GroupCount sum_Var2 sum_Var3 sum_Var4
___ __________ ________ ________ ________
J 51 464.1 634.95 1943.1
K 29 611.9 883.05 2612.9
L 21 600.6 875.7 2574.6
M 77 3334.1 4908.8 14345
>>
There's a mismatch between the description you provided for the file and the actual number of variables contained in it, I just did the ones that were finite at the beginning to illustrate; you can select whichever variables want/need.
NB: you can also select variables in a table with variables contaning a given name or by column number or variable type or... so you don't even have to have fixed column variable names as above.
Again, READ THE DOC!!! on table and the section on grouping variables and the splitapply workflow although as shown, most problems can be solved without that extra baggage.
  6 Comments
Jon
Jon on 4 Aug 2022
Thank you very much, that is exactly what I need.
I'll read more about that groupsummary command.
What's the old saying: the simplest solution tends to be the correct solution?
What was accomplished in one line of code I did in about 20 lines.
It's nice to have this community to help teach shortcuts.
dpb
dpb on 5 Aug 2022
Moral is to spend more time up front reading/studying the doc and the example usages provided -- there's a LOT of stuff implemented but unless one goes looking and invests some time, won't find or know about it.
The time spent will pay great dividends on the investment...

Sign in to comment.

More Answers (1)

Steven Lord
Steven Lord on 3 Aug 2022
So if I understand what you want to do correctly, you have a vector of identifiers and an array or a table of data:
rng default
ID = char(randi(double('AE'), 10, 1));
data = randi([-10 10], 10, 1);
T = table(ID, data)
T = 10×2 table
ID data __ ____ E -7 E 10 A 10 E 0 D 6 A -8 B -2 C 9 E 6 E 10
and you want to sum up all the rows in data corresponding to the identifier A, all the rows corresponding to identifier B, etc.? If so you can use groupsummary to sum (@sum) the values in the data variable in T, grouped by values in the ID variable in T.
result = groupsummary(T, 'ID', @sum)
result = 5×3 table
ID GroupCount fun1_data __ __________ _________ A 2 2 B 1 -2 C 1 9 D 1 6 E 5 19
In T, two rows have A in the ID variable: rows 3 and 6. The sum of the values in rows 3 and 6 of the data variable is 10 + (-8) or 2. B, C, and D only appear once in T so they're pretty straightforward. The rows in the data variable corresponding to E in the ID variable are -7, 10, 0, 6, and 10 and if you add those 5 numbers together you get a value of 19.
  1 Comment
Jon
Jon on 3 Aug 2022
Not quite my goal, but thank you.
I'm going to post a sample Excel sheet that should help explain it.

Sign in to comment.

Categories

Find more on Tables in Help Center and File Exchange

Products


Release

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!