How to do a right join based on multiple columns as ID?

Dear all,
I want to join set B from the right with set A, based on the first three columns. (The values of the three columns are the ID of both sets.) Sorry, I do not know the terminology in matlab).
All the explanations I do find are expressed in terms of keys variables, variable names and it is really confusing.
Can somebody give me pls some instructions/suggestions?
Given data set A A=[1 2 3 9 8; 1 2 3 3 12; 1 4 3 11 12; 2 3 4 3 12]
and data set B: B=[2 3 4 0 0; 1 2 3 9 8; 1 3 4 1 1; 1 2 3 6 7]
The key variables in each one of these sets are for each line the first three values.
The result should be: [2 3 4 0 0 3 12; 1 2 3 9 8 9 8; 1 2 3 9 8 3 12; 1 3 4 1 1 nAn nAn; 1 2 3 6 7 5 8; 1 2 3 6 7 3 12];

2 Comments

All the explanations I do find are expressed in terms of keys variables, variable names and it is really confusing
That is because the data is stored in tables, not matrices. Usually, when columns of a matrix represent different things, it's better to store the data in a table which allows for easier manipulation.
If I am not wrong, there is a little confusing error (for the novice reader) in the last line of the problem descriptioin:
The result should be: [...; ...; ...; ...; 1 2 3 6 7 5 8; 1 2 3 6 7 3 12];
sould read:
The result should be: [...; ...; ...; ...; 1 2 3 6 7 9 8; 1 2 3 6 7 3 12];

Sign in to comment.

 Accepted Answer

This comes close to doing what you want:
A=[1 2 3 9 8; 1 2 3 3 12; 1 4 3 11 12; 2 3 4 3 12];
B=[2 3 4 0 0; 1 2 3 9 8; 1 3 4 1 1; 1 2 3 6 7];
Ac = mat2cell(A, ones(size(A,1),1), [3 1 1]);
Bc = mat2cell(B, ones(size(A,1),1), [3 1 1]);
TA = cell2table(Ac , 'VariableNames',{'ID','Var1','Var2'});
TB = cell2table(Bc , 'VariableNames',{'ID','Var1','Var2'});
TJ = outerjoin(TA,TB, 'MergeKeys',1, 'Keys','ID', 'LeftVariables',{'ID','Var1','Var2'}, 'RightVariables',{'ID','Var1','Var2'})
Result:
TJ =
7×5 table
ID Var1_TA Var2_TA Var1_TB Var2_TB
___________ _______ _______ _______ _______
1 2 3 9 8 9 8
1 2 3 9 8 6 7
1 2 3 3 12 9 8
1 2 3 3 12 6 7
1 3 4 NaN NaN 1 1
1 4 3 11 12 NaN NaN
2 3 4 3 12 0 0
EDIT Corrected typographical error in the explanation. Code unchanged.

10 Comments

Clarisha Nijman’s ‘Answer’ moved here:
Thank you Star,
this answer is indeed the closest. It does the outerjoint. But I'll delete the rows I do not want.
Can you give some explanation on mat2cell(...) code. I am trying to apply this code on a larger scale and it says, Table A has more than 5 columns and should be right joint to table B. But it gives me this error:
Error using mat2cell (line 89) Input arguments, D1 through D2, must sum to each dimension of the input matrix size, [10 6].
Error in EstimationWithTripple (line 21) Ac = mat2cell(A, ones(size(A,1),1), [3 1 1]);%choses the first 3 elements of A,
My pleasure.
The mat2cell call I use here:
Ac = mat2cell(A, ones(size(A,1),1), [3 1 1]);
creates a cell array consisting of each row (the ‘ones(size(A,1),1)’ argument), and using ‘[3 1 1]’ to define the first 3 columns (that will become the ‘ID’ variable) as one variable, with separate columns for the other variables. The sum of the numeric value of the arguments for the rows and columns each must must equal the respective sizes of the argument matrix. So here, the first argument has a column of 1, the sum equaling the row size of the matrix, and 3+1+1 equaling the column size of the matrix.
It is an easy way to create cell arrays of ‘A’ and ‘B’ that I can then use to create the table configurations I want.
Thanks a lot,
It is clear, so if my A matrix has 7 columns the last part of the code equals [3 1 1 1 1] and it works!!!!
Many tnx!
My pleasure.
If my Answer helped you solve your problem, please Accept it!
Note that Star grouped the three columns of the ID into a single variable in the table which I agree is the simplest approach for the join. However, it may be that for other purposes, it is better to keep the three IDs as separate variables. In thi case, it's still easy to perform the join, and it's even easier to create the tables:
A=[1 2 3 9 8; 1 2 3 3 12; 1 4 3 11 12; 2 3 4 3 12];
B=[2 3 4 0 0; 1 2 3 9 8; 1 3 4 1 1; 1 2 3 6 7];
TA = array2table(A, 'VariableNames', [compose('ID%d', 1:3), {'Var1', 'Var2'}]);
TB = array2table(B, 'VariableNames', [compose('ID%d', 1:3), {'Var1', 'Var2'}]);
TJ = outerjoin(TA, TB, 'Keys', [1 2 3], 'MergeKeys', true)
@Clarisha Nijman — Thank you!
@Guillaume — That is certainly a viable alternative approach, and one that did not occur to me. I appreciate your contributing it.
Can you give some explanation about the arguments in the array2table code? I am getting this error:
Error using array2table (line 62) The VariableNames property must be a cell array, with each element containing one nonempty character vector.
Error in kladje (line 13) TA = array2table(A, 'VariableNames', [compose('ID%d', 1:3), {'Var1', 'Var2'}]);
Obviously, the cell array that you give for the 'VariableNames' optional argument must have as many elements that there are columns in your matrix. My [compose('ID%d', 1:3), {'Var1', 'Var2'}] generates 5 variable names, if your actual array has more columns, you'll have to adjust that bit.
Thank a lot for the explanation!

Sign in to comment.

More Answers (0)

Categories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!