Merge two columns in timetable?

I'm merging a number of .csv files into one big timetable. Each of the .csv files represents one day's data from a radio station - one column for frequency, one column for amplitude. I'd like for each station to have its own frequency column and its own amplitude column in the final timetable, but since I am adding the columns to the timetable one file at a time using the synchronize() function I wind up with multiple columns for each.
Here's the for loop I'm generating to create the timetable. The original files are named analysis_XXXXXX - Callsign.csv, where XXXXX is the date and Callsign is the station identifier.
for j = 1:length(Files)
% Extract date and call from filename.
meta = extractBefore(FileNames(j), '.csv')
meta = split(meta, ' - ')
date_recorded = char(extractAfter(meta(1), 'analysis_'))
day = str2num([date_recorded(1:2); date_recorded(3:4); date_recorded(5:6)])
start = datetime([2000+day(1) day(2) day(3) 0 0 0]);
Call = meta(2);
foo = readtable(FileNames(j), 'Range','A:E');
foo.Properties.VariableNames = [{'UTC'},{'Freq'},{'FreqErr'},{'Vpk'},{'dBV_Vpk_'}];
foo=table2timetable(foo);
foo.UTC = foo.UTC + start;
foo.Properties.VariableNames = strcat(foo.Properties.VariableNames, '-', Call);
if exist('baz')
baz = synchronize(foo, baz);
else
baz = foo;
end
end
This results in columns named, say, Freq-Station1_foo which has values for Day 1 of data collection and NaNs for Day 2, next to Freq-Station1_baz which has values for Day 2 and NaNs for Day 1. Is there a way I can merge them?

2 Comments

It would be easier to offer suggestions if you could share a few of your files. You can use the paperclip icon to attach them to your post.
Here's a minimal working example of the problem I'm trying to figure out.
load outdoors
TT1 = head(outdoors); TT1.PressureHg = [];
TT2 = TT1; TT2.TemperatureF = [];
TT2.Time = TT2.Time + hours(4);
outerjoin(TT1, TT2, "MergeKeys", 1)
If you execute that, you'll see that you end up with two Humidity columns that have different names. I'm looking for a command that will combine both into a single column. Is there a way to do this with the join() or outerjoin() commands?

Sign in to comment.

 Accepted Answer

One simple way to combine these variables is to add them together in a new timetable variable and delete the old variables.
For example, assuming the table looks something like this,
>> baz = synchronize(t1,t2)
baz =
9×8 timetable
UTC Freq_t1 FreqErr_t1 Vpk_t1 dBV_Vpk__t1 Freq_t2 FreqErr_t2 Vpk_t2 dBV_Vpk__t2
____________________ _______ __________ ______ ___________ _______ __________ ______ ___________
01-Jan-2020 00:00:00 9 10 8 4 4 NaN 3 2
01-Jan-2020 06:00:00 NaN 2 10 7 8 7 8 3
01-Jan-2020 12:00:00 2 NaN 7 2 8 7 3 NaN
01-Jan-2020 18:00:00 10 10 1 8 2 NaN 6 3
02-Jan-2020 00:00:00 7 5 9 1 5 2 NaN 9
02-Jan-2020 06:00:00 1 9 10 3 5 5 9 3
02-Jan-2020 12:00:00 3 NaN 7 1 7 10 10 10
02-Jan-2020 18:00:00 6 5 NaN NaN 8 4 6 4
03-Jan-2020 00:00:00 10 10 8 9 8 NaN 2 2
we can do
>> baz.FreqErr = sum([baz.FreqErr_t1 baz.FreqErr_t2],2,"omitnan");
>> baz = removevars(baz,["FreqErr_t1","FreqErr_t2"])
baz =
9×7 timetable
UTC Freq_t1 Vpk_t1 dBV_Vpk__t1 Freq_t2 Vpk_t2 dBV_Vpk__t2 FreqErr
____________________ _______ ______ ___________ _______ ______ ___________ _______
01-Jan-2020 00:00:00 9 8 4 4 3 2 10
01-Jan-2020 06:00:00 NaN 10 7 8 8 3 9
01-Jan-2020 12:00:00 2 7 2 8 3 NaN 7
01-Jan-2020 18:00:00 10 1 8 2 6 3 10
02-Jan-2020 00:00:00 7 9 1 5 NaN 9 7
02-Jan-2020 06:00:00 1 10 3 5 9 3 14
02-Jan-2020 12:00:00 3 7 1 7 10 10 10
02-Jan-2020 18:00:00 6 NaN NaN 8 6 4 9
03-Jan-2020 00:00:00 10 8 9 8 2 2 10

7 Comments

Thanks, but I'm not trying to sum the values. Rather, I'm trying to append the contents of new columns to that of the columns with matching names.
Perhaps vertical concatenation is what you want?
>> A = table([5;6;5],['M';'M';'M'],[45;41;40],[45;32;34],{'NY';'CA';'MA'},...
'VariableNames',{'Age' 'Gender' 'Height' 'Weight' 'Birthplace'},...
'RowNames',{'Thomas' 'Gordon' 'Percy'})
A =
3x5 table
Age Gender Height Weight Birthplace
___ ______ ______ ______ __________
Thomas 5 M 45 45 {'NY'}
Gordon 6 M 41 32 {'CA'}
Percy 5 M 40 34 {'MA'}
>> B = table(['F';'M';'F'],[6;6;5],{'AZ';'NH';'CO'},[31;42;33],[39;43;40],...
'VariableNames',{'Gender' 'Age' 'Birthplace' 'Weight' 'Height'})
B =
3x5 table
Gender Age Birthplace Weight Height
______ ___ __________ ______ ______
F 6 {'AZ'} 31 39
M 6 {'NH'} 42 43
F 5 {'CO'} 33 40
>> [A;B]
ans =
6x5 table
Age Gender Height Weight Birthplace
___ ______ ______ ______ __________
Thomas 5 M 45 45 {'NY'}
Gordon 6 M 41 32 {'CA'}
Percy 5 M 40 34 {'MA'}
Row4 6 F 39 31 {'AZ'}
Row5 6 M 43 42 {'NH'}
Row6 5 F 40 33 {'CO'}
I appreciate the Thomas the Tank Engine reference very much. ^.^
This is somewhat helpful, but I still run into the issue that my new data doesn't have the same number of columns as the main table. I may be able to give it more columns as a workaround, but that seems rather inelegant.
It seems to me that this ought to be an option in the synchronize() function, as an alternative to the current default behavior of appending the source table name to the column name in the unified table. If the time data doesn't conflict, unify the columns; if it does conflict, throw an error.
I feel like innerjoin() or outerjoin() should have a way of taking care of this, but I haven't been able to get them to work yet...
Hi Kristina,
Thank you for your reply.
vertcat is the most obvious choice
Given that your ultimate goal is to "append the contents of new columns to that of the columns with matching names" vertical concatenation (i.e. the vertcat function) is probably what you want. Then, we can get the row-times in order with sortrows. Admittedly, it is incovenient that vertcat errors in the case when the two timetables have different sets of variables. We are aware of this pain point and have an existing enhancement request to make this easier.
Assignment to new timetable rows also works
Assuming the row-times of each timetable are always disjoint, as in your examples, we can just use assignment of rows from the source timetable into new rows of the target timetable.
>> t1 = timetable(seconds(1:3)',(1:3)',(1:3)','VariableNames',["A","C"])
t1 =
3×2 timetable
Time A C
_____ _ _
1 sec 1 1
2 sec 2 2
3 sec 3 3
>> t2 = timetable(seconds(4:6)',(4:6)',(4:6)','VariableNames',["A","B"])
t2 =
3×2 timetable
Time A B
_____ _ _
4 sec 4 4
5 sec 5 5
6 sec 6 6
>> rowTimesAreDisjoint = isempty(intersect(t1.Time,t2.Time));
>> if rowTimesAreDisjoint
t1(t2.Time,t2.Properties.VariableNames) = t2(:,t2.Properties.VariableNames)
end
Warning: The assignment added rows to the table, but did not assign values to all of the table's existing variables. Those variables are extended with rows containing default values.
> In tabular/subsasgnParens (line 427)
In tabular/subsasgn (line 64)
t1 =
6×3 timetable
Time A C B
_____ _ _ _
1 sec 1 1 0
2 sec 2 2 0
3 sec 3 3 0
4 sec 4 0 4
5 sec 5 0 5
6 sec 6 0 6
What to do when the row-times match?
So far, I've assumed that the row-times between the two timetables are disjoint. This may not always be the case. What kind of behavior do you expect for matching row-times? Are you expecting to aggregate data using one of the methods in retime/synchronize such as 'mean', 'sum', or a function handle? It's worth noting that, once we've created the new vertically-concatenated timetable, we can use retime to aggregate rows with duplicate row-times.
Here's an example of using assignment to new rows, vertcat, sortrows, and retime to combine two timetables with partially overlapping row-times and partially overlapping sets of variables.
>> t1 = timetable(seconds(1:3)',(1:3)',(1:3)','VariableNames',["A","C"])
t1 =
3×2 timetable
Time A C
_____ _ _
1 sec 1 1
2 sec 2 2
3 sec 3 3
>> t2 = timetable(seconds(2:4)',(4:6)',(4:6)','VariableNames',["A","B"])
t2 =
3×2 timetable
Time A B
_____ _ _
2 sec 4 4
3 sec 5 5
4 sec 6 6
>> t1Vertcat = t1;
>> t2Vertcat = t2;
>> % Expand t1 to include all variables from t2
>> t1Vertcat([],t2.Properties.VariableNames) = t2([],t2.Properties.VariableNames)
t1Vertcat =
3×3 timetable
Time A C B
_____ _ _ _
1 sec 1 1 0
2 sec 2 2 0
3 sec 3 3 0
>> % Expand t2 to include all variables from t1
>> t2Vertcat([],t1.Properties.VariableNames) = t1([],t1.Properties.VariableNames)
t2Vertcat =
3×3 timetable
Time A B C
_____ _ _ _
2 sec 4 4 0
3 sec 5 5 0
4 sec 6 6 0
>> % vertcat and sort
>> t1 = sortrows([t1Vertcat;t2Vertcat])
t1 =
6×3 timetable
Time A C B
_____ _ _ _
1 sec 1 1 0
2 sec 2 2 0
2 sec 4 0 4
3 sec 3 3 0
3 sec 5 0 5
4 sec 6 0 6
>> % Aggregate data in rows with duplicate row-times
>> retime(t1,unique(t1.Time),'mean')
ans =
4×3 timetable
Time A C B
_____ _ ___ ___
1 sec 1 1 0
2 sec 3 1 2
3 sec 4 1.5 2.5
4 sec 6 0 6
Looking forward to your response.
Best,
Seth
Thanks, Seth! I figured out a solution, but yours is much better articulated.
This code exists inside a for loop where each file is assigned to the timetable "audio," and all the variables are unified inside a timetable under the name "data." (Terrible variable names, yes.) The label for each station is a string with the variable name "Call", which changes on each iteration of the for loop.
% Create a big ol' timetable with all our audio data:
audio.Properties.VariableNames=Call;
if exist('data') ==1 % If we're not on our first round
if any(strcmp(Call, data.Properties.VariableNames))
fprintf('We already have some data from this station...')
audio = vertcat(audio, timetable(data.Time, data{:, Call}, 'VariableNames', Call));
data=removevars(data, Call); %delete the duplicate
end
data = synchronize(data, audio); %add data for the station (back?) into the big table
else % if we are on our first round:
data = audio;
end
Awesome. Glad you found a workable way to build up the intended timetable. Thank you for sharing your solution with us.
One quick thing about the code in your solution:
  • timetable(data.Time, data{:, Call}, 'VariableNames', Call) can be replaced with data(:,Call). This expression is creating a copy of data, but with only the variables in Call, which can be done more succinctly with table subscripting.

Sign in to comment.

More Answers (0)

Categories

Products

Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!