Split a column into two based on a delimiter

Hi eveyone,
I have a table with one column and several rows of time stamps:
1 6:50.4
2 6:55.9
3 6:50.8
4 6:58.6
5 7:01.8
How do I split the table into 2 columns based on the first space separator ?
Thanks for your time!

12 Comments

How are you importing your data? There may be an opportunity to load the data and avoid this problem completely.
Chris, reading from PDF so it imports as as string and with some preprocessing to get it to the shown format
I was using strsplit and getting an error. And tried split and that works (thanks dbp!).
Ok, then there is no better way I can think of.
dbp, tried the split with delimiters. due to inconsistencies in the original data, in some places there is 1 space following the first number and sometimes there are 2 spaces. The dataset has 625rows so not possible to go through all. Any suggestions?
Error using split
Element 10 of the text contains 1 delimiters while the previous elements have 2. All elements must contain the same number of delimiters.
Blanks don't matter; split treats multiple spaces as one:
> split(cellstr(char('1 6:50.4','2 6:55.9','3 6:50.8','4 6:58.6','5 7:01.8')))
ans =
5×2 cell array
{'1'} {'6:50.4'}
{'2'} {'6:55.9'}
{'3'} {'6:50.8'}
{'4'} {'6:58.6'}
{'5'} {'7:01.8'}
>>
where removed the extraneous blank in first record; left in the rest.
Show code; attach actual dataset that has problem.
Data and code. These are active use timestamps, is for a machine, hence the column title.
a=[ "Machine"
"1 6:50:4"
"2 6:55.9"
"3 6:50.8"
"4 6:58.6"
"5 7:01.8"
"6 6:10.1"
"7 7:38.0"
"8 6:33.7"
"9 10:52.6"
"10 14:20.7"
"11 11:30.8"
"12 12:13.9"
"13 11:30.9"
"14 11:59.5"
"15 24:13.7"
"16 22:16.2"
"17 6:35.9"
"18 10:35.9"
"19 11:36.2"]
timestamp = table(split(a," "))
timestamp = splitvars(timestamp, "Var1","NewVariableNames",["sl. no","timestamp"])
Really strange that it is giving a new error (as seen in screenshot below)
More strange as it worked when I posted the "thanks dbp" . I converted the string into table with data in separate columns. Then I ran it again and gave the error. I changed nothing. Right now I ran after adding a space to line 14 and 15. That made no difference as I got the same new error (screenshot below) .
You need to skip the first value of a, since that does not have a delimiter in it.
split(a(2:end)." ")
"is for a machine, hence the column title. "
Don't read the titile into the data array; if use a table, tell readtable to either skip a headerline or to use the first line as table variable names. If do that, then readtable can split the two columns on input for you automagically.
Using detectImportOptions first would probably take care of it all for you without other intervention.
dbp, If there isnt any delimiter, shouldnt the function skip that row anyway? I am reading from PDF so it imports as string. With some preprocessing I get it to the shown format. Its not a table. I am trying convert it to one.
Chris, reading the "Machine" is crucial, as it indicates which equipment the timestamp is for.
I chaged name to Machine A so now the word includes one space i.e. the delimiter as well. Where the spacing between variables is inconsistent (lines 14 and 15), I've rectified that too with the reaplce function.
Now everything has the same delimiter and the same number of delimiters.
Code in the Answer section below.
Show us the file from which importing -- a pdf file is just text so may be best can do is all internal, but I still would not incorporate the column header into the data; it isn't data, so don't treat it as if were.
If the reading returns a string array like you showed above, as Chris says, then to split() it, you need to just forget the first row for the data and use the first row for the table variable name.
But need to know more about the timestamps -- are those durations or 24hr times-of-day? There's an inconsistency in the first entry of "6:50:4" and the rest that are (it appears) mm:ss.S but the first has what appears to be HH:mm:ss format.
Is the first an actual time while the rest are durations or what?
klb
klb on 7 Feb 2021
Edited: klb on 7 Feb 2021
"6:50:4" is typo. Should be "6:50.4" - it is correct in the original question. These are activation durations, hence not a 24 hour time of day.
Regarding the format, if you imagine pdf of several pages, with header and footer containing text and page numbers. So that is the preprocessing cleaning. The data is a rolling list of several machines/equipemt and timestamps i.e. Machine A , 20 timestamps.Next Machine B with 5 timestamps , Machine C with 30 timestamps, Machine D with no timestamps (meanig it wasnt used) and so on. The number of machines being monitored can vary and the number of timestamps vary too as they depend on usage. As I mentioend text headers are identifiers hence must be retained.

Sign in to comment.

 Accepted Answer

klb
klb on 7 Feb 2021
Edited: klb on 7 Feb 2021
Needed formatting. I chaged name to Machine A so now the word includes one space i.e. the delimiter as well. Where the spacing between variables is inconsistent (lines 14 and 15), I've rectified that too with the replace function.
Now everything has the same delimiter and the same number of delimiters.
a=["Machine A"
"1 6:50.4"
"2 6:55.9"
"3 6:50.8"
"4 6:58.6"
"5 7:01.8"
"6 6:10.1"
"7 7:38.0"
"8 6:33.7"
"9 10:52.6"
"10 14:20.7"]
a = replace(a," "," ")
timestamp = table(split(a));
timestamp = splitvars(timestamp, "Var1","NewVariableNames",["sl. no","timestamp"])

More Answers (1)

Your solution will not work if there are triple spaces. Also, your numbers get recorded as strings, which will make it hard to use the values for any computations.
Not sure what the end goal is, but here's an approach that creates a result I would want. Not sure what the timestamp format is so I assumed mm:ss.S
a=[ "Machine"
"1 6:50:4"
"2 6:55.9"
"3 6:50.8"
"4 6:58.6"
"5 7:01.8"
"6 6:10.1"
"7 7:38.0"
"8 6:33.7"
"9 10:52.6"
"10 14:20.7"
"11 11:30.8"
"12 12:13.9"
"13 11:30.9"
"14 11:59.5"
"15 24:13.7"
"16 22:16.2"
"17 6:35.9"
"18 10:35.9"
"19 11:36.2"];
% Looks scary, but applys strsplit to each element of the array ignoring multiple delimiters.
% It then converts the result to a double
data = arrayfun(@(a)str2double(strsplit(a,[" ",":","."],"CollapseDelimiters",true)),a(2:end),"UniformOutput",false);
% convert cell array to a matrix
data = cell2mat(data);
% convert numbers back to time
timestamp = duration(0,data(:,2),data(:,3),data(:,4)*100,"Format","mm:ss.S");
% Convert to a table. Use 'Machine" as the variable name
Data = table(data(:,1),timestamp,'VariableNames',[a(1),"timestamp"])
Data = 19x2 table
Machine timestamp _______ _________ 1 06:50.4 2 06:55.9 3 06:50.8 4 06:58.6 5 07:01.8 6 06:10.1 7 07:38.0 8 06:33.7 9 10:52.6 10 14:20.7 11 11:30.8 12 12:13.9 13 11:30.9 14 11:59.5 15 24:13.7 16 22:16.2

3 Comments

klb
klb on 7 Feb 2021
Edited: klb on 7 Feb 2021
Chris, I use the datetime() to convert to computable time data
Indeed, my code doesnt work with multiple delimiters. That is actually original question, How do I split the table into 2 columns based on the first space separator . For the data, I dont have the issue of multiple delimiters, so my solution worked.
Regarding the format, if you imagine pdf of several pages, with header and footer containing text and page numbers. The data is Its a rolling list of several machines/equipemt and timestamps i.e. Machine A , 20 timestamps.Next Machine B with 5 timestamps , Machine C with 30 timestamps, Machine D with no timestamps (meanig it wasnt used) and so on. The number of machines being monitored can vary and the number of activative use timestamps vary too as they depend on usage. With your code, the a(2:end)is hardwiring so it would generate error when it comes across the text 'Machine B'. I truly appreciate you typing out a solution!
edit: actually, If i can figure a way to seprately extract the data for each of the machines, then I could use your solution instead.
Datetime will include a date. Since you hadn't indicated there was one, I opted to use a duration. It has the same benefits of a datetime when it comes to computation without having a date included.
Data is timestamps only. No date, yet the datetime() worked out. Which is why I am curious that the split() rather than skip/do-nothing, returns an error when it doesnt find the delimiter. Will keep your approach in mind for future use. Thank you agin for your time!

Sign in to comment.

Categories

Tags

Asked:

klb
on 6 Feb 2021

Edited:

on 8 Feb 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!