I have Matlab R2012b and am importing a rather large .csv file created by another tool. The only column I care about is the one with all of the timestamps, so I've imported that column as text, but it ends up as a cell array of strings. Since I need to convert the timestamps to Seconds of Day to find out if there are any gaps in my data, I have done the following:
filename = 'myfilename.csv';
delimiter = ',';
startRow = 2; %it always starts at row 2 as row 1 is the header name
formatIn = 'HH:MM:SS.FFF'; % defines my time format for the datevec function
formatSpec = '%*s%*s%*s%*s%*s%*s%s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s%[^\n\r]';
fileID = fopen(filename,'r');
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false); %reads my file
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'ReturnOnError', false); %creates my data array
fclose(fileID);
Time = dataArray{1, 1}; %outputs the column I want from the .csv file
Time1 = num2str(cell2mat(Time)); % where Time is my input data
TimenoDay = Time1(5:end); % This is supposed to remove the 246 and space leaving me with 14:32:54.342 231
TimenoSpaces = TimenoDay(~isspace(TimenoDay)); % This is supposed to remove all spaces leaving me with 14:32:54.342231
[Y M D H MN S] = datevec(TimenoSpaces,formatIn); % this is supposed to break the time into a vector array
SOD = (H*3600+MN*60+S); % this should convert all of my time stamps to Seconds of Day
It works fine till I get to the datevec function, which runs very slow, as in 30+ minutes. Granted, it is ~5 million rows of data, but I did something similar yesterday and it only took a few minutes to run datevec on a similar amount of data. The only difference between the two scripts is that the one that ran quickly had a normal time format I could import as a Time format (14:42:36.243213) instead of as text with the script above. I think my problem is the Julian day and the extra spaces in the timestamp.
Does anyone have any ideas that could help me speed this up? Make my code better?
Edit#1: So I see why the datevec isn't working. When I run
Time1 = num2str(cell2mat(Time))
it returns a 20 character cell with all 2's in it. But if I do
Time1 = num2str(cell2mat(Time(1:1)))
and just look at the first cell of data it works fine.

2 Comments

dpb
dpb on 9 Dec 2015
What is the format of the input file? If all you want is the time data I'd just read the date data directly and convert it saving a step. Precise code would depend on the format...
It's a .csv file with 5,272,494 rows spanning columns A - AQ. The only column I care about (for this check, anyway) is column G, which gives me a timestamp of DDD HH:MM:SS.FFF sss (milliseconds, I believe). All I need for the check I'm performing is the HH:MM:SS.FFF sss, without the space, so I want my end format to look like HH:MM:SS.FFFsss. I need it in a string that I can then run through datevec so I can separate it into H, M, S and convert to Seconds of Day. This will allow me to look for gaps in my data greater than 90 milliseconds.
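Just to pin down the arithmetic described above, here is a rough sketch of that conversion as a scratch Python snippet (the helper name and sample stamp are mine, purely for illustration; the real work stays in MATLAB):

```python
# Hypothetical helper: convert a "DDD HH:MM:SS.FFF sss" stamp to seconds of day.
def stamp_to_sod(stamp):
    day, clock, tail = stamp.split()      # e.g. '246', '18:27:08.152', '160'
    h, m, s = clock.split(":")            # hours, minutes, fractional seconds
    # The trailing field extends the .FFF fraction down to microseconds.
    return int(h) * 3600 + int(m) * 60 + float(s) + int(tail) / 1e6

sod = stamp_to_sod("246 18:27:08.152 160")   # day-of-year 246 is discarded here
```

A gap check is then just successive differences of these seconds-of-day values.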


 Accepted Answer

dpb
dpb on 9 Dec 2015
Edited: dpb on 10 Dec 2015
Again, pasting a (partial even) row of text with the actual field of interest would relieve confusion regarding the form, but presuming the 'sss' is actually microseconds (since the .FFF includes down to the msec resolution) and is in the file, then I'd do something like
fmt=[repmat('%*s',1,'G'-'A') '%3f %2f:%2f:%f %*[^\n]'];
d=cell2mat(textscan(fid,fmt,'delimiter',',','headerlines',1,'collectoutput',1));
Now you'll have an Nx4 array of the [day,hr,min,sec.fff] values, and you can pass those directly to datenum with the leading yr,mo columns as zero or some arbitrary start date (it all gets subtracted out in the end, so the absolute value is immaterial, and the time fields are all stored in the fractional portion of the datenum value). Keeping the day will eliminate issues regarding clock rollover if any data were collected over the bewitching midnight hour.
Then, simply looking for
diff(datenum([YR0 1 d])) > Setpoint
will give you the desired answer. The above should be pretty quick.
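The rollover point deserves a concrete illustration. This is not the answer's MATLAB code, just a hedged Python sketch of the same idea with made-up values: fold the day-of-year into the timestamp against an arbitrary reference year, and midnight crossings take care of themselves.

```python
from datetime import datetime, timedelta

# Fold day-of-year into the timestamp; any reference year works, since it
# subtracts out when differencing (all values here are hypothetical).
def to_time(day, h, m, s):
    return datetime(2000, 1, 1) + timedelta(days=day - 1, hours=h, minutes=m, seconds=s)

t1 = to_time(246, 23, 59, 59.995)           # just before midnight
t2 = to_time(247, 0, 0, 0.015)              # just after, on the next day
gap_ms = (t2 - t1).total_seconds() * 1000   # 20 ms, despite the day rollover
```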
Again, all above is air-code w/o an example subset of data to 'spearmint with..."salt to suit" :)
ADDENDUM
OK, with the sample data file to check on formatting, the rest is pretty much as outlined. You don't have any calendar data other than DOY and Matlab datenum needs a reference year so just pick something arbitrary as you're only looking for differences so it all gets subtracted out again, anyway.
dn=datenum(2000,0,dat(:,1),dat(:,2),dat(:,3),dat(:,4)+dat(:,5)/1E6);
dt=diff(dn)*86400*1000;
Above takes the date information you have and creates a datenum sequence. As noted before, this will have the benefit of handling clock rollover, etc., automagically. Then the "delta-t" dt is the sample time differential in msec which you can test. For a sample subset of the file supplied, this looks like--
>> dt(10:40)
ans =
0.0905
0.4526
0.7343
0.6940
0.6840
0.6940
0.6940
0.6940
10.2494
0.3722
0.3722
0.6940
0.5532
0.6840
0.1308
0.3017
0.6940
0.5834
0.2112
0.1308
0.4124
14.8058
0.3722
0.3722
0.6940
0.5532
0.6840
0.1408
0.2917
0.1509
0.3923
>>
Seems like the acquisition must be on a time-sharing system where there's a periodic latency, maybe???
Anyway, your end object is now simply to find the locations that don't satisfy your criterion. There are several choices for how to report this; you can use a logical vector of locations that satisfy, find locations that don't, report if any don't or all do, etc., etc., etc., ...
>> isOK=all(dt<90)
isOK =
1
>> sum(dt>10)
ans =
4
>> find(dt>90)
ans =
Empty matrix: 0-by-1
>> find(dt>10)
ans =
18
31
53
61
>>
Above uses a couple different fixed setpoints; you'll make that a variable, of course. I used an example of 10 just to illustrate the case of a few failing. The logical-vector solution returns a full-length logical array of T/F, of course. Which turns out easiest to use is all dependent upon end objectives.
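For what it's worth, the same reporting choices translate directly to other languages; here is a small Python sketch with made-up dt values (in msec):

```python
# Made-up millisecond differences, standing in for dt = diff(dn)*86400*1000.
dt = [0.5, 10.2, 0.7, 14.8, 0.4]
setpoint = 90.0

is_ok = all(d < setpoint for d in dt)                   # any gap over the limit?
n_over_10 = sum(d > 10 for d in dt)                     # how many exceed 10 msec
idx_over_10 = [i for i, d in enumerate(dt) if d > 10]   # and where they are
```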

16 Comments

dpb, thanks for the help and I'm sorry for not responding sooner. I got pulled away on another project yesterday afternoon.
I have a screenshot of my input sheet that hopefully will answer your question about my input file as well as the addition of the code you gave me and what its output looks like.
My apologies in advance for any ignorance I may show. This is my first adventure into coding since PHP and CSS in college 15 years ago, and I'm teaching myself as I go.
I was trying to understand how to pass my new Nx4 array into datevec, but every time I tried it I was getting 6 columns as expected, only it was splitting my times up weirdly: the day was being split across the year and month, the hour across the day and hours, and the minutes between the hours and minutes. So I'm not sure what I was doing wrong.
My input file:
My code and output variables:
Well, that didn't work like I thought it would. Try this link to download the files.
dpb
dpb on 10 Dec 2015
Please use cut 'n paste of text from the command line and/or editor, not image. We old fogies don't have the eyes any longer and can't do anything with the data when displayed that way, anyway.
It doesn't need but a few lines of the file at most; the rest is nothing but volume; just like adding 9-digit numbers in junior high didn't add anything to the principles learned with 2 or 3 columns in elementary school; it's just volume, not content.
Your link is also just to images; use the paperclip icon to attach a file in the forum or, again, as suggested, just cut 'n paste a few lines of it from the editor and same with your code.
Sorry. I'm working between two computer systems here. I'll get a sample file of the input file to link and I'll paste my code into a sheet I can copy over and then copy and paste for you. Apologies. I didn't think about that being an issue.
OK, here are my sample files. And again, I'm sorry, I should have thought about that instead of images, I've been around forums long enough not to make that stupid of a mistake.
dpb
dpb on 10 Dec 2015
Thanks but "Houston, we still have a problem!" Unfortunately, Firefox can't handle however it is that TMW has encoded their links to non-text files; they're opened or saved as text which is, of course, not a readable Excel file.
That's not your fault, it's either TMW's or Firefox's; I've yet to determine just who's to blame.
Either way, an Excel file isn't a .csv file, which your original posting indicated it was/is... If that's the case, you'll need to either convert to an actual text .csv file or, if you keep the Excel format, use xlsread, in which case I think you're going to get character data from that string, as I don't believe xlsread will be able to interpret the time format as written natively.
I'd suggest you again revert to text, and I'll raise the question (yet again) re: the file links in the Forum, as 'tis a bother indeed, and this isn't the only time/case.
/bangs head slowly on desk.
Yes. It's supposed to be a .csv file. When I copied the snippet over and saved it, guess what the default format is on my computer.... Yeah, .xlsx. Oh well. I'll stop trying to be fancy and do it the way you've asked repeatedly.
Bus Name Bus ID CMD Type RT Bus A/B Sub Addr Time CMD-1 CMD-2 Tx Status Rcv Status Word 1 Word 2 Word 3 Word 4 Word 5 Word 6 Word 7 Word 8 Word 9 Word 10 Word 11 Word 12 Word 13 Word 14 Word 15 Word 16 Word 17 Word 18 Word 19 Word 20 Word 21 Word 22 Word 23 Word 24 Word 25 Word 26 Word 27 Word 28 Word 29 Word 30 Word 31 Word 32
1553 18 RT to BC 10 B 1 246 18:27:08.152 160 5420 5000 fcff 26 b93e ff83 a 20 d5 187 43 fff6 ff9b fc0d da01 1b37 dcf2 c991 50a 198 ffff 28c4 ffff ea45 0 9a5 3ff 3977 b9 c16 dd9 5717 6cde 95c1
That is the first line of the input file and it truly is a .csv file.
filename = 'D:\MatLab Files\Import Files\mydatafile.csv';
delimiter = ',';
startRow = 2;
formatSpec = [repmat('%*s',1,'G'-'A') '%3f %2f:%2f:%f %*[^\n]'];
fileID = fopen(filename,'r');
dataArray = cell2mat(textscan(fileID, formatSpec, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'HeaderLines' ,startRow-1, 'ReturnOnError', false));
fclose(fileID);
And that is the code I'm using to import the file.
I hope this helps. And thank you for taking the time to walk through this with me. It's much appreciated.
And just in case it works this time as well, I've uploaded the snippet as the .csv file it should have been.
OK, that works...I see a couple things. First, use 'collectoutput',true to return the data array as an array. Secondly, looks like you do need the remaining precision of the time field that I had ignored thinking you had enough precision in the first seconds and fractional thereof. That modifies the format to
fmt=[repmat('%*s',1,'G'-'A') '%3f %2f:%2f:%f %f %*[^\n]'];
which is simply adding one more scanned field before skipping the rest of the line. With that and the aforementioned change, I get
>> dat=cell2mat(textscan(fid, fmt,'Delimiter',',', ...
'HeaderLines',1, 'collectoutput',1));
>> whos dat
Name Size Bytes Class Attributes
dat 67x5 2680 double
>>
>> dat
dat =
246.0000 18.0000 27.0000 8.1520 160.0000
246.0000 18.0000 27.0000 8.1520 852.0000
246.0000 18.0000 27.0000 8.1520 984.0000
246.0000 18.0000 27.0000 8.1530 278.0000
...
246.0000 18.0000 27.0000 8.2300 961.0000
246.0000 18.0000 27.0000 8.2310 652.0000
246.0000 18.0000 27.0000 8.2320 204.0000
246.0000 18.0000 27.0000 8.2320 895.0000
>> num2str(dat(end,4)+dat(end,5)/1E6)
ans =
8.232895
>>
So, reading the data is easy; what again now is it that you're after, precisely?
I'm building a reusable gap-checking script, so that anyone could come along, take our data, prepare the input file, and run the script. The script's end result is several graphs, but its main intent is to look for gaps in the data. Meaning from one timestamp to the next, if there is a gap greater than 90 ms, I want to know.
So I want to take that timestamp and make it into a seconds-of-day format so I can find the difference and convert the difference to ms easily (this is me coming from Excel, where it's never easy). I'm expecting to see a difference of 18-22 ms. Later on, I have some functions that tell me the max gap size, the average gap size, the minimum gap size, and graph the total number of gaps.
I think that covers it.
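Those statistics are essentially one-liners once the gaps are in hand. A hedged sketch (in Python, with made-up timestamps; not the poster's actual script):

```python
# Hypothetical seconds-of-day values; gaps are successive differences in msec.
sod = [66428.152, 66428.172, 66428.292, 66428.312]
gaps_ms = [(b - a) * 1000 for a, b in zip(sod, sod[1:])]

max_gap = max(gaps_ms)                    # largest gap
min_gap = min(gaps_ms)                    # smallest gap
avg_gap = sum(gaps_ms) / len(gaps_ms)     # average gap
n_bad = sum(g > 90 for g in gaps_ms)      # gaps over the 90 ms criterion
```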
dpb
dpb on 10 Dec 2015
OK, see addendum to previous answer. In Matlab it's essentially trivial.
Thank you. I haven't had a chance to check all this out yet, meetings suck some times. I'll look it over tomorrow.
Two problems I'm running into.
1) My dataArray output is giving me the 5 columns I expect to see; however, the issue is when a value for the seconds is a whole number, i.e. 38 vs. 37.999.
246 0 0 37.9640 646 ....
246 0 0 37.9990 982 ...
246 0 0 38 493
This is causing a huge jump in my differences as it appears that when it combines the seconds/milliseconds to the microseconds, it makes it look like
0:0:37.9640646
0:0:37.9990982
0:0:38493
What I think I need the formatspec to do for me is to always give me the 3rd column out to 3 digits past the decimal, not four, even if that means I see
38.000
Making the entire row look like
246 0 0 38.000 493
for a time of
0:0:38.000493
I tried doing
formatspec = [repmat('%*s',1,'G'-'A') '%3f %2f:%2f%.3f %f %*[^\n]']
but it only ended up truncating the last column of 3 digits down to 1 digit and left the 4th column with 2.xxxx.
2) I forgot about the time it takes for the data to sync to the GPS clock meaning I have a few thousand rows with a time of zero hours that I need to have the script dump out. The closest I can get is
>>Time(strncmp(Time,'00',2))=[]
but that would only work if it was a string and the first 2 characters were 00. Any ideas on how to do something similar for the data matrix Nx5?
OK, I figured out how to get rid of the rows with an hour of 00
Timestamps = dataArray(logical(dataArray(:,3)),:);
but I'm still struggling with the microseconds conversion. I think the divide by 1E6 in the datenum function is supposed to do that, but I'm not convinced it is. Sadly R2012b doesn't allow me to convert from datenum back to a datetime string that I can find so I can't verify that the datenum string matches up to the source. Unless I'm missing something.
Ok, scratch the part about microseconds. It's what happens when I perform
dt=diff(dn)*86400*1000;
that has me scratching my head. I'm expecting to see values between 18 and 22 (millisecond values in whole numbers) but I'm seeing
0.1509
0.0905
0.4526
0.7343
0.6940
10.1790
I've tried doing
diff(dn)*86400
diff(dn)*1000
and several other variations and it always looks the same. So I'm sure Matlab is doing the math right, but it's not what I expect to see and it's confusing me.
/slowly bangs head into desk over and over
I'm confusing my gaps. Sigh. Yes, it is doing it right. I had to step away and stop being frustrated. Let me finish running my entire script and compare the output to the output of the tool I'm replacing, and I think all shall be good.
thanks for putting up with my ramblings this morning.
Ayup, that's what the dataset you posted shows, anyway. One of about 16 msec is the largest in that particular segment of the file (which is, of course, relatively short, so I don't know what max(dt) would be over a longer acquisition period).
Indeed the 1E6 and 86400*1000 constants are correct for the data file as given; datenums are in integer and fractional days by the implementation.
>> dat(1:4,:)
ans =
246.0000 18.0000 27.0000 8.1520 160.0000
246.0000 18.0000 27.0000 8.1520 852.0000
246.0000 18.0000 27.0000 8.1520 984.0000
246.0000 18.0000 27.0000 8.1530 278.0000
>> num2str(dat(1:4,4)+dat(1:4,5)/1E6,'%.8f')
ans =
8.15216000
8.15285200
8.15298400
8.15327800
shows the precise seconds to the microsecond as given in the input data.
While as you note, datestr won't show more than the msec resolution, you can prove there's more resolution within the date numbers pretty simply--
>> dv=datevec(dn); % get the full date vectors; dv(:,6) holds the seconds
Default precision displayed is four digits, but
>> num2str(dv(1:4,6),'%.6f')
ans =
8.152199
8.152901
8.153000
8.153297
>>
Ewww... this isn't quite the same--let's see the rounding error magnitude expected...
>> eps(dn(1))/(1E-6/86400)
ans =
10.0583
>> ss=dat(:,4)+dat(:,5)/1E6; % seconds as given in the input file
>> abs(ss(1)-dv(1,6))*1E6
ans =
38.7915
>>
The first is the double-precision eps() for a date number at this particular time range divided by the magnitude of 1 usec in fractional days. This shows roughly 10 usec is the theoretical limit for a double precision variable. So, the resulting discrepancy of about 40 usec returned is the order of magnitude that is the limit of what the implementation can provide. For the sample data, the statistics are
>> er=abs((ss-dv(:,6)))*1E6; % difference between input/datenum, usec
>> [max(er) min(er) mean(er)] % range and average difference
ans =
53.8687 1.7163 25.2802
>>
But, given the requirement for checking within msec resolution, this is well within it and has the convenience of dealing with whatever rollover there may be in any particular time sequence, regardless of time of day or even year rollover if the data acq is a continuous process.
If you were to need the full microsecond resolution, then you would have to do the conversion of the time fields independently to get the full double precision within range.
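The eps() arithmetic above can be cross-checked outside MATLAB too. A rough Python equivalent, assuming a datenum magnitude of roughly 736,000 days for dates near late 2015 (the exact value barely matters, since only the power-of-two range affects the spacing):

```python
import math

datenum_scale = 736000.0          # approximate datenum for late 2015, in days
usec_in_days = 1e-6 / 86400.0     # one microsecond expressed in days

# Spacing of doubles at that magnitude, divided by a microsecond-in-days,
# gives the ~10 usec resolution floor quoted in the answer.
limit_usec = math.ulp(datenum_scale) / usec_in_days
```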


More Answers (0)

Asked: on 9 Dec 2015
Commented: dpb on 12 Dec 2015