Datenum - different date formats
4 views (last 30 days)
Show older comments
Background
I have an excel sheet with different date formats (in excel some are in General format, some in date format totally random)
example: 10/05/2005 , 10/5/2005, 10/05/05, 05/12/2005 and so on.
I read the XL file into MATLAB as follows:
[dta_num , dta_txt , dta_raw] = xlsread(...)
I need to work with the dates in the excel sheet subsequently.
Actual Problem
I tried out the following commands
- datenum('10/5/2005')
- datenum('10/05/2005')
- datenum('10/05/05')
- datenum(dta_txt(2,2))
All of them return the same answer. (732590) But I realized that 732590 is read as 05-Oct-2005 when I ran
datestr(732590)
However the dates in my data sheet needs to be read as dd/mm i.e 10-May-2005.
If I choose to specify the format,
- * datenum('10/5/2005','dd/mm/yyyy') ans = 732442
- * datenum('10/05/2005','dd/mm/yyyy') ans = 732442
- * datenum('10/05/05','dd/mm/yyyy') ans = 1957
Originally I thought I will convert the dates using datenum and the after my manipulations using functions such as month(datenum(dta_txt(2,2),'dd/mm/yyyy')) I will use datestr and I will have all the dates in the same format. But now I am back to square 1 since datestr(1957) gives me a 2 digit year.
Also I want to make sure that date formats are not mixed up i.e dd/mm/yyyy and mm/dd/yyyy are mixed up. For this I plan to go over the date and see if the previous date belongs to the same month or one month earlier and no month is greater than 12 assuming a dd/mm format. If it is, then it is to be decoded as mm/dd/yyyy and appropriately changed to dd/mm/yyyy.
Is the only solution is to go over the length of the dta_txt contents using cellfun('length',dta_txt(:,2)) and change it to 4 year format (I guess there are very few in 2 year digit YY format)
To summarize my requirements are;
1. date format which takes less resource and easy to manipulate as I might want to extract and make comparisons of the the month and year later. 2. Have a consistent date format preferably dd/mm/yyyy.
2 Comments
Walter Roberson
on 23 Dec 2012
I closed the other (earlier) post as there had been less commentary on it.
Accepted Answer
Laura Proctor
on 23 Dec 2012
Edited: Laura Proctor
on 23 Dec 2012
Would something like this help?
% Create a cell array containing dates in various formats
y = { '10/5/05' ; '18/2/2004' ; '3/3/2003' }
% Split up the elements in the cell array
[~,~,~,~,~,~,ss] = regexp(y,'/')
% Determine which dates have a two element year
ivals = cellfun(@(x) numel(x{3})==2,ss)
% Preallocate a vector to contain the dates
dates = zeros(size(y))
% Convert those dates with a four element year using datenum
dates(~ivals) = datenum(y(~ivals),'dd/mm/yyyy')
% Convert the dates with a two element year
dates(ivals) = datenum(y(ivals),'dd/mm/yy')
You could also use datevec to pull out the elements separately. Here I have assumed that any two element years that occur are later than 2000. If you also have dates that occur in the 1900s, then a little more logic is required.
% Create three vectors: year, month, and day
[year,month,day,~,~,~] = datevec(y,'dd/mm/yyyy')
% Logical array containing those year values that are less than 2000
lt2000 = year < 2000
% Add 2000 to any year that are less than 2000
year(lt2000) = 2000+year(lt2000)
3 Comments
bym
on 23 Dec 2012
This works for 2 digit years
datenum('10/05/05','dd/mm/yyyy',2000)
ans =
732442
Laura Proctor
on 23 Dec 2012
Edited: Laura Proctor
on 23 Dec 2012
You're right, proecsm - it does work if you just have one date. However, if you have a cell array of dates, it will not return the same results.
Walter posted a great solution in the duplicate post:
dates = datenum(y,'dd/mm/yyyy',2000)
More Answers (1)
per isakson
on 23 Dec 2012
Edited: per isakson
on 23 Dec 2012
There is no way for me (in Sweden) to know how to interpret
'01/02/03'
or
'01/02/2003'
IMO: when in doubt avoid to use the default values. It is better to specify the format an extra time than not to do it when needed.
IMO: the most precious resource is the time of the programmer.
In this case a vectorized code is faster than a loop. Experiment:
>> mysteries_dates_to_sdn
Elapsed time is 1.540982 seconds.
Elapsed time is 0.056104 seconds.
>> mysteries_dates_to_sdn
Elapsed time is 1.508512 seconds.
Elapsed time is 0.055982 seconds.
>>
where
%%mysteries_dates_to_sdn
N = 1e3;
castr = {
'10/05/2005'
'10/5/2005'
'10/05/05'
'05/12/2005'
};
castr = repmat( castr, [N,1] );
tic
sdn_loop = mysteries_dates_to_sdn_loop( castr );
toc
tic
sdn_vec = mysteries_dates_to_sdn_vectorized( castr );
toc
assert( all( sdn_vec == sdn_loop ) ...
, 'mysteries_dates_to_sdn:failure' ...
, 'Failure: sdn_loop differs from sdn_vec' )
and
function sdn = mysteries_dates_to_sdn_loop( castr )
sdn = nan( size( castr ) );
for ii = 1 : numel( castr )
str = castr{ii};
len = length( str );
if len >= 9
sdn(ii) = datenum( str, 'dd/mm/yyyy' );
else
sdn(ii) = datenum( str, 'dd/mm/yy' );
end
end
end
and
function sdn = mysteries_dates_to_sdn_vectorized( castr )
sdn = nan( size( castr ) );
len = cellfun( @length, castr );
islong = ( len >= 9 );
isshort = not( islong );
sdn(islong) = datenum( castr(islong) , 'dd/mm/yyyy' );
sdn(isshort) = datenum( castr(isshort), 'dd/mm/yy' );
end
.
BTW: Why not use datevec rather than datenum to avoid calling the function, month
.
More:
This function (proposed above) is approx. 7% faster than mysteries_dates_to_sdn_vectorized
function sdn = mysteries_dates_to_sdn_pivot( castr )
sdn = datenum( castr, 'dd/mm/yyyy', 2000 );
end
0 Comments
See Also
Categories
Find more on Time Series Objects in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!