I'm thinking of a function that reads the first 17 rows, ignores the first one, then exports only my rows and columns of interest into an 11x9x365 matrix, then reads the next 17, etc., until the end of the file. But I'm not entirely sure how to go about it.
Importing .dat data and creating a matrix
Hello
I have slightly more than one hundred .dat files. I'm attaching one of them here (converted into .txt because the site didn't allow me to upload .dat). I'm trying to import only one at a time at the moment.
Generally the format is a row of date: yyyy mm dd and below are rows and columns of data. What I'm trying to do is import that data and limit it by four points - creating a matrix - as follows:
(11,5).....(21,5)
(11,13)....(21,13)
I want to create this matrix x 365 days - so import the data and assign it to its date somehow. I've tried the built-in Import Tool and the readtable function but I can't get either to work for me at all. Does someone know a good way to do that?
Thanks
3 Comments
Star Strider
on 15 Aug 2023
To use readtable with a ‘.dat’ file, use the name-value pair 'FileType','text' listed under Text Files (no direct link to it), since they appear to be text files.
To upload one or more of them here (as .dat files), use the zip function and then upload the .zip file.
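A minimal sketch of stating the file type explicitly, assuming the file is tab-delimited text. The file name and contents below are made up for illustration and written to a temporary folder so the snippet runs on its own; the real files may need further options.

```matlab
% Write a small tab-delimited file with a .dat extension (hypothetical content)
fname = fullfile(tempdir, 'example_import.dat');
writematrix(magic(4), fname, 'FileType', 'text', 'Delimiter', '\t');

% State the file type explicitly when reading it back
M = readmatrix(fname, 'FileType', 'text');
T = readtable(fname, 'FileType', 'text', 'ReadVariableNames', false);

disp(size(M))   % numeric matrix
disp(size(T))   % same content as a table
```

readmatrix returns a plain numeric array, which is usually the more convenient starting point when the file has no header row.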
Accepted Answer
C B
on 15 Aug 2023
Edited: C B
on 15 Aug 2023
% Constants
START_ROW_OFFSET = 5;
END_ROW_OFFSET = 13;
START_COL = 11;
END_COL = 21;
% Read data
data = readmatrix('data.dat.txt');
% Generate list of row numbers for the dates
date_rows = 1:17:size(data, 1);
% Initialize a structure array
data_struct = struct('date', [], 'data', []);
for i = 1:length(date_rows)
% Extract the year, month, and day
year = data(date_rows(i), 2);
month = data(date_rows(i), 4);
day = data(date_rows(i), 6);
% Format the date into a string
date_str = sprintf('%d-%02d-%02d', year, month, day);
% Extract data using the defined constants
data_values = data(date_rows(i) + START_ROW_OFFSET - 1 : date_rows(i) + END_ROW_OFFSET - 1, START_COL:END_COL);
% Store in the structure
data_struct(i).date = date_str;
data_struct(i).data = data_values;
end
for i = 1:3
disp(['Data for Date ' data_struct(i).date]);
disp(data_struct(i).data);
disp('------------------------');
end
2 Comments
dpb
on 15 Aug 2023
year = data(date_rows(i), 2);
month = data(date_rows(i), 4);
day = data(date_rows(i), 6);
% Format the date into a string
date_str = sprintf('%d-%02d-%02d', year, month, day);
NOTA BENE: the above makes the same assumption I made initially; I overlooked that it doesn't give the right answer once past a single-digit month. See my later follow-up for fixup code.
Also, once you do have the correct numeric values, there's no reason to convert to text and then back again; use the (y,m,d) vector form of input to datetime instead.
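As a short illustration of the numeric form of datetime, with made-up year, month and day values (not taken from the attached file):

```matlab
% Hypothetical year, month and day columns as numeric vectors
y = [2001; 2001; 2001];
m = [1; 10; 12];
d = [9; 15; 31];

% Build datetimes directly from the numbers, no sprintf round trip
dates = datetime(y, m, d);

% An N-by-3 numeric matrix [y m d] is accepted as well
dates2 = datetime([y m d]);

disp(dates)
disp(isequal(dates, dates2))
```

Both calls give the same result; the second is convenient when the three values already sit in adjacent columns.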
More Answers (1)
dpb
on 15 Aug 2023
Edited: dpb
on 15 Aug 2023
"...that data and limit it by four points - creating a matrix - as follows:
(11,5).....(21,5)
(11,13)....(21,13)"
I don't follow what the above is intended to represent? You mean you only want to keep four elements out of each day's worth of data given by those four indices or the array data(11:21,5:13) from each?
Either is relatively trivial, just need to know what, specifically, is intended.
Also, the file shows up with an extra linefeed or two in the first rows, it appears at least in my browser; the day of y,m,d is split at only one character on first record. One presumes/hopes that isn't real...
data=readlines('data.dat.txt'); % see what content is as string
data(1:5,:)
Ah, so, looks to be tab-delimited...
nnz(double(data{1})==9) % count the tab characters (char code 9) in line 1
nnz(double(data{2})==9) % ...and in line 2
But, they're not same number each...readmatrix may not work as desired, let's see about that...
data=readmatrix('data.dat.txt');
whos data
data(1:5,1:7), data(1:5, end-4:end)
data(end-4:end,1:10)
Well, that does seem able to handle, let's see about finding the dates using that first tab---
ixDate=isnan(data(:,1)); % logical vector records starting with a NaN
nnz(ixDate) % how many are there?
Aha! The number we would have expected for each day of a non-leapyear year...that's most excellent!
We can then get the dates easily enough; one presumes the data size is consistent for each...but let's check that out...
dataSize=diff(find(ixDate)); % the distance between date records
N=unique(dataSize) % easy way to see if all the same and what is
And, they are all the same size, with a spacing of 17 lines from one date record to the next (the date line plus its data lines)...but that isn't commensurate with what would appear to be 21 rows requested above?
But, it's simple enough then to reshape the file however wanted...
dates=datetime(data(ixDate,[2 4 6])); % convert y,m, d to datetime
dates([1:3 end-2:end]) % and see if got it right
Looks ok; goes from first to last day of year.
ERRATUM: LOOK MORE CLOSELY, THE LAST THREE AREN'T RIGHT!!! There's a fuller explanation at the bottom; the quick fixup corrections are:
dateData=data(ixDate,[2 4:6]); % columns of y, m, 10s,1s day
dateData(isnan(dateData))=0; % fixup the initial 10s day column
dates=datetime(dateData(:,1),dateData(:,2),10*dateData(:,3)+dateData(:,4)); % fix day and convert
dates([1:3 end-2:end])
OK, so now that does look more better...@OcDrive, you'll need to verify this behavior in the real file(s); would be a good thing to fix at the source if possible.
So, now, get rid of the date records and convert to 3D by the number records/day...
data=data(~ixDate,:); % remove the dates (keep not date)
whos data
data=mat2cell(data,repmat(N-1,1,numel(dates)),size(data,2)); % by records, width to a cell array
whos data
data=cat(3,data{:});
whos data
3 Comments
dpb
on 15 Aug 2023
I couldn't interpret that desire so left as "exercise for Student" to select whatever range is desired...that would be some sort of colon indexing operation.
Your last presumption would simply be
r1= 5; r2=13;
c1=11; c2=21;
data=data(r1:r2,c1:c2,:);
presuming the wish is intended as inclusive.
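A sketch of how that selection combines with the dates to pull out one day's block. The array sizes, the year and the random contents are assumptions for illustration only; in practice data and dates would come from the steps in the answer.

```matlab
% Synthetic stand-in for the rows-by-cols-by-days array and its dates
nRows = 16; nCols = 25; nDays = 365;                  % assumed sizes
data  = rand(nRows, nCols, nDays);
dates = datetime(2001,1,1) + caldays(0:nDays-1).';    % assumed year

% Inclusive row/column limits from the comment above
r1 = 5;  r2 = 13;
c1 = 11; c2 = 21;
sub = data(r1:r2, c1:c2, :);
disp(size(sub))

% Look up the block belonging to one particular date
k = find(dates == datetime(2001,3,15));
dayBlock = sub(:,:,k);
disp(size(dayBlock))
```

With these limits the result is 9-by-11-by-365; if an 11-by-9 orientation per day is wanted instead, permute(sub,[2 1 3]) swaps the first two dimensions.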
dpb
on 15 Aug 2023
Edited: dpb
on 15 Aug 2023
Actually, the dates above aren't correct for the end...looks like maybe there isn't always a blank column there after some point? Mayhaps have to do something more there...
data=readmatrix('data.dat.txt'); % read the file as a numeric matrix
ixDate=isnan(data(:,1)); % logical vector records starting with a NaN
nnz(ixDate) % how many are there?
dateData=data(ixDate,:); % extract those records to inspect more carefully
dateData([1:5 end-4:end],:)
Yeah, well that sucks...the format does change somewhere from beginning to end.
Oh! Actually, the day is in two fields, the original NaN should be zeros in column 5 and then column 5:6 should be interpreted as one instead. Not sure if that is real or a figment of having uploaded the file; OP will have to determine what's going on there for sure.
To fix this as it was interpreted would be something like
dateData=data(ixDate,[2 4:6]); % pick out the date data columns
dateData(isnan(dateData))=0; % convert the initial nan-->0
dateData(:,3)=10*dateData(:,3)+dateData(:,4); % fixup the day number from the two columns
dates=datetime(dateData(:,1:3)); % convert y,m, d to datetime
dates([1:3 end-2:end]) % and see if got it right
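To see the fixup in isolation, here is the same logic applied to a few hand-made date records. The column layout (NaN, year, NaN, month, tens of day, ones of day) is an assumption that mimics what is described above, not a copy of the real file:

```matlab
% Synthetic date records: [NaN, year, NaN, month, tens-of-day, ones-of-day]
rec = [NaN 2001 NaN  1 NaN 1
       NaN 2001 NaN  1 NaN 9
       NaN 2001 NaN  1   1 0
       NaN 2001 NaN 12   3 1];

dateData = rec(:, [2 4:6]);                 % year, month, tens, ones
dateData(isnan(dateData)) = 0;              % missing tens digit --> 0
dayNum = 10*dateData(:,3) + dateData(:,4);  % recombine the two day columns
dates  = datetime(dateData(:,1), dateData(:,2), dayNum);

disp(dates)
```

Single-digit days keep their value because the missing tens digit becomes zero, while two-digit days are rebuilt from both columns.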