Importing and sorting text files in MATLAB

Hi all.
I have hundreds of text files with data that I need to analyze.
Each file has a date column, a time column, an ID column and a value column.
I would like to create a structure that has the text file name and the ID as fields, and then fill it with the respective times and values. I would like the code to be general enough to reuse in the future, because I'm going to get more and more of these text files to analyze.

7 Comments

Can you attach an example text file?
Is there only one ID per file? It's not clear what output you want. Can you show an example of what you want (preferably using valid matlab syntax so it's not ambiguous).
The text files look like this, with unsorted IDs (there are almost 40 different IDs).
I would like to sort them first by ID and then by time.
In addition, an ID has different value size from the same ID in another file.
"an ID has different value size from the same ID in another file."
You mean the number of rows varies from file to file? Or something else?
I'm still unsure of what you want. You want to create just one output for all the files, where all the IDs and times are all in one array with an extra column for the file, then sort by ID and time? Or have one array per file, each one sorted?
From a parsing point of view, it would be much easier if the date was stored as 2019-02-19 (or 2019/02/19, or anything but spaces between the number). Is that an option?
So, each file has the same number of IDs, and each ID has different values (so different rows).
Each text file is basically an aging cycle.
I would like an output like this:
a structure with 2 fields, let's say A and B;
each field must contain the several aging cycles;
each cycle must have every ID sorted in the same way, so I can plot everything easily and perform the analysis.
In the end, I don't care about the date; I only need the time to plot the cycles. Also, the time values have to start from zero.
What does "each ID has different rows" mean?
In your example file, each ID, except ID 0, is only present on one or two rows. ID 0 must have a special meaning since it's repeated so often.
Importing your file is trivial
t = mergevars(readtable('example.txt'), 1:3);
t.Properties.VariableNames = {'Date', 'Time', 'ID', 'something'};
t.Date = datetime(t.Date)
Sorting by ID and time is also trivial:
sortrows(t, {'ID', 'Time'})
I'm really struggling to understand what needs to be done afterwards.
I mean, for example, ID 15 has 150 values in file x.txt and 200 values in x1.txt. Each file is different and I have hundreds of files.
ID 0 is repeated because it is the beginning of the file; I didn't copy everything.
By the way, the average number of rows per file is 13000.
Your code works well.
The goal is to have general code that:
imports all the txt files at once,
puts them in a structure,
has each ID in a subfield so I can plot each ID in one graph.


 Accepted Answer

  • Import all the text files
folder = 'c:\somewhere\somefolder';
filelist = {'fileA', 'fileB', ...}; %obtained however you want. e.g with dir
alldata = cell(size(filelist));
for fileidx = 1:numel(filelist)
%load file
filedata = readtable(fullfile(folder, filelist{fileidx}));
%optionally, tidy up the table:
filedata = mergevars(filedata, 1:3);
filedata.Properties.VariableNames = {'Date', 'Time', 'ID', 'something'};
filedata.Date = datetime(filedata.Date);
%store in cell array
alldata{fileidx} = filedata;
end
%once everything is loaded, merge it all in one table
alldata = vertcat(alldata{:});
%sort by time
alldata = sortrows(alldata, 'Time');
  • put in a structure. Not needed if all you want is to plot per ID
  • plot each ID in one graph (I assume it's column 4 (which I called something) against time).
%create figure, axes, and tell matlab to plot all on the same graph
figure;
axes;
hold on;
%plot something vs time for each unique ID.
rowfun(@plot, alldata, 'GroupingVariables', 'ID', 'InputVariables', {'Time', 'something'});
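The file list in the first step can, for instance, be built with dir. A minimal sketch, assuming all the files sit in a single folder and share a .txt extension (both are assumptions; the folder is the placeholder path used above, and listing is just an illustrative variable name):

```matlab
folder = 'c:\somewhere\somefolder';         % placeholder path, replace with your own
listing = dir(fullfile(folder, '*.txt'));   % struct array, one element per matching file
filelist = {listing.name};                  % cell array of char vectors (file names only)
```

Built this way, each filelist{fileidx} is a char vector, which is what fullfile expects in the loop.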

14 Comments

i'm trying but i'm getting this:
Error using fullfile (line 67)
All inputs must be strings, character vectors, or cell arrays of character vectors.
Error in Untitled2 (line 7)
filedata = readtable(fullfile(folder, filelist{fileidx}));
I have no idea how you generated the file list. Clearly, it's not a cell array of char vectors. Either that or folder is not a char vector.
nicolala's comment mistakenly posted as an answer moved here:
good morning, i fixed that issue.
folder = 'C:\Users\H344470\Documents\MATLAB\DATA\Ambient\log_data';
filelist = dir(fullfile(folder, '*.txt')); %list the files in folder, not in the current directory
alldata = cell(size(filelist));
for fileidx = 1:numel(filelist)
%load file
filedata = readtable(fullfile(folder, filelist(fileidx).name));
%optionally, tidy up the table:
filedata = mergevars(filedata, 1:3);
filedata.Properties.VariableNames = {'Date', 'Time', 'ID', 'Values'};
filedata.Date = datetime(filedata.Date);
%store in cell array
alldata{fileidx} = filedata;
end
for K=1:numel(filelist)
alldata{K,1} = sortrows(alldata{K,1}, {'ID', 'Time'});
end
Now I have alldata (an Nx1 cell); each of the N cells is a cycle with every ID inside.
How can I plot the N cycles in one graph for each ID?
E.g. ID 1 --> one plot with N cycles against time.
You don't need the sortrows loop. As I've written in my answer, after you've loaded all the files, all you need to do is:
%once everything is loaded, merge it all in one table
alldata = vertcat(alldata{:});
%sort by time
alldata = sortrows(alldata, 'Time');
%create figure, axes, and tell matlab to plot all on the same graph
figure;
axes;
hold on;
%plot values vs time for each unique ID.
rowfun(@plot, alldata, 'GroupingVariables', 'ID', 'InputVariables', {'Time', 'Values'});
This will create a plot of Values vs Time for each ID.
If you need a more complex plot, then create a plotting function for a single ID that takes two inputs, a column vector of times and a column vector of Values, and pass that function as the first input of rowfun.
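As a rough sketch of such a function (the name plotOneID and the styling are made up for illustration; it assumes Time is a duration or numeric column vector):

```matlab
function plotOneID(time, values)
%PLOTONEID Plot Values against Time for a single ID (hypothetical helper).
%   time:   column vector of times for one ID
%   values: column vector of the matching values
%   Possible call:
%   rowfun(@plotOneID, alldata, 'GroupingVariables', 'ID', ...
%          'InputVariables', {'Time', 'Values'}, 'NumOutputs', 0);
    [time, order] = sort(time);   % draw the line in time order
    values = values(order);       % reorder the values the same way
    time = time - time(1);        % shift so that the trace starts at 0
    figure;                       % one figure per ID
    plot(time, values, '.-');
    grid on;
    xlabel('Time');
    ylabel('Values');
end
```

Since the function returns nothing, rowfun has to be told not to expect an output, hence the 'NumOutputs', 0 in the call shown in the help text.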
Hi Guillaume. Since I need a structure because I have two different kinds of cycle (e.g. cold temperature, ambient temperature), I created this function:
function alldata = GetData (folder)
filelist = dir(fullfile(folder,'*.txt'));
alldata = cell(size(filelist));
for fileidx = 1:numel(filelist)
%load file
filedata = readtable(fullfile(folder, filelist(fileidx).name));
%optionally, tidy up the table:
filedata = mergevars(filedata, 1:3);
filedata.Properties.VariableNames = {'Date', 'Time', 'ID', 'Values'};
filedata.Date = datetime(filedata.Date);
%store in cell array
alldata{fileidx} = filedata;
end
for K=1:numel(filelist)
alldata{K,1} = sortrows(alldata{K,1}, {'ID', 'Time'});
end
end
and
Ambient = input('insert ambient folder path: ');
alldata.A = GetData(Ambient);
Cold = input('insert cold folder path: ');
alldata.C = GetData(Cold);
Now I have this alldata structure.
The next step should be:
since the time is not the same across IDs and testing cycles, I would like to subtract from every ID its first time (e.g. for ID(i), t = t - t(1)), so that every ID measurement starts from zero.
After that I have two options:
either reshape the structure to group the IDs, or plot the same ID in a single plot.
I did some hardcoding like this, but I need it to be general for future purposes:
figure(16) %ID 16
subplot(2,1,1)
title('ambient')
grid on
hold on
plot(Ta(3,1).t,Ya(3,1).y)
plot(Ta(3,3).t,Ya(3,3).y)
plot(Ta(3,4).t,Ya(3,4).y)
plot(Ta(2,5).t,Ya(2,5).y)
plot(Ta(2,8).t,Ya(2,8).y)
plot(Ta(3,7).t,Ya(3,7).y)
subplot(2,1,2)
title('cold')
grid on
hold on
plot(Tc(2,4).t,Yc(2,4).y)
plot(Tc(2,20).t,Yc(2,20).y)
for i=16:17
plot(Tc(2,i).t,Yc(2,i).y)
end
for i=1:3
plot(Tc(3,i).t,Yc(3,i).y)
end
for i=5:15
plot(Tc(3,i).t,Yc(3,i).y)
end
for i=18:19
plot(Tc(3,i).t,Yc(3,i).y)
end
I had to select the ID rows by hand because some IDs are missing in some text files.
Sorry to bother you, but I'm kind of a beginner.
Matlab has several functions that allow you to apply the same code to groups of data (in your case, IDs), so it is much simpler if you keep your data as one or two big flat tables. You could either have two tables, one for ambient and one for cold, or just one table with an additional column that tells you whether the row is for the cold or the ambient condition. If you're going to generate the same plots for both conditions, the single table may make it easier.
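To give an idea of what these grouping functions look like, here is a small sketch that makes the time of every ID start at zero with findgroups and splitapply. The table is toy data invented for the example; only the variable names Time, ID and Values follow the thread.

```matlab
% Toy table standing in for the imported data (values are invented)
alldata = table(seconds([5; 1; 7; 3]), [2; 1; 2; 1], [0.4; 0.1; 0.5; 0.2], ...
    'VariableNames', {'Time', 'ID', 'Values'});
alldata = sortrows(alldata, {'ID', 'Time'});

groups = findgroups(alldata.ID);                     % group number of each row
firsttime = splitapply(@min, alldata.Time, groups);  % earliest time of each ID
alldata.Time = alldata.Time - firsttime(groups);     % every ID now starts at 0
```

Indexing firsttime with groups expands the per-ID minimum back to one value per row, so the subtraction needs no loop.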
In any case, I'd modify your GetData function to return just the one table:
function alldata = GetData(folder)
%Import all text files within a folder as one huge table with 4 variables.
%input: folder, a char vector or scalar string
%alldata: table with 4 variables: Date, Time, ID, Values
filelist = dir(fullfile(folder,'*.txt'));
alldata = cell(size(filelist));
for fileidx = 1:numel(filelist)
%load file
filedata = readtable(fullfile(folder, filelist(fileidx).name));
%optionally, tidy up the table:
filedata = mergevars(filedata, 1:3);
filedata.Properties.VariableNames = {'Date', 'Time', 'ID', 'Values'};
filedata.Date = datetime(filedata.Date);
%store in cell array
alldata{fileidx} = filedata;
end
alldata = vertcat(alldata{:});
end
You can then import your files. I'd use uigetdir instead of input (which is missing the 's' option in your code):
coldfolder = uigetdir(pwd, 'Select cold folder path');
assert(ischar(coldfolder), 'No folder selected. Aborting'); %uigetdir returns 0 when cancelled
colddata = GetData(coldfolder);
ambientfolder = uigetdir(pwd, 'Select ambient folder path');
assert(ischar(ambientfolder), 'No folder selected. Aborting'); %uigetdir returns 0 when cancelled
ambientdata = GetData(ambientfolder);
Don't bother using a structure for that. And use meaningful names for your variables/fields.
Also, don't bother with the sorting. It's not necessary (unless you find it easier to look at the data).
As I said, you're better off merging the above two tables and add an extra column:
fulltable = [colddata; ambientdata];
fulltable.Condition = repelem(categorical({'Cold'; 'Ambient'}), [height(colddata); height(ambientdata)]);
Then, you create a function to do the plotting for a single ID and both conditions. For example:
function cycleplot(ID, condition, time, values)
%plot a cycle for a given ID and condition
%ID: due to the way rowfun works, this will be a column vector of identical IDs
%condition: a column vector of conditions (categorical)
%time: column vector of times (duration)
%values: column vector (numeric)
figure('Name', sprintf('ID: %d', ID(1)))
condlist = {'Ambient', 'Cold'}; %must match the category names in fulltable.Condition
for cond = 1:2
subplot(2, 1, cond);
iscond = condition == condlist{cond};
if ~any(iscond), continue; end %this ID has no data for this condition
condtime = time(iscond);
condvalues = values(iscond);
[condtime, order] = sort(condtime); %sort by time
condvalues = condvalues(order); %and use the same order for values
condtime = condtime - condtime(1); %start time at 0
plot(condtime, condvalues)
title(condlist{cond}); %set the title after plot, which would otherwise reset it
end
title(sprintf('ID: %s at %s condition'), ID{1}, condition(1));
[time, order] = sort(time); %sort by time
values = values(order); %and reorder values the same way
time = time - time(1); %start time at 0
plot(time, values);
grid on;
end
Then you call that function for each ID with rowfun:
rowfun(@cycleplot, fulltable, 'GroupingVariables', 'ID', 'InputVariables', {'ID', 'Condition', 'Time', 'Values'}, 'NumOutputs', 0);
Typos may have been made, Answers is not behaving.
thank you so much, i learnt more these days than in my uni. :)
nicolala's comment moved here (please make sure to use comments instead of starting a new answer):
there's something wrong in the cycleplot function I think.
Error using tabular/rowfun>dfltErrHandler (line 514)
Applying the function 'cycleplot' to the 1st group of rows in A generated the following error:
Brace indexing is not supported for variables of this type.
.
Error in tabular/rowfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 262)
errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});
Error in tabular/rowfun (line 284)
errHandler(struct('identifier',ME.identifier, 'message',ME.message, 'index',igrp),inArgs{:});
Error in Data_in (line 17)
rowfun(@cycleplot, fulltable, 'GroupingVariables', 'ID', 'InputVariables', {'ID', 'Condition', 'Time', 'Values'}, 'NumOutput', 0)
It's probably the title line. It should have been:
title(sprintf('ID: %d at %s condition', ID(1), char(condition(1))));
with () brackets for ID instead of {}, and with ID and condition passed to sprintf rather than to title.
As I said, there may have been typos. There's a bug in the forum where it sometimes scrolls back up to the top every time you type a character. That kept happening, so I couldn't see what I was typing.
Note: in case you need to debug what is happening in cycleplot, issue
dbstop if caught error
at the command line. Matlab will break into the debugger at the line causing the problem instead of catching the error.
Afterward, to go back to normal operation:
dbclear if caught error
I got this
Capture1.PNG
while I'd need this.
Capture.PNG
Btw, the code is very efficient. The first version I wrote took 5 minutes, this one just 30 seconds.
The time axis should span 70 seconds; this is what I meant when I said the time should start from zero.
The axis is a duration axis, you can easily customise it to display the duration in the format you want.
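For example, the tick labels of a duration axis can be switched to plain seconds with xtickformat. A minimal sketch on made-up data (the 70 second span is taken from the comment above, the values are random):

```matlab
t = seconds(0:10:70)';        % made-up 70 s cycle
y = rand(size(t));            % made-up values
figure;
plot(t, y);
xtickformat('s');             % display the duration ticks in seconds
xlabel('Time since start of cycle');
```

Other duration formats such as 'mm:ss' or 'hh:mm:ss' can be passed the same way.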
The current code is set to create one plot, value vs time whereas it looks like that should be split into several plots. I'm not sure what the criteria is for that.
Yes, instead of creating one single plot, split it into several plots, each of them representing one cycle. When I said the time should start from zero, I meant that every cycle has to start from zero so that I can compare them; the plots can be split because the cycles have the same time axis.
I hope the picture clears up your doubt.
I'm unclear how a cycle is defined. In any case, you probably need to add another variable to the table which would define which cycle the row belongs to. Then, in the plot function, you can split the data per cycle (using e.g. splitapply).
each text file contains 1 cycle for 36 IDs. that's all
e.g.
DATE ID VALUE
14:01:00.000 6 0.24
..... 6 .......
14:02:10.000 6 0.29
This is one cycle as it appears in one text file. It is repeated for the other IDs, and the time values are also different; the only thing in common is the duration of 70 seconds.
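If each text file holds exactly one cycle, the file index could serve as the cycle variable suggested above. The sketch below assumes a hypothetical Cycle column has been added while importing (for instance filedata.Cycle = repmat(fileidx, height(filedata), 1); inside the loop of GetData) and shows, on invented data for a single ID, how each cycle can then be drawn from zero:

```matlab
% Toy data for one ID: two cycles recorded at different clock times (invented)
time   = seconds([100; 110; 120; 500; 510; 520]);
values = [0.24; 0.27; 0.29; 0.20; 0.26; 0.30];
cycle  = [1; 1; 1; 2; 2; 2];            % hypothetical Cycle column (file index)

figure;
cycles = unique(cycle);
for k = 1:numel(cycles)
    incycle = cycle == cycles(k);       % rows belonging to this cycle
    [t, order] = sort(time(incycle));   % sort within the cycle
    v = values(incycle);
    v = v(order);
    plot(t - t(1), v);                  % shift so the cycle starts at 0
    hold on;                            % keep the earlier cycles on the axes
end
hold off;
grid on;
legend(arrayfun(@(c) sprintf('cycle %d', c), cycles, 'UniformOutput', false));
```

Inside a function called through rowfun, the same loop would work on the time, values and cycle vectors of one ID, provided 'Cycle' is added to the 'InputVariables' list.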


Asked: 13 Mar 2019

Edited: 18 Mar 2019
