Readtable not configuring delimiters correctly
Hello,
I have a .csv file which, strangely enough, cannot be read correctly by readtable. My code and the CSV file are below.
Thank you in advance!
Andrea
P.S. I tried many versions of delimiters!
filename = 'experiments.csv';
opts = detectImportOptions(filename);
opts.Delimiter = ',';
T = readtable(filename,opts);
3 Comments
Arjun
on 10 Oct 2024
You can try the code below and let me know if it works for you:
filename = 'experiments.csv';
% Detect import options, skipping the initial lines
opts = detectImportOptions(filename, 'Delimiter', ',');
opts.DataLines = [9, Inf];
% Read the table with only the first column
opts.SelectedVariableNames = opts.VariableNames(1);
T = readtable(filename, opts);
% Split the first column into separate columns
splitData = cellfun(@(x) strsplit(x, ','), T.(1), 'UniformOutput', false);
splitDataMatrix = vertcat(splitData{:});
% Keep only the last three columns
lastThreeColumns = splitDataMatrix(:, end-2:end);
% Clean and convert the string values to double
cleanedColumns = cellfun(@(x) strtrim(x), lastThreeColumns, 'UniformOutput', false); % Trim whitespace
% Remove non-numeric characters
cleanedColumns = regexprep(cleanedColumns, '[^\d.-]', '');
% Convert to double
lastThreeColumnsDouble = cellfun(@str2double, cleanedColumns);
% Display the converted matrix
disp(lastThreeColumnsDouble);
The final values are contained in "lastThreeColumnsDouble".
Stephen23
on 10 Oct 2024
Edited: Stephen23
on 10 Oct 2024
"I have a .csv file which, strangely enough, cannot be read correctly by readtable"
It is not strange at all: there is no heuristic or algorithm in the world that works in every circumstance. Every algorithm is fallible (no matter how much beginners imagine them to be perfect, just like their computers have infinite memory and are infinitely fast).
Did you look at the file in a text editor to check whether it is a well-formatted CSV file?
Because that file is not a well-formatted CSV file. It is actually a mess. Strictly speaking every line is one field:
txt = fileread('experiments.csv')
Even if we ignore that "feature", there are more such "features": the double quotes are doubled around the numeric values! Ugh. Sorry, but whatever wrote that file is the problem. Sadly, the Do What I Want and Not What I Gave You Toolbox is currently being tested and has not yet been released. Currently what you have is: rubbish in => rubbish out.
No surprises there.
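To make the quoting issue concrete, here is a tiny sketch on a made-up record with the structure described above (one pair of quotes wrapping the whole line, doubled quotes around each numeric value); the sample values are invented:
% Hypothetical sample record with the quoting described above (values are made up)
raw = '"0.5,""1.23"",""4.56"""';
rec = regexprep(raw, '^"|"$', '');        % drop the quotes wrapping the whole line
rec = strrep(rec, '""', '"');             % un-double the embedded quotes -> 0.5,"1.23","4.56"
vals = str2double(erase(strsplit(rec, ','), '"'))   % -> [0.5 1.23 4.56]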
Answers (3)
dpb
on 10 Oct 2024
Edited: dpb
on 10 Oct 2024
filename = 'experiments.csv';
type(filename)
The file is a list of strings; the delimiters are inside the string delimiters.
How was the file created? It isn't a valid CSV file format... well, it is, but only as a one-column string.
But,
M=readcell(filename,'NumHeaderLines',9,'Whitespace','"','Delimiter',',');
M(1:5,:)
didn't solve the problem...
str2double(strrep(split(M,','),'"','')) % split each row on the embedded commas and strip the quotes
will work around it.
Alternatively, for this file you could use a FixedWidthImportOptions object, but that would only hold up as long as every file is written exactly like this one.
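For illustration only, such a fixed-width read might look roughly like this; the column widths and the data-start line are placeholders (not measured from the actual file), and the variable names are taken from the plots later in this thread:
% Sketch only: the widths and DataLines below are assumptions; adjust them to the real file
opts = fixedWidthImportOptions('NumVariables', 3);
opts.VariableNames  = {'Time','Displacement','Force'};
opts.VariableTypes  = {'double','double','double'};
opts.VariableWidths = [12 16 16];    % hypothetical widths, measure them in the file
opts.DataLines      = [10 Inf];      % assumed first data row, per the header counts above
opts = setvaropts(opts, opts.VariableNames, 'TrimNonNumeric', true); % ignore stray quotes/commas in each field
T = readtable('experiments.csv', opts);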
I'd suggest going back to the source and seeing if you can get the file created as a valid CSV file; it's OK that the fields are string-delimited, but the whole line should not be wrapped in quotes; that turns each line into a single string with embedded strings, which is certainly not what was intended.
It does seem a shortcoming of DelimitedTextImportOptions not to have a flag for quoted ("string-delimited") fields; they are pretty common owing to embedded blanks in string data. That would not have solved this particular problem, though, and for properly constructed files the quoting may get recognized automagically as is.
0 Comments
Star Strider
on 10 Oct 2024
Edited: Star Strider
on 10 Oct 2024
Try something like this —
% type('experiments.csv')
filename = 'experiments.csv';
Lines = readlines(filename);
FindLine = strfind(Lines, "1,");
idx = find(~cellfun(@isempty, FindLine), 1);   % first line containing "1," (the variable-names row)
FirstLine = extractAfter(Lines(idx), "1,");
VN = compose('%s',split(FirstLine,','));
T = readtable(filename, 'NumHeaderLines',8, 'Delimiter',{',','"'}, 'LeadingDelimitersRule','ignore', 'ConsecutiveDelimitersRule','join');
T.Properties.VariableNames = VN
figure
plot(T.Time, T.Displacement, 'DisplayName',VN{2})
hold on
plot(T.Time, T.Force, 'DisplayName',VN{3})
grid
xlabel(VN{1})
legend('Location','best')
This is not quite as efficient as I would like it to be; however, it has the virtue of working.
EDIT — Added plot
0 Comments
dpb
on 10 Oct 2024
Edited: dpb
on 10 Oct 2024
If it were me and I couldn't fix the original source that created the files, I'd probably fix them first, then read a cleaned version instead...
filename = 'experiments.csv';
F=readlines(filename);
ix=find(startsWith(F,"1,"),1); % make sure "1," is at the start of the line; it should be, but...
F=F(ix:end); % remove the nondata header rows
F(1)=strrep(F(1),'1,',','); % leave only the leading delimiter of first row
F=strrep(F,'"',''); % strip the quotes--they're unneeded and get in the way
F=arrayfun(@(s)string(s{:}(2:end)),F); % the leading delimiter is unneeded too, so drop the first character of every line
writelines(F,filename) % write a cleaned up version instead
If you run the above on the file(s) first, then all you'll have to do later is read them directly...
T=readtable(filename,'ReadVariableNames',1,'VariableNamesLine',1,'VariableUnitsLine',2);
head(T)
%T.Properties
subplot(2,1,1)
plot(T.Time,T.Force)
xlabel(join([T.Properties.VariableNames(1),T.Properties.VariableUnits(1)]))
ylabel(join([T.Properties.VariableNames(3),T.Properties.VariableUnits(3)]))
subplot(2,1,2)
plot(T.Time,T.Displacement)
xlabel(join([T.Properties.VariableNames(1),T.Properties.VariableUnits(1)]))
ylabel(join([T.Properties.VariableNames(2),T.Properties.VariableUnits(2)]))
The total time may be a little longer by running them through the filter first, but it will make life much simpler from there on out...
1 Comment
dpb
on 10 Oct 2024
A thought strikes me: there were two header sections in the file, and the first (and, in this file, only) set of data had the preceding "1," on its variable-names header row.
The code both @Star Strider and I show relies on specifically finding that "1," -- if the actual file structure puts the data for both of those tests in the same file, then to write totally generic code one would need to scan the header lines for the number of "Results" lines and then loop over those, finding each test segment in turn, roughly as sketched below...
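For what it's worth, a rough sketch of that scan might look like the following; it assumes (as in this file) that each test's variable-names row starts with its test number and a comma ("1,", "2,", ...), that the data rows carry a leading delimiter, and that every line within a segment splits into the same number of comma-separated fields, none of which is guaranteed for other files:
filename = 'experiments.csv';
F   = erase(readlines(filename), '"');                        % strip the quotes, as above
hdr = find(~cellfun(@isempty, regexp(F, '^\d+,', 'once')));   % variable-names rows: "1,Time,...", "2,Time,...", ...
segEnd = [hdr(2:end) - 1; numel(F)];                          % each segment ends just before the next header
Tests  = cell(numel(hdr), 1);
for k = 1:numel(hdr)
    VN    = strtrim(strsplit(extractAfter(F(hdr(k)), ','), ','));  % names after the leading test number
    block = F(hdr(k)+1 : segEnd(k));                               % everything between this header and the next
    block = block(strlength(strtrim(block)) > 0);                  % drop blank lines, if any
    data  = str2double(split(block, ','));                         % text rows (e.g. a units line) become NaN
    data(all(isnan(data), 2), :) = [];                             % drop the non-numeric rows
    data(:, all(isnan(data), 1)) = [];                             % drop the empty column from the leading delimiter
    Tests{k} = array2table(data, 'VariableNames', VN);
end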