MATLAB readmatrix inconsistently reading CSV files
I'm using MATLAB's readmatrix function to read data from CSV files and store it in a variable. The CSV files are identical in format, with a block of text lines at the start before the data begins at line 21. However, readmatrix seems to behave inconsistently: sometimes it captures all the text at the start of the file and stores it as NaN, and other times it ignores these header lines and only grabs the data. Why is this? What is a better way to do this?
7 Comments
Stephen23
on 24 Aug 2023
Edited: Stephen23
on 24 Aug 2023
"I have just opened my csv files in a text editor. Whilst the headers look identical in Excel, in the text editor there are a number of comma delimiters after most lines on one of the files. Perhaps this explains the different behaviour."
Yes, differences between the files are most likely the cause.
Of course the algorithm used by READTABLE et al. is not perfect (there is no such thing) and it cannot read minds: what is obvious to a human is not obvious to a machine. It is always possible to trick or confuse an algorithm with the right combination of data; such things are mathematically unavoidable.
Note that relying on what files "look like" in MS Excel is a number one mistake that you should avoid: MS Excel mangles data in all sorts of horrible ways that are invisible from inside Excel, e.g. adding or changing delimiters. It can also change data without any warning.
If you want reliable data processing do NOT open and save text files using MS Excel. It is a great tool for Excel spreadsheets... but for anything else... beware of dragons!
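One way to see what is actually stored on disk, rather than what Excel displays, is to print the raw lines from MATLAB. The following is only a rough sketch: the file name and its contents are made up for illustration, and the sample imitates header lines that have picked up trailing commas.

```matlab
% Hypothetical sample file: header lines with trailing commas,
% similar to what a spreadsheet program may write when saving as CSV.
filename = fullfile(tempdir, 'excel_saved_sample.csv');
fid = fopen(filename, 'w');
fprintf(fid, 'Instrument log,,\nOperator: A,,\n1,2,3\n');
fclose(fid);

% Read the raw text and split it into lines, exactly as stored on disk.
rawLines = splitlines(string(fileread(filename)));

% Count the comma delimiters on every line.
nCommas = count(rawLines, ",");

% Print line number, delimiter count, and the raw text side by side.
for k = 1:numel(rawLines)
    fprintf('%2d | %d commas | %s\n', k, nCommas(k), rawLines(k));
end
```

Running the same loop on two files that behave differently should make any difference in the header lines visible.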
Accepted Answer
Steven Lord
on 24 Aug 2023
If you know exactly how many header lines your file contains, I would specify the NumHeaderLines name-value argument in your readmatrix call.
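A minimal sketch of that first suggestion, assuming 20 header lines (the question says the data starts at line 21). The file name and the sample contents are hypothetical and only serve to make the snippet runnable.

```matlab
% Build a small sample file: 20 lines of header text, then numeric data.
filename = fullfile(tempdir, 'sample_with_header.csv');  % hypothetical name
fid = fopen(filename, 'w');
for k = 1:20
    fprintf(fid, 'Header text line %d\n', k);
end
fprintf(fid, '1,2,3\n4,5,6\n');
fclose(fid);

% Skip the 20 header lines explicitly instead of relying on detection.
M = readmatrix(filename, 'NumHeaderLines', 20);
disp(M)
```

Stating the header count explicitly means the result no longer depends on how each individual file's header happens to be detected.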
Alternatively, you can create a file import options object using detectImportOptions. Once it's been created, check that the properties that specify where the data is located (either DataRange or DataLines) and where any variable metadata is located (VariableNamesLine, VariableDescriptionsLine, VariableUnitsLine, or the corresponding Range properties for SpreadsheetImportOptions) match what you expect given the format of the files. Once you've confirmed that they match your expectations, pass that import options object into readmatrix as the opts input argument.
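The import options workflow could look roughly like the sketch below. The file name and sample contents are hypothetical, and the override of DataLines assumes the data starts at line 21 as in the question; which values get detected depends on the file.

```matlab
% Build a small sample file: 20 lines of header text, then numeric data.
filename = fullfile(tempdir, 'sample_for_opts.csv');  % hypothetical name
fid = fopen(filename, 'w');
for k = 1:20
    fprintf(fid, 'Header text line %d\n', k);
end
fprintf(fid, '1,2,3\n4,5,6\n');
fclose(fid);

% Let MATLAB detect the layout, then inspect what it decided.
opts = detectImportOptions(filename);
disp(opts.DataLines)          % where MATLAB thinks the data is
disp(opts.VariableNamesLine)  % where it thinks the variable names are

% If the detected location is not what you expect, set it yourself.
opts.DataLines = [21 Inf];    % assumption: data runs from line 21 to the end

% Read using the checked (and, if needed, corrected) options.
M = readmatrix(filename, opts);
disp(M)
```

Creating the options object once from a known-good file and reusing it for every file is one way to get the same behaviour across all of them.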
If the import options properties don't match what you expect, and reviewing the file doesn't indicate to you why MATLAB is detecting the values for those properties that it is, please send a sample data file that demonstrates this behavior to Technical Support using this link along with the import options object and describe the results you expect. It's possible that you've identified a bug or an ambiguous edge case in the import options detection algorithm.