Difference between readmatrix() and readtable()
Show older comments
Hi community,
After reading that xlsread() is not recommended anymore, I tried to get my excel table read through the readmatrix() function. However, I don't seem to get my excel file read in the correct way. The output that readmatrix gives me is never what what I can read in the excel file. I assume this has something to do with the coding, so I tried to get the output right through the "OutputType" option, but this did not yield satisfactory results either.
Later then, I found the readtable() function, and this seems to give me what I want. However, I still need to understand a bit more why it does so, and that brings me to my basic question:
I read the documentations on both functions and from what I understand the most basic difference between them seems to be that readmatrix() returns an array and readtable() returns a table. But why does this cause me trouble in getting the data from the excel file returned correctly/the way I need it when I use readmatrix()? What is the underlying difference in how Matlab handles the data in this case? And, if readtable() actually turns out to the function that is better suited in this case, for what prupose would readmatrix() be better suited then? Could you give an example?
Note on the data I am intending to import from the excel file: it is a relatively simple database-like structured sheet where each row is an item and each column represents a propoerty of that item (e.g.: item may be a certain material, and the colums can be "price", "weight", etc.). The data contained in the property colums can be of both text or numeric type, or empty.
Thanks in advance for your kind support!
4 Comments
Most excellent question, indeed! As remarks in earlier answer indicate, I think it's incomprehensible what TMW's intent is in the longer view and this continual switching horses in midstream is most difficult to keep up with..it's one thing when some new specialized data class shows up in a toolbox and then almost immediately gets picked up in the base product excepting by a different name and syntax (dataset and table in particular come to mind) because at least the one is in the toolbox so may not pollute everybody's code, but when duplicated functionality of this nature and threats of deprecation/potential removal are made in base product functions, that's a whole different kettle of fish.
One thing I'd note in relation to your particular spreadsheet and that you say fields can have variable data or be missing...a weakness in the import object and how readtable works and I presume the new readmatrix is that it uses the data in what if finds to be the first record containing data and not header info to set the type of the variable for the column. Thus, even if a column is actually numeric, if the first record is missing that data, the import data will come in as a cellstr because it couldn't convert that field to a number successfully.
So, even if you use detectimportOptions, you still may need to edit the options object it returns manually to match what you know/want the data types to be.
This is particularly frustrating in a case such as this because two different files may well return different variable types for the same column owing to simply data in the files being different between cases even though both files are perfectly correct.
What it means is that the only real way to use the importoptions object reliably for a class of files is to create one for the file structure and save it for reuse; NOT to scan each file in a sequence of processing files. This adds complexity and another place for code to break or for the corollary options file to get lost and, while helpful in some ways, the introduction of the external object creates additional troubles besides. "There ain't no free lunch!"
Jeremy Hughes
on 3 May 2019
readtable should be used if the data in your file is most correctly represented as a table in MATLAB. I.e. there are multiple rows of data where each has multiple heterogeneous fields, but each of those fields has a consistent datatype.
readmatrix should be used if all the data is of the same type.
readcell is meant for the case where the data in the file is completly mixed. It will bring data in as text if it cannot otherwise be converted to a number, datetime, or duration.
Also, if you're having trouble with getting the import options for a particualr file, the Import Tool gui will generate MATLAB code (I think as of 18b for spreadsheet, 19a for delimted text files) which will construct the import options in a script or function and let you preview and customize the import.
Robert Kugler
on 6 May 2019
dpb
on 6 May 2019
If there are some user tools for defining and then cleaning up import option objects, that's a big step, potentially...(I'm stuck at R2017b for the time being for other reasons so can't go investigate just now).
While it's doable as is, the interface to the object through the supplied functions is very painful to deal with for complex files with large numbers of variables that may have issues such as outlined earlier (been there, done that number of times on the forum). Being able to make such changes interactively to specific variables would be great.
The same caveat as above must still be observed, however, and isn't documented nor discussed at all that I am aware of -- once one does this, one MUST retain that import object for use with all files of the type; rescanning another file may break the association thus so carefully and painfully created.
Accepted Answer
More Answers (0)
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!