Why are xlsread (and readtable) doing such a poor job at loading mixed data files?
Show older comments
So, this issue has popped up regularly when loading from mixed data files. For example, when importing data using [data, text, raw] = xlsread('file.xlsx'), interpretation of the data appears strangely random. Though the 'data' and 'text' contain solely numeric and text data respectively, oftentimes the 'text' data still contains numeric data stored as text; on some occasions because a column contains both numeric and text data, but regularly also with columns that contain only numeric data (albeit with a header on the first row). In some cases xlsread even returns columns with only numeric data as text but with one or more cells as numeric. The 'raw' data in that regard shows the same distinction between numeric and text data. Similar (though different) results are obtained when using readtable by the way.
As an example, see attached data which contains a section of a full table ('raw' data from xlsread). Apart from the header, columns 3 to 6, 8 and 9 contain only numeric data (verified this in original file), while columns 59 to 62 contain mixed data. Somehow only row 3 of the mentioned first 6 colums is interpretated fully as numeric. More interestingly, column 4, row 7537 is interpreted as numeric whereas the rest of the column is text. Later columns (59 to 62) containing mixed data are again loaded as numeric data.
Now I'm sure there are reasons why this happens, and even alternatives for loading the data... but why return an ambiguous result like this in the first place? Either do it correctly or return everything as text so I can interpret the data myself, because this only adds additional processing time; first from the function interpretating the data and next from having to re-intepret the data. Or is there someting I'm missing here?

3 Comments
Walter Roberson
on 18 Jul 2022
I suggest you use detectImportOptions and use setvartype() on the returned options to set the columns to the appropriate types; then readtable() passing in the options.
Sjouke Rinsma
on 18 Jul 2022
dpb
on 18 Jul 2022
It's only trying to do what is asked -- import the Excel sheet. If it's malformed/mistyped, that's not MATLAB's fault; it can only presume that's what was intended.
Accepted Answer
More Answers (0)
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!