MATLAB stuck at "Opening Large Spreadsheet"
9 views (last 30 days)
Show older comments
I have been trying to import an excel file to MATLAB. However, it does not work at all. The green box from the import tool just keeps running. I also tried using the commands readtable and such, as well as installing and uninstalling multiple versions of MATLAB. I want to import the whole table to use it in regression learner.
1 Comment
Walter Roberson
on 29 Oct 2023
How large is it? approximate number of rows and columns? Are the columns mostly numeric or are there a bunch of text columns as well? If so are do they have a lot of text in them?
Accepted Answer
Drew
on 3 Nov 2023
Perhaps solve the problem with the divide-and-conquer method. You could set this up programatically with code based on the readtable function. This approach could also identify if there are particular rows that are causing issues. You could potentially also explore whether using csv versus xlsx as the source makes a difference.
2 Comments
Walter Roberson
on 3 Nov 2023
it appears to be an xlsx with a single sheet. The internal representation of that sheet would be a large XML file containing all of the numeric data, and possibly some of the text data, along with references to some of the text data that had been moved to a different XML file.
I do not know about the current code, but the previous readtable strategy was to parse and convert the entire sheet, and then afterwards after the whole sheet had been returned, to trim off the unwanted parts.
For efficiency reasons, unless perhaps you build a finite state machine to do the parsing (such as using yacc and Lex), at the very least you are going to parse a row at a time and convert the row and column number references in the XML text. Potentially the row/column could be tested against the range to determine whether it is worth converting the text representation of the cell contents.
The point here is that splitting the reading by range might not solve much. There is no way in xlsx files to skip directly to a particular row. There is no marginal index kept that might permit quick consultation to skip more directly to a given row.
Drew
on 3 Nov 2023
I agree that always starting with the same large xlsx may not yield any improvement if MATLAB begins by parsing the entire file. Divide and conquer can be tried in various ways. Keep in mind that the issue may be caused by size, or by a particular problem with a paricular header, or a particular data entry. Some ideas:
(1) Use Excel to re-save a subset of the file with a portion of the data. Start with just the header and one data row. Test to see if readtable works.
(2) Use Excel to re-save the file with half the data, or one tenth of the data. See if readtable works. Observe memory use.
(3) Use Excel to re-save as csv. See if readtable works.
More Answers (0)
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!