how to use readtable with excel spreadsheets, skip extra header lines, and read to "end"?

80 views (last 30 days)
readtable apparently doesn't support "HeaderLines" option for xls files. From documentation, it does support reading ranges like
'Range','D2:H4'
But I can't find the magic format to get it to read to an arbitrary 'end'.
'Range','D2:end'
doesn't work, nor any obvious variations, such as
'Range','D2:M.end' .
I have a set of data files with arbitrary lengths and arbitrary # of columns, which have 4 header lines. I'd like to read them into tables, picking up the first line as column labels, skipping lines 2-4, then reading the rest of the file.
T = readtable('plot 1.xlsx');
results in 1st line used as headers (good), but text of headers in lines 2-4 cause all remaining (numeric) data to be read in as text, not numeric data (bad).
Anybody got a working solution to this problem? Am running matlab 2015a under OS X 10.10 (Yosemite).
  3 Comments
Tasio Oderkerk
Tasio Oderkerk on 19 May 2016
Same issue, Matlab 2016, removed header lines by hand.
If anyone has a better solution, please resp.
Bridget Eckhardt
Bridget Eckhardt on 8 Jun 2017
Edited: Bridget Eckhardt on 8 Jun 2017
We should be able to use the following, but I still get a problem with mine. It seems to randomly choose which is the header line even though I have specified which it is. However, the data range specification does work. So, this starts identifying data at A4.
opts=detectImportOptions('Test.xlsx');
opts.VariableNamesRange='A3';
opts.DataRange='A4';

Sign in to comment.

Answers (1)

Jeremy Hughes
Jeremy Hughes on 9 Jun 2017
Hi Ian, Without the actual file you're trying to read it's hard to say for sure, but I think I can help. Header lines are assumed to be at the beginning of the file, so since your first row contains the variable names, you really don't have any header lines (according to READTABLE's definition). Unfortunately, if the 'HeaderLines' parameter worked with spreadsheet files, based on your description of the file, it wouldn't help.
However in R2016b, DETECTIMPORTOPTIONS does allow a nearly-identical parameter 'NumHeaderLines', and using import options, you can set a starting cell for your DataRange. It will also attempt to get a best guess at which row is really data.
Try this:
opts = detectImportOptions('Test.xlsx','NumHeaderLines',0);
You can inspect the results to see if it matches your expectation, and it this didn't get exactly what you wanted:
opts.VariableNamesRange = 'A1';
opts.DataRange = 'A4';
Now you just need to tell READTABLE to use the options.
T = readtable('Test.xlsx',opts,'ReadVariableNames',true)
You might not need the ReadVariableNames parameter; READTABLE prefers to use the variable names in OPTS, and that works okay so long as DETECTIMPORTOPTIONS got them right. DETECTIMPORTOPTIONS will not set a VariableNamesRange/Line if the first row (after the detected or prescribed number of header lines) contains the same datatypes as the other rows--it assumes they are part of the data and that there are no variable names in this file.
I hope this helps. If you still have an issue, upload an example file and I'll see what I can do. Jeremy

Categories

Find more on Data Import from MATLAB in Help Center and File Exchange

Tags

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!