Error in using datastore function to read big data

I am using the datastore function along with hasdata() to read a large data set of 7 columns by 46 million rows. When I read just one column it works fine, but when I try to read all seven columns I get the error "Error using matlab.io.datastore.TabularTextDatastore/readdata. Mismatch between file and format character vector."
I can work with chunks: read the first chunk, perform calculations, write to a file, and take the next one. I would appreciate a solution for this problem.

Answers (1)

Jeremy Hughes on 24 Mar 2017
Hi Eric,
I'm guessing datastore is running into an issue parsing the file. This often happens when there is a non-number entry in a '%f' column.
Without seeing the contents of the file, it's hard to offer a solution, but to diagnose the issue, try setting elements of TextscanFormats from '%f' to '%q'. The data will read in as character vectors, but you should be able to find the non-numeric data that is causing the problem.
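As a rough sketch of that diagnostic step (not a definitive recipe): the snippet below writes a small stand-in file so it can run on its own, switches every column to '%q', and scans the data chunk by chunk for entries that do not convert to numbers. The file contents, the ReadSize value, and the variable names (fname, badCells, rowOffset) are assumptions made for illustration; with the real data, point datastore at the actual file and pick a larger ReadSize.

```matlab
% Stand-in file (hypothetical data): 7 columns, one bad entry in column 7.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c,d,e,f,g\n');
fprintf(fid, '1,2,3,4,5,6,7.5\n');
fprintf(fid, '1,2,3,4,5,6,oops\n');
fprintf(fid, '1,2,3,4,5,6,8.5\n');
fclose(fid);

ds = datastore(fname);

% Read every column as text so that parsing cannot fail on odd entries.
ds.TextscanFormats = repmat({'%q'}, size(ds.TextscanFormats));
ds.ReadSize = 2;            % rows per chunk; assumed value, use more for real data

badCells = zeros(0, 2);     % collected [dataRow, column] pairs
rowOffset = 0;              % data rows consumed by earlier chunks
while hasdata(ds)
    t = read(ds);
    for k = 1:width(t)
        col = t{:, k};      % cell array of character vectors
        % Flag entries that are not numeric text (empty entries are flagged too).
        bad = isnan(str2double(col)) & ~strcmpi(strtrim(col), 'nan');
        idx = find(bad);
        badCells = [badCells; rowOffset + idx, repmat(k, numel(idx), 1)]; %#ok<AGROW>
    end
    rowOffset = rowOffset + height(t);
end

disp(badCells)              % data-row and column index of each suspicious entry
delete(fname);
```

The row indices count data rows and exclude the header line. If only one column is suspect, changing just that element of TextscanFormats keeps the other columns numeric.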
If there is an entry like "NA" you can use the TreatAsMissing property to import those as NaN. If the data isn't really numeric, you can just import the column as characters.
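For illustration, a minimal sketch of the TreatAsMissing route, assuming the offending entries are literally "NA". The two-column sample file and its variable names x and y are made up so the snippet is self-contained.

```matlab
% Stand-in file (hypothetical data) with an "NA" entry in a numeric column.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'x,y\n');
fprintf(fid, '1.5,2.5\n');
fprintf(fid, '3.5,NA\n');
fprintf(fid, '5.5,6.5\n');
fclose(fid);

% Ask the datastore to treat "NA" in numeric columns as missing.
ds = datastore(fname, 'TreatAsMissing', 'NA');

t = readall(ds);    % fine for a tiny file; use read in a loop for big data
disp(t)             % the NA entry should come back as NaN
delete(fname);
```

If the column turns out not to be numeric at all, setting its entry in TextscanFormats to '%q' imports it as text instead.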
Less frequently, there may be suffixes like "123mm". If the suffix is common to all the entries, you can change the format to '%f mm' to remove the literal suffix. However, if any rows lack the suffix, datastore will fail; in that case you can use '%q' and then post-process the data.
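Here is a hedged sketch of the '%q' plus post-processing fallback for the mixed case, where some rows carry the suffix and some do not. The sample file, the column names id and len, and the "mm" suffix are assumptions for the sake of a runnable example.

```matlab
% Stand-in file (hypothetical data): "len" has a unit suffix on some rows only.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,len\n');
fprintf(fid, '1,123mm\n');
fprintf(fid, '2,45.5\n');
fprintf(fid, '3,7mm\n');
fclose(fid);

ds = datastore(fname);

% Keep the first column numeric and read the suffixed column as text.
ds.TextscanFormats = {'%f', '%q'};

t = readall(ds);    % with big data, do this per chunk inside a hasdata loop

% Strip a trailing "mm" (if present), then convert the text to double.
t.len = str2double(regexprep(t.len, '\s*mm$', ''));

disp(t)
delete(fname);
```

Entries that still fail to convert come back as NaN, which makes any remaining odd rows easy to locate with isnan.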
  1 Comment
Eric Girija on 25 Mar 2017
Hi Jeremy, I checked for non-numeric entries and could not find any. I have 7 variables and the error was in the seventh one. If I use SelectedVariableNames and read only the 7th variable instead of all 7 variables, it runs fine. Is it a memory problem? The input data I am using has 8 digits before and after the decimal point, and I would like to run it with just two digits after the decimal point. I am still figuring out a way to edit this big file. Do you think that will make a difference?
