Error in using datastore function to read big data
I am using the datastore function along with hasdata() to read a large file with 7 columns and 46 million rows. Reading just one column works fine, but when I try to read all seven columns I get the error "Error using matlab.io.datastore.TabularTextDatastore/readdata. Mismatch between file and format character vector."
I can work in chunks: read the first chunk, perform calculations, write to a file, and move on to the next one. I would appreciate a solution to this problem.
Answers (1)
Jeremy Hughes
on 24 Mar 2017
Hi Eric,
I'm guessing datastore is running into an issue parsing the file. This often happens when there is a non-numeric entry in a '%f' column.
Without seeing the contents of the file, it's hard to offer a solution, but to diagnose the issue, try changing elements of TextscanFormats from '%f' to '%q'. The data will be read in as character vectors, but you should be able to find the non-numeric data that is causing the problem.
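As a rough sketch of that diagnostic step, the snippet below switches every column to '%q' and scans chunk by chunk for entries that do not convert to a number. The sample file and its contents are made up for illustration; point sampleFile at the real data file instead. Note that empty fields and a literal "NaN" are flagged too, since str2double returns NaN for both.

```matlab
% Hypothetical sample file standing in for the real data.
sampleFile = [tempname '.csv'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,oops,6\n7,8,9\n');
fclose(fid);

ds = datastore(sampleFile);

% Read every column as text so no row can fail numeric parsing.
fmts = ds.TextscanFormats;
fmts(:) = {'%q'};
ds.TextscanFormats = fmts;

while hasdata(ds)
    t = read(ds);                        % one chunk, all columns as text
    vals = str2double(table2array(t));   % NaN where text is not a number
    isBad = any(isnan(vals), 2);         % rows with at least one bad entry
    if any(isBad)
        disp(t(isBad, :));               % show the offending rows
        break                            % stop at the first chunk with problems
    end
end
```

The row positions shown are relative to the current chunk, so for a very large file it may help to keep a running row count alongside the loop.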
If there is an entry like "NA" you can use the TreatAsMissing property to import those as NaN. If the data isn't really numeric, you can just import the column as characters.
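A minimal sketch of the TreatAsMissing approach, assuming the placeholder text is "NA". The sample file is hypothetical, and the formats are forced to '%f' here only so the example does not depend on what automatic detection picks.

```matlab
% Hypothetical sample file with an "NA" placeholder in column b.
sampleFile = [tempname '.csv'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'a,b,c\n1,2,3\n4,NA,6\n7,8,9\n');
fclose(fid);

% Tell the datastore to treat "NA" in numeric columns as missing.
ds = datastore(sampleFile, 'TreatAsMissing', 'NA');

% Force all columns to numeric (assumption: every column is meant to be numeric).
fmts = ds.TextscanFormats;
fmts(:) = {'%f'};
ds.TextscanFormats = fmts;

t = readall(ds);   % fine for this tiny sample; use read/hasdata for big files
disp(t);
```

If a column is not really numeric, setting its entry in TextscanFormats to '%q' instead of '%f' imports it as text.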
Less frequently, entries may carry a suffix such as "123mm". If the suffix is common to all the entries, you can change the format to '%f mm' to skip the literal suffix. However, if any rows lack the suffix, datastore will fail; in that case, read the column with '%q' and post-process the data.
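For the mixed case, where only some rows carry the suffix, one possible post-processing step is sketched below. The file, the column name len, and the "mm" suffix are all assumptions for illustration: the column is read as text, the trailing suffix is stripped with regexprep, and the remainder is converted with str2double.

```matlab
% Hypothetical sample file: some lengths have an "mm" suffix, some do not.
sampleFile = [tempname '.csv'];
fid = fopen(sampleFile, 'w');
fprintf(fid, 'id,len\n1,123mm\n2,45\n3,6.5mm\n');
fclose(fid);

ds = datastore(sampleFile);

% Read the second column as text regardless of what was auto-detected.
fmts = ds.TextscanFormats;
fmts{2} = '%q';
ds.TextscanFormats = fmts;

while hasdata(ds)
    t = read(ds);
    % Remove an optional trailing "mm", then convert the text to numbers.
    lenValues = str2double(regexprep(t.len, '\s*mm$', ''));
    disp(lenValues);
end
```

Any entry that still is not a number after stripping the suffix comes back as NaN, which makes leftover problem rows easy to spot with isnan.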