How can I read data from a datastore without knowing the exact data type?

I am working on a data cleaning project. The data set has a column that should contain only floating-point numbers, but some entries are characters, strings, or missing values. For example:
ID Date TimeSeries1
A 20170901 1.2
A 20170902 1.3
A 20170903 NaN
A 20170904 Char
A 20170905 String
A 20170906 1.7
My approach is to read one row at a time, test whether TimeSeries1 in that row is a float, and pick out the rows where it is not. I load the .csv file as a datastore, but the read function of the datastore requires the data type of each column to be specified in advance, for example {"%q" "%d" "%f"}. When it reaches a row with an invalid value, the program exits and I cannot do any further processing.
So how can I read data from a datastore without knowing the exact data type?

Accepted Answer

Jeremy Hughes on 2 Oct 2017
Hi Peixin,
I assume you don't know all the possible strings that can appear in this column. The only way I can see to proceed with a datastore is to specify the format of this variable as '%q', so that it is read as text, and check the data after it is imported. You can use str2double to convert the text; it returns NaN for anything it cannot convert, so you can throw those entries out.
The only question is what you want to do with the literal text 'NaN', which str2double also converts to NaN, so after conversion it looks the same as a failed conversion. You will have to make that decision in your code.
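As a rough sketch of that approach, the code below recreates the sample from the question in a temporary file, reads TimeSeries1 as text, and converts it block by block. The variable names (csvFile, chunks, clean) and the choice to keep the literal 'NaN' rows while dropping the other non-numeric rows are assumptions made for illustration, not something the question prescribes.

```matlab
% Recreate the sample from the question in a temporary CSV file.
csvFile = [tempname '.csv'];
rows = {'ID,Date,TimeSeries1', 'A,20170901,1.2', 'A,20170902,1.3', ...
        'A,20170903,NaN', 'A,20170904,Char', 'A,20170905,String', ...
        'A,20170906,1.7'};
fid = fopen(csvFile, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Read TimeSeries1 as text (%q) so that no row can fail to parse.
ds = tabularTextDatastore(csvFile, 'TextscanFormats', {'%q', '%f', '%q'});

chunks = {};
while hasdata(ds)
    T = read(ds);                          % one block of rows as a table
    txt = strtrim(T.TimeSeries1);          % cell array of character vectors
    vals = str2double(txt);                % NaN wherever conversion fails
    isLiteralNaN = strcmpi(txt, 'NaN');    % the text really was 'NaN'
    isBad = isnan(vals) & ~isLiteralNaN;   % 'Char', 'String', empty fields
    T.TimeSeries1 = vals;
    chunks{end+1} = T(~isBad, :);          % assumption: keep literal NaN rows
end
clean = vertcat(chunks{:});
delete(csvFile);
```

If the literal 'NaN' rows should be discarded as well, dropping the isLiteralNaN term from isBad is enough.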
If, on the other hand, you know all the non-numeric values that might appear, you can specify the format as '%f' and use
ds.TreatAsMissing = {'Char';'String'}
and those values will be imported as NaN (or, more precisely, as the value of ds.MissingValue, which is NaN by default).
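A minimal sketch of this second option, again using the sample from the question written to a temporary file. Here the formats and TreatAsMissing are passed as name-value pairs when the datastore is created, which I believe is equivalent to setting the properties afterwards; the variable names are only illustrative.

```matlab
% Recreate the sample from the question in a temporary CSV file.
csvFile = [tempname '.csv'];
rows = {'ID,Date,TimeSeries1', 'A,20170901,1.2', 'A,20170902,1.3', ...
        'A,20170903,NaN', 'A,20170904,Char', 'A,20170905,String', ...
        'A,20170906,1.7'};
fid = fopen(csvFile, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Read TimeSeries1 as a number (%f) and list the known non-numeric entries.
ds = tabularTextDatastore(csvFile, ...
    'TextscanFormats', {'%q', '%f', '%f'}, ...
    'TreatAsMissing', {'Char', 'String'});

T = readall(ds);                 % 'Char' and 'String' arrive as ds.MissingValue
isMissing = isnan(T.TimeSeries1);
delete(csvFile);
```

Note that with this option the literal 'NaN' and the replaced values can no longer be told apart after import, and any non-numeric value that is not listed in TreatAsMissing will still cause the read to fail.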
Jeremy
