Cell column to double from table
Show older comments
Hello,
I have a txt file that i have uploaded using readtable. This gives me a table of 4000000x34 variables. Some of these variabled are doubles, while some are cells. I have tried to transform the cell coulmns to doubles using cell2mat and str2double, but only get NaNs or error messages.
Some of my variables look like this: The cell ones containt the '''.

How can I transform the cell columns to doubles? Will need this when running my regression.
6 Comments
dpb
on 24 May 2021
Be much easier with real data instead of an image...even better would be to attach a small part of the file itself.
Using detectimportoptions to build an import object and setting the wanted types and what to do with missing data will also help, undoubtedly.
As is, your table above has missing values in two of the columns shown, that causes a problem with str2double because they're NOT numbers; the other columns may well have missing data as well.
Attach a small data file so we can see what is actually in the file...
Amina Ag
on 24 May 2021
Well, the "Currency" column is entirely blank in this dataset -- is that overall or just this part of the file?
There are other missing data scattered hither and yon -- detectimportoptions returned numeric fields for those that are; your columns above with difficulties contain surrounding quotes that need to be removed first.
Some fields to the RHS of the table appear as they may be dates and/or times but isn't clear just what they should be/are; they also are quoted strings that either need a "quoted string" format for the field specified in the import object or strip them after reading.
opt=detectImportOptions('b.xlsx');
tB=readtable('b.xlsx',opt);
tB.item_2=categorical(tB.item_2);
begins; one could set item_2 to be categorical in the import object if chose; there are a fair number of <ismissing> values.
datetime(replace(tB.item_503,"'",""),'InputFormat','HHmm','Format','HH:mm')
is one assumption for what the content of item_503 may be; need to define and handle the others that are mysteries without explanation appropriately.
Amina Ag
on 25 May 2021
There is no item_33 in the uploaded dataset so I dunno about that.
When your data aren't properly formatted, somtimes detectimportoptions needs some extra help:
opt=detectimportoptions('b.xlsx');
opt=setvaropts(opt,'item_2','Type','categorical','FillValue','YourFillValue','Prefixes',"'",'Suffixes',"'");
opt=setvaropts(opt,'item_50108','Type','double','TrimNonNumeric',true);
tB=readtable('b.xlsx',opt);
yields
>> tB(1:8,{'item_2','item_50108'})
ans =
8×2 table
item_2 item_50108
_____________ __________
M 63.40
M 51.62
M 0.00
M 45.31
YourFillValue 51.54
M 22.22
M 70.20
M 22.12
>>
You can deal with all of the issues either by defining the import objects fields to match known data types and formats or by cleaning up the data fields afterwards.
The problem with your categorical substitution is you undoubtedly didn't have a categorical variable on the RHS and you'll have to use the form shown in the documentation for categorical to add categories for values that aren't in the dataset that is converted if don't use the fill missing on import route as shown above.
Amina Ag
on 27 May 2021
Answers (0)
Categories
Find more on Data Type Conversion in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!