Hello, I have a txt file that i have uploaded using readtable. This gives me a table of 4000000x34 variables. Some of these variabled are doubles, while some are cells. I have tried to transform the cell coulmns to doubles using cell2mat and str2double, but only get NaNs or error messages. Some of my variables look like this: The cell ones containt the '''. How can I transform the cell columns to doubles? Will need this when running my regression.

Cell column to double from table

dpb on 24 May 2021

Be much easier with real data instead of an image...even better would be to attach a small part of the file itself.

Using detectimportoptions to build an import object and setting the wanted types and what to do with missing data will also help, undoubtedly.

As is, your table above has missing values in two of the columns shown, that causes a problem with str2double because they're NOT numbers; the other columns may well have missing data as well.

Attach a small data file so we can see what is actually in the file...

Amina Ag on 24 May 2021

B.xlsx

Thank you for your answer!

I exported a part of the data into an xlsx file to be able to attach it, but the original file is txt.

I have tried with opts and selecting the variable types, but still without any luck. Would highly appreciate your help on getting around this.

Best regards!

dpb on 24 May 2021

Edited: dpb on 25 May 2021

Open in MATLAB Online

Well, the "Currency" column is entirely blank in this dataset -- is that overall or just this part of the file?

There are other missing data scattered hither and yon -- detectimportoptions returned numeric fields for those that are; your columns above with difficulties contain surrounding quotes that need to be removed first.

Some fields to the RHS of the table appear as they may be dates and/or times but isn't clear just what they should be/are; they also are quoted strings that either need a "quoted string" format for the field specified in the import object or strip them after reading.

opt=detectImportOptions('b.xlsx');
tB=readtable('b.xlsx',opt);
tB.item_2=categorical(tB.item_2);

begins; one could set item_2 to be categorical in the import object if chose; there are a fair number of <ismissing> values.

datetime(replace(tB.item_503,"'",""),'InputFormat','HHmm','Format','HH:mm')

is one assumption for what the content of item_503 may be; need to define and handle the others that are mysteries without explanation appropriately.

Amina Ag on 25 May 2021

Open in MATLAB Online

Thank you for your reply. The variables containing the ("'") are mostly categorical variables. So I see that the problem can be solved using your code, but item_33 should be a number column. This column is not possible to tranform from a cell to double.

Also, when trying to fill the missing values after transforming the to categorical, the fillmissing is not working and gives the following errror message:

Error using fillmissing/checkArrayType (line 774)
Fill constant must be a categorical array, a string array, a character row vector, or a cell array of character row vectors.
Error in fillmissing (line 160)
    [intConstOrWinSize,extM] = checkArrayType(A,intM,intConstOrWinSize,extM,x,false,ma);
 

Do you know how to make this code work? This is specific for item_50108. None of them are date columns.

dpb on 25 May 2021

Edited: dpb on 26 May 2021

Open in MATLAB Online

There is no item_33 in the uploaded dataset so I dunno about that.

When your data aren't properly formatted, somtimes detectimportoptions needs some extra help:

opt=detectimportoptions('b.xlsx');
opt=setvaropts(opt,'item_2','Type','categorical','FillValue','YourFillValue','Prefixes',"'",'Suffixes',"'");
opt=setvaropts(opt,'item_50108','Type','double','TrimNonNumeric',true);
tB=readtable('b.xlsx',opt);

yields

>> tB(1:8,{'item_2','item_50108'})
ans =
  8×2 table
       item_2        item_50108
    _____________    __________
    M                  63.40   
    M                  51.62   
    M                   0.00   
    M                  45.31   
    YourFillValue      51.54   
    M                  22.22   
    M                  70.20   
    M                  22.12   
>> 

You can deal with all of the issues either by defining the import objects fields to match known data types and formats or by cleaning up the data fields afterwards.

The problem with your categorical substitution is you undoubtedly didn't have a categorical variable on the RHS and you'll have to use the form shown in the documentation for categorical to add categories for values that aren't in the dataset that is converted if don't use the fill missing on import route as shown above.

Amina Ag on 27 May 2021

This worked perfectly!

Thank you so much for your help. It has been a real struggle and feels amazing now that it is solved. Highly appreciate your time and answer.

All the best!

Cell column to double from table

6 Comments
Show 4 older comments Hide 4 older comments

Answers (0)

Categories

Tags

Community Treasure Hunt

Cell column to double from table

6 Comments Show 4 older comments Hide 4 older comments

Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

6 Comments
Show 4 older comments Hide 4 older comments