convert categorical to numeric

I have a categorical array and I want to convert it back to the numerical matrix. What is the syntax?
Thanks,

3 Comments

double();
double() returns the category index, not the original value: https://www.mathworks.com/matlabcentral/answers/264383-convert-categorical-to-numeric#answer_311566
@AMEN BARGEES, if you look at other solutions, besides the accepted one, you will see that others suggested double(), followed by comments about how it did not actually solve the problem that was posed.


 Accepted Answer

the cyclist
the cyclist on 19 Jan 2016
Edited: the cyclist on 19 Jan 2016
If you have the Statistics and Machine Learning Toolbox, you could use the grp2idx command:
c = categorical({'Male','Female','Female','Male','Female'})
n = grp2idx(c)
That will simply encode the categories as numerical variables (which is handy for some other software packages). But that does not change the fact that "1", "2", etc. are still really just categories.
If you have categories that somehow embed numbers inside of them, that you want to convert to truly numerical (e.g. ordinal or interval) data, you'll need to be more specific about what your input is.

7 Comments

Thanks,
My input is 12, 12, 13 as a categorical. I want the output 12, 12, 13 as a numeric matrix.
Is this specific enough?
So, your categorical variable is equivalent to
c = categorical([12 12 13])
?
yes, you are right. I just want to convert it back.
c = categorical([12 12 13]);
d = str2double(cellstr(c));
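To spell out why that line works: cellstr(c) returns the category name of each element as text, and str2double parses that text back into numbers. A minimal sketch, assuming every category name is valid numeric text (the extra variable names are only for illustration); undefined or non-numeric elements would come back as NaN:

```matlab
c = categorical([12 12 13]);

idx   = double(c);            % category indices, not the original values
names = cellstr(c);           % category name of each element, as text
d     = str2double(names);    % parse the names back into numbers

% Equivalent route through a string array
d2 = str2double(string(c));
```

Here idx holds the positions of the categories, while d and d2 hold the numbers encoded in the category names.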
What about using the 'valueset' input to categorical like the help shows:
valueset = [1:3];
catnames = {'child' 'adult' 'senior'};
B = categorical(A,valueset,catnames,'Ordinal',true)
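If you control how the categorical is created, keeping the valueset around makes the conversion reversible, because double() returns each element's position in the valueset. A small sketch; A and the numeric valueset below are made-up example data, not taken from the help page:

```matlab
A = [10 30 20 10];                          % example data (assumed)
valueset = [10 20 30];
catnames = {'child' 'adult' 'senior'};
B = categorical(A, valueset, catnames, 'Ordinal', true);

% double(B) is the position of each element in valueset,
% so indexing valueset with it recovers the original numbers.
Aback = valueset(double(B));
```

This only works while every element of B is defined; undefined elements give NaN from double() and cannot be used as an index.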
I have an array of 3500 strings (some are repeated values) in my workspace. I'm not able to use them while building a machine learning model. Is there any way to convert them into numerical values and then use them in the model, or can you suggest a better approach?
I suggest you open a new question. You will get the attention of more people with a new, unanswered question rather than a comment on an answered question.
In that new question, I suggest that you include a small example of your data, or upload the entire array in a MAT file. You have not given enough information here to help you.


More Answers (5)

Just use the unique() function (which does not require any toolbox).
For example:
c = categorical({'Red','Blue','Red','Red','Blue','Blue','Green'});
[GN, ~, G] = unique(c)
Will return:
GN =
1×3 categorical array
Blue Green Red
G =
3
1
3
3
1
1
2

1 Comment

My comment on Xingyu Li's answer applies here as well. It works well if arbitrary numeric values are OK as output, but will not convert categorical '12' to numeric 12.
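For the case raised in that comment, the same unique() call can still be used if the category names are parsed as numbers first. A sketch, assuming all category names are numeric text:

```matlab
c = categorical([12 12 13]);
[GN, ~, G] = unique(c);            % GN: unique categories, G: index of each element into GN
vals = str2double(cellstr(GN));    % numeric value of each unique category
n = vals(G);                       % numeric value of each element of c
```

The text parsing is done once per unique category rather than once per element, which may matter for long arrays with few distinct values.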


Calling categorical is a data conversion, so

   c = categorical([12 12 13])

completely throws away the numeric values. In general, there is no way to get them back unless you have saved them, any more than you can get back the original values from int8([1.1 2.2 3.3]).

That being said, you can certainly save the unique numeric values, and then index into those using the categorical array:

   n = uniqueNumericValues(c)
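Spelled out, that could look like the following. This is only a sketch: uniqueNumericValues is an ordinary variable that has to be created and kept alongside c, and the example indexes with double(c) to make the lookup explicit.

```matlab
x = [12 12 13];
uniqueNumericValues = unique(x);              % [12 13], saved before converting
c = categorical(x, uniqueNumericValues);      % categories follow the saved order

n = uniqueNumericValues(double(c));           % look up each value by category number
```

Passing the saved values as the valueset guarantees that category number k corresponds to uniqueNumericValues(k).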

You can also call double on a categorical, but what you will get back are the category numbers, not the original numeric values.

But here's the question: if you need to convert back to the original numbers, and you are not using meaningful category names when converting from those numbers, why use categorical to begin with? There may be things you haven't mentioned.

4 Comments

I have the same problem, and the help file does not help at all.
My data is categorical because importdata chose that type for it. I can force the type, but if I later import new data and forget to force it to numeric, my processing will stop working. I'm running a script, so I can put a conversion there: automate rather than rely on human memory!
In particular I have 160,000 lines of data in a table, one of 46 fields is an odo reading. This has converted to categorical, with 16983 categories - so might be more efficient, fair enough. But now I want to plot data against odo, so I need numerical. example subset:
>> catdata
ans = 1×8 categorical array
37241 37364 37099 4264 6339 38209 38070 16777215
So the original numbers are NOT lost, but are coded in the categories:
>> catcats=categories(catdata);
>> length(catcats)
ans = 16983
As noted above, double() gives the index, not the value:
>> double(catdata)
ans =
10880 10902 10858 11593 13789 11022 11004 4659
>> catcats(4659)
ans = 1×1 cell array
{'16777215'}
But cell2mat gives you a string not a number:
>> cell2mat(catcats(4659))
ans = '16777215'
So you then need to convert again using str2num (why no cell2num? There is a num2cell):
>> str2num(cell2mat(catcats(4659)))
ans = 16777215
So this works for one item, but when I use the 8-element data, where the resulting strings have different lengths, it fails:
>> catcats(double(catdata))
ans = 8×1 cell array
{'37241' }
{'37364' }
{'37099' }
{'4264' }
{'6339' }
{'38209' }
{'38070' }
{'16777215'}
>> cell2mat(catcats(double(catdata)))
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
This seems like way more difficult than it should be.
The fundamental problem is that your numeric data are being read in as categorical. I don't have your file, so I can't tell why that is, but I recommend you use detectImportOptions, set the variable type, and pass those options to readtable when reading in all of your other data.
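A rough sketch of that import approach is below. The temporary CSV file, the second column, and its values are invented so that the example runs on its own; only the column name DD01km is borrowed from this thread.

```matlab
% Write a small example file (stand-in for the real data file)
fname = [tempname '.csv'];
T0 = table([37241; 37364; 4264], [1.5; 2.5; 3.5], ...
    'VariableNames', {'DD01km', 'speed'});
writetable(T0, fname);

% Detect the import options once, then force the odometer column to double
opts = detectImportOptions(fname);
opts = setvartype(opts, 'DD01km', 'double');

% Reuse the same options for every file with this layout
T = readtable(fname, opts);
delete(fname);
```

Because the type is set in the options object, every later readtable call that reuses opts reads the column as numeric without any manual step.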
Thanks.
I've come to the conclusion that would have been easiest, although I've developed an effective though crude workaround.
vfdbdata.DD01km is my categorical data array (from a table of data)
odocats = categories(vfdbdata.DD01km);
odoval = zeros(1, length(odocats)); % preallocate space
for kk = 1:length(odocats)
    odoval(kk) = str2num(cell2mat(odocats(kk)));
end
So this is run before the main processing, the numeric data can then be extracted as required by
odotemp = double ( vfdbdata.DD01km(vidx) ) ;
odotemp = odoval (max (1, odotemp) ) ;
The max ensures that 'undefined' values are processed without throwing an error (they give a NaN after translation to double, which causes a subscript error). I also have some code to process specific values that can occur (hence the use of a temporary variable).
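The same lookup can probably be written without the loop, since str2double accepts the whole cell array returned by categories. A sketch with made-up sample data (the names raw, catdata, ok and odo are only for illustration). Unlike the max(1, ...) guard, which maps undefined elements to the first category's value, this version leaves them as NaN:

```matlab
% Sample data (assumed): the last element is not in the valueset, so it is undefined
raw = {'37241', '4264', '16777215', 'n/a'};
catdata = categorical(raw, {'37241', '4264', '16777215'});

odoval = str2double(categories(catdata));   % numeric value of each category
ok = ~isundefined(catdata);                 % elements that have a category

odo = nan(size(catdata));                   % undefined elements stay NaN
odo(ok) = odoval(double(catdata(ok)));      % look up value by category index
```

NaN values are skipped by plot, so the undefined readings simply leave gaps.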
a = categorical(["2" "3" "3"])
double(a) % returns [1 2 2], the category indices
double(string(a)) % returns [2 3 3], the numbers encoded in the category names
categorical(double(string(a))) % returns the same thing as a


Milan Andrejevic
Milan Andrejevic on 29 Apr 2018
It's an intuitive functionality that should exist. There are so many instances one needs to treat certain variables as categorical when using some modelling functions, and as continuous for other analyses, or simply be able to index the array comparing it to a number. This is so easy to do in other programming languages.
Xingyu Li
Xingyu Li on 15 Dec 2017
double(categorical)

1 Comment

This is a great solution for the use case of assigning arbitrary numeric values to general categorical variables, e.g.
c = categorical({'Male','Female','Female','Male','Female'})
But this will not solve this poster's particular use case of
c = categorical([12 12 13]);
and wanting numeric [12 12 13] as the output.


Nathan Blanc
Nathan Blanc on 16 Jan 2021
I converted the categorical data into a char array and then used str2num. Worked for me :)

1 Comment

In most cases it is better to use str2double() rather than str2num(). str2num() invokes the full power of eval(), which can lead to problems.
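A short illustration of the difference, using arbitrary example inputs:

```matlab
% str2double only parses numeric text; anything else becomes NaN
a = str2double('12');            % 12
b = str2double('2*3');           % NaN, the expression is not evaluated
v = str2double({'12', '13'});    % works directly on a cell array of text

% str2num evaluates its input, so expressions are executed
e = str2num('2*3');              % 6
```

With str2double, unexpected text in a category name shows up as NaN instead of being executed as code.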

