Categorical to Numeric problem

Stephen Gray on 8 Jan 2024
Commented: Cris LaPierre on 11 Jan 2024
Hi
I have a table that contains both numeric and categorical items. I have converted the categorical items to numeric using the unique() function, which works very well, and I can then feed the matrix into an NN for training. The problem comes when I feed in new data to get results: I don't know how to make sure the converted categorical data in the new table matches the numbers in the training data. For example, if a categorical value in the training data is converted to the number 5, how do I make sure that the same value in the new data gets assigned the same number? I'm beginning to think it may be a manual thing.
SPG

Accepted Answer

Hassaan on 8 Jan 2024
% Example training data (cell array of character vectors)
training_categorical_data = {'cat', 'dog', 'fish', 'dog', 'cat'};

% Convert categorical data to numeric for training
[unique_categories, ~, numeric_categories] = unique(training_categorical_data);
category_to_number_map = containers.Map(unique_categories, num2cell(1:length(unique_categories)));
% values() expects the keys as a cell array of character vectors, so the data is passed as-is
numeric_training_data = cell2mat(values(category_to_number_map, training_categorical_data));

% Training process with numeric_training_data
% [Your neural network training code goes here]

% Example new data (cell array of character vectors)
new_categorical_data = {'dog', 'cat', 'bird'};

% Convert new categorical data to numeric using the training mapping
numeric_new_data = zeros(size(new_categorical_data));
for i = 1:length(new_categorical_data)
    if isKey(category_to_number_map, new_categorical_data{i})
        numeric_new_data(i) = category_to_number_map(new_categorical_data{i});
    else
        % Handle unseen categories, e.g., assign a special number or ignore
        numeric_new_data(i) = NaN; % Assign NaN for unseen categories
    end
end

% Now, numeric_new_data is ready for use with the trained model
% [Your prediction or evaluation code goes here]
  • The training data training_categorical_data is a cell array of character vectors holding the category labels. It is converted to numeric_training_data using a mapping (category_to_number_map) built from the training data only.
  • The new data new_categorical_data is then converted using the same mapping. Unseen categories (like 'bird' in this example) are handled separately; here, I've assigned NaN to them, but you can choose another method as appropriate.
  • You'll need to insert your specific neural network training and prediction code where indicated. The numeric_training_data and numeric_new_data arrays are what you'd use for training and prediction, respectively.
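
As a possible alternative to the loop above, the same lookup can be written in vectorized form with ismember. This is only a minimal sketch that reuses the variable names from the example; is_known is a name introduced here for illustration. It assumes the category list is taken from the training data and that NaN is an acceptable marker for unseen categories.

```matlab
% Categories learned from the training data (sorted, as returned by unique)
training_categorical_data = {'cat', 'dog', 'fish', 'dog', 'cat'};
unique_categories = unique(training_categorical_data);   % {'cat','dog','fish'}

% New data, including a category that was not seen during training
new_categorical_data = {'dog', 'cat', 'bird'};

% For each new item, ismember returns its position in unique_categories
% (0 when the item is not found)
[is_known, numeric_new_data] = ismember(new_categorical_data, unique_categories);

% Mark unseen categories with NaN, matching the convention used in the loop
numeric_new_data(~is_known) = NaN;

disp(numeric_new_data)   % 2 1 NaN
```

Because the position in unique_categories is the number assigned during training, 'dog' and 'cat' receive the same codes in the new data as they did in the training data.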
  4 Comments
Stephen Gray on 10 Jan 2024
OK, using dictionary instead and it's working so far.
Stephen Gray on 11 Jan 2024
OK. I've got it to work now using dictionaries. Both this answer and the next one helped me get it working. As yours also covers how to handle new data, I'll mark it as the answer. Thanks both for answering.


More Answers (1)

Cris LaPierre on 8 Jan 2024
Moved: Cris LaPierre on 8 Jan 2024
Could you provide more details about your NN? I would think you should be able to pass categorical data into your network without having to convert it to numeric first.
If not, then I'd look into creating a dictionary, where you pass in the categorical value and it returns the numeric value.
A = categorical({'medium' 'large' 'small' 'medium' 'large' 'small'});
names = unique(A)
names = 1×3 categorical array
large medium small
values = (1:length(names));
d = dictionary(names,values)
d =
  dictionary (categorical --> double) with 3 entries:
    large --> 1
    medium --> 2
    small --> 3
A(4)
ans = categorical
medium
x = d(A(4))
x = 2
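
The dictionary idea can be extended to new data that may contain values not present during training. The following is a rough sketch, not a definitive implementation: it assumes R2022b or later (where dictionary is available) and uses string keys instead of categorical keys so that the lookup does not depend on the category list of the new array. The names train_values, new_values, known, and codes are hypothetical, as is the value "huge".

```matlab
% Build the lookup from the training categories (dictionary needs R2022b+)
train_values = categorical({'medium' 'large' 'small' 'medium' 'large' 'small'});
names = string(categories(train_values));      % ["large"; "medium"; "small"]
d = dictionary(names, (1:numel(names))');

% New data with one value ("huge") that did not appear in training
new_values = ["small" "huge" "medium"];

% isKey accepts an array of keys, so unseen values can be flagged first
known = isKey(d, new_values);

% Look up only the known keys; unseen values stay NaN
codes = nan(size(new_values));
codes(known) = d(new_values(known));

disp(codes)   % 3 NaN 2
```

Checking isKey before the lookup avoids the error that d(key) raises for a key that is not in the dictionary.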
  4 Comments
Stephen Gray on 9 Jan 2024
Unfortunately not. The code part is
InpsM = table2cell(Inps);
OutsM = table2cell(Outs);
InpsM = InpsM';
OutsM = OutsM';
net = feedforwardnet([96,48,24]);
net.trainFcn = 'trainlm';
net.inputs{1}.processFcns = {'mapstd'};
net = train(net,InpsM,OutsM,'useParallel','yes');
The error I get is
Error using nntraining.setup>setupPerWorker
Inputs X{1,1} is not numeric or logical.
Error in nntraining.setup (line 77)
[net,data,tr,err] = setupPerWorker(net,trainFcn,X,Xi,Ai,T,EW,enableConfigure);
Error in network/train (line 336)
[net,data,tr,err] = nntraining.setup(net,net.trainFcn,X,Xi,Ai,T,EW,enableConfigure,isComposite);
Error in untitled (line 52)
net=train(net,InpsM,OutsM,'useParallel','yes');
SPG
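
The error above suggests that train wants numeric input rather than a cell array containing categorical values. One way this could be handled, sketched under stated assumptions, is to record the category list of each categorical table variable at training time and re-code both the training and the new table against it. The tables Inps and NewInps below are made-up examples, and catVars and trainCats are names introduced only for this illustration.

```matlab
% Hypothetical training table with one numeric and one categorical variable
Inps = table([1.2; 3.4; 5.6], categorical({'red'; 'blue'; 'red'}), ...
    'VariableNames', {'Len', 'Colour'});

% Find the categorical variables and remember their training categories
isCat = varfun(@iscategorical, Inps, 'OutputFormat', 'uniform');
catVars = Inps.Properties.VariableNames(isCat);
trainCats = struct();
for k = 1:numel(catVars)
    trainCats.(catVars{k}) = categories(Inps.(catVars{k}));
end

% Hypothetical new table; 'green' was not seen in training
NewInps = table([7.8; 9.0], categorical({'blue'; 'green'}), ...
    'VariableNames', {'Len', 'Colour'});

% Re-code each categorical variable against the training categories.
% double() gives the same integer for the same category, NaN if unseen.
for k = 1:numel(catVars)
    v = catVars{k};
    Inps.(v)    = double(categorical(cellstr(Inps.(v)), trainCats.(v)));
    NewInps.(v) = double(categorical(cellstr(NewInps.(v)), trainCats.(v)));
end

% Numeric matrices, transposed so that each column is one sample
InpsM    = table2array(Inps)';
NewInpsM = table2array(NewInps)';
```

Here 'blue' maps to 1 and 'red' to 2 in both tables, while the unseen 'green' becomes NaN and would need to be handled before prediction.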
Cris LaPierre on 11 Jan 2024
I found this on the documentation page for trainNetwork rather than train, but it still appears to apply:
"To train a network using categorical features, you must first convert the categorical features to numeric."

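Integer codes are one way to make categorical features numeric. One-hot encoding is another common option, since it does not impose an ordering on the categories. The sketch below uses only base MATLAB functions and hypothetical example values (train_categories, new_values); it assumes the category list is fixed at training time and that an all-zero row is an acceptable encoding for unseen values.

```matlab
% Category list fixed at training time (hypothetical example values)
train_categories = {'large', 'medium', 'small'};

% New observations as a column; 'tiny' was not seen in training
new_values = {'small'; 'large'; 'tiny'};

% Position of each value in the training list (0 marks unseen values)
[~, codes] = ismember(new_values, train_categories);

% One-hot encoding via implicit expansion (R2016b or later):
% row i has a 1 in column codes(i); an unseen value gives an all-zero row
onehot = double(codes == 1:numel(train_categories));

disp(onehot)
```

Each row of onehot corresponds to one observation and each column to one training category, so the column order stays the same for training and new data.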
Release

R2023b