New data from one hot encoded NN

Question

Stephen Gray on 20 Jul 2023

https://au.mathworks.com/matlabcentral/answers/1998498-new-data-from-one-hot-encoded-nn

Commented: Stephen Gray on 20 Jul 2023

Hi all.

I have a neural net with a few one-hot encoded fields. Everything is working well and it is exhibiting the learning rate I need. The issue I have is running new data through the NN to get the probability output. When I created the NN I used a lot of data, so for instance one field was transformed into 400 variables by one-hot encoding. When I prepare the new data to run through the NN, I of course need to one-hot encode the same field in the new data. The problem is that, because there is less data, the one-hot encoding doesn't produce the same number of variables, so it doesn't work.

So for example, the field in the training data had 400 different values and became 400 variables once encoded. The matching field in the new data only had 5 different values, so it only had 5 variables once encoded. I'm sure I'm missing something here, so does anyone have any idea?

SPG


Answer 1

Nandini on 20 Jul 2023

https://au.mathworks.com/matlabcentral/answers/1998498-new-data-from-one-hot-encoded-nn#answer_1276418

It seems like you're facing an issue with the dimensionality mismatch when applying one-hot encoding to new data that has fewer categories compared to the original data used to train your Neural Network (NN). One possible solution is to ensure that the encoding of the new data matches the same dimensions as the original data.

Here's a suggestion to handle this situation:

1. Determine the unique categories of the original data that was used for training. Let's call this set of unique categories `original_categories`.

2. When new data arrives, do not rely only on the categories found in that data, since that yields a shorter vector. Instead, make sure the resulting one-hot encoded vector has the same dimensions, and the same column order, as the encoding used for training:

- Create a list of unique categories in the new data. Let's call this set of unique categories `new_categories`.

- Compare `new_categories` with `original_categories` to identify any missing categories.

- Add the missing categories to `new_categories` and sort them in the same order as `original_categories`. If the new data contains nothing unseen, the result is simply `original_categories`.

- Perform one-hot encoding on the field using the updated `new_categories` to ensure the dimensions match.

By aligning the categories and dimensions of the one-hot encoding for both the training data and new data, you can ensure consistency when running the new data through the Neural Network.
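The steps above can be sketched in a few lines. This is only a language-agnostic illustration in plain Python, not the code used in the original network; the helper names `fit_categories` and `one_hot` and the colour data are made up for the example. The key point is that the category list is built once from the training data, stored with the model, and reused for every new batch. In MATLAB the same idea would presumably map to building the `categorical` array with an explicit value set taken from the training data before encoding.

```python
def fit_categories(train_values):
    """Return the sorted unique categories seen in training.

    Save this list alongside the trained network (hypothetical helper).
    """
    return sorted(set(train_values))


def one_hot(values, categories):
    """Encode values against a fixed category list.

    The output width is always len(categories), no matter how few
    distinct values appear in `values`.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # a KeyError here means v was never seen in training
        rows.append(row)
    return rows


# Made-up data: the training field has four categories, the new batch only two.
original_categories = fit_categories(["red", "green", "blue", "amber", "green"])
new_rows = one_hot(["green", "amber"], original_categories)

# Every encoded row is as wide as the training encoding.
print(len(original_categories), [len(r) for r in new_rows])
```

Because the encoder never looks at which categories happen to be present in the new batch, a batch with 5 distinct values still produces vectors as wide as the training encoding.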

Additionally, it's worth noting that if you encounter categories in the new data that were not present in the original training data, the Neural Network may not have learned how to handle these categories effectively. In such cases, it's important to consider how to handle these unseen categories appropriately.
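One possible way to deal with unseen categories is to encode them as an all-zero row and report them, so the caller can decide whether to reject the batch. This is a minimal sketch under that assumption; the all-zero policy and the helper name `one_hot_with_unknown` are illustrative choices, not something the network was trained to expect. An alternative is to reserve an explicit "other" column at training time.

```python
def one_hot_with_unknown(values, categories):
    """Encode against a fixed category list; unseen values become all-zero rows.

    Returns (rows, unseen) so the caller can inspect what was not recognised.
    This policy is an assumption made for illustration.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    rows, unseen = [], []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        else:
            unseen.append(v)  # leave the row all zeros and record the value
        rows.append(row)
    return rows, unseen


# Made-up data: "pink" was not among the training categories.
rows, unseen = one_hot_with_unknown(
    ["blue", "pink"], ["amber", "blue", "green", "red"]
)
if unseen:
    print("Categories not seen in training:", unseen)
```

Whatever policy is chosen, the output width stays fixed, so the network input size never changes.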

I hope this suggestion helps resolve the dimensionality mismatch issue. Let me know if you have any further questions!

1 Comment

Stephen Gray on 20 Jul 2023

That makes a lot of sense, thanks. The data is unlikely to contain anything unseen before, so that should be OK. It's a bit more work than I thought, but then it always is with data!

SPG
