How can I improve pre-processing for machine learning?

Question

HelpAStudent on 28 May 2022


https://au.mathworks.com/matlabcentral/answers/1728950-how-i-make-a-better-pre-processing-for-machine-learning

Edited: Avadhoot on 9 Oct 2023

Hi! I have a data set with several features (size 1950x22). From it I have to develop a machine learning algorithm that can predict one of the columns (specifically the 22nd) when given new data for the other features. To summarize: the output feature (the 22nd column), which the model must learn to predict from the other features (the first 21 columns), has been categorized into three classes: 1, 2, 3. The problem is that after pre-processing, the class counts changed from:

1: 1655
2: 295
3: 176

to:

1: 1337
2: 135
3: 24

Is there a way to oversample the data of output class 3? Or to make sure that, when I train in the Classification Learner app, it uses all of the data belonging to class 3 of the output feature?


Answer 1

Avadhoot on 9 Oct 2023


https://au.mathworks.com/matlabcentral/answers/1728950-how-i-make-a-better-pre-processing-for-machine-learning#answer_1329069

Edited: Avadhoot on 9 Oct 2023


Hi,

I understand that your dataset has a class imbalance and that your pre-processing has made it worse (class 3 dropped from 176 to 24 samples). There are several ways to deal with class imbalance; some of them are listed below:

1. Undersampling:

You can randomly discard some samples of the majority classes (classes 1 and 2) so that they are closer in size to class 3. One way is to apply the “datasample” function to the rows of each majority class:

  % data: rows of one majority class; k: number of rows to keep
  dataSampled = datasample(data,k,'Replace',false);

Here is what the arguments mean:

“k” is the number of samples you want to select.
Setting the “Replace” input argument to “false” ensures that the sampling is done without replacement.

Refer to the following documentation for details about “datasample”:

https://www.mathworks.com/help/stats/datasample.html
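As a rough sketch of undersampling with your three classes, the code below uses hypothetical stand-in data (random features X and labels y built from the post-preprocessing class counts) and keeps the same number of randomly chosen rows from every class. It uses base MATLAB's “randperm”, which draws rows without replacement in the same way as datasample(...,'Replace',false). The target size k (here the size of class 3) is an assumption you would tune.

```matlab
% Hypothetical stand-in data: 21 feature columns plus labels 1, 2, 3
% with the post-preprocessing class counts from the question.
rng(0);                                   % reproducible random draws
counts = [1337 135 24];
y = repelem((1:3)', counts);              % class label for each row
X = randn(numel(y), 21);                  % placeholder features

% Undersample every class down to the size of the smallest class
k = min(counts);                          % here: 24 rows per class
keep = [];
for c = 1:3
    idx = find(y == c);                   % rows belonging to class c
    % random subset without replacement, like datasample(...,'Replace',false)
    keep = [keep; idx(randperm(numel(idx), k))]; %#ok<AGROW>
end

Xbal = X(keep, :);                        % balanced feature matrix
ybal = y(keep);                           % matching labels
disp(accumarray(ybal, 1)')                % rows per class after undersampling
```

Undersampling all the way down to 24 rows per class discards most of classes 1 and 2, so in practice you might keep more rows of the majority classes and combine undersampling with one of the techniques below.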

2. Oversampling:

You can use an oversampling technique such as the Synthetic Minority Over-sampling Technique (SMOTE) to create synthetic samples of class 3 and reduce the class imbalance. For details on how to use SMOTE in MATLAB, refer to the following File Exchange submission:

https://www.mathworks.com/matlabcentral/fileexchange/75401-synthetic-minority-over-sampling-technique-smote
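To show what SMOTE does, here is a minimal sketch in base MATLAB; it is not the File Exchange implementation. Each synthetic row is a random point on the line segment between a class-3 sample and one of its k nearest class-3 neighbours. Xmin, nNew and kNN are hypothetical names and values, and the sketch assumes numeric features on comparable scales (standardize them first) and Euclidean distance.

```matlab
% Minimal SMOTE sketch (base MATLAB only). Xmin is a hypothetical matrix
% of class-3 rows, one observation per row.
rng(1);
Xmin = randn(24, 21);          % stand-in for the 24 class-3 rows
nNew = 100;                    % number of synthetic rows to create (assumption)
kNN  = 5;                      % neighbours considered per sample (assumption)

nMin = size(Xmin, 1);
% Pairwise Euclidean distances between minority samples
sq = sum(Xmin.^2, 2);
D = sqrt(max(sq + sq' - 2*(Xmin*Xmin'), 0));
D(1:nMin+1:end) = Inf;         % a sample is not its own neighbour
[~, order] = sort(D, 2);       % neighbours sorted by distance, per row
nbrs = order(:, 1:kNN);        % k nearest minority neighbours of each row

Xsyn = zeros(nNew, size(Xmin, 2));
for i = 1:nNew
    a = randi(nMin);                   % random minority sample
    b = nbrs(a, randi(kNN));           % one of its k nearest neighbours
    gap = rand;                        % random position on the segment a -> b
    Xsyn(i, :) = Xmin(a, :) + gap * (Xmin(b, :) - Xmin(a, :));
end
```

You would then append Xsyn to your training data with label 3, for example X = [X; Xsyn] and y = [y; 3*ones(nNew,1)]. Oversample only the training portion: if synthetic rows end up in the validation data, the validation results will look better than they really are.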

3. Class weighting:

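You can assign a different weight to each class so that class 3 is given more importance during training. For models trained with functions such as “fitctree” or “fitcensemble”, this is done with the “Weights” (or “Cost”) name-value argument. For a deep learning network, you can use the “ClassWeights” option of your classification layer as follows: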

  layer = classificationLayer(Classes=classes,ClassWeights=classWeights);

Here is what the parameters mean:

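“classes” contains all class names. The “Classes” option expects a categorical vector, string array or cell array of character vectors, so use, for example, categorical([1 2 3]) rather than the numeric vector [1,2,3].
“classWeights” is a vector with one weight per class, in the same order as “classes”. Give class 3 the largest weight, for example [1,2,4].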

Refer to the following documentation for more details on class weighting:

https://www.mathworks.com/help/deeplearning/ug/sequence-classification-using-inverse-frequency-class-weights.html#:~:text=TTest%20%3D%20labelsImbalanced(idxTest)%3B-,Determine,-Inverse%2DFrequency%20Class
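For choosing the weights, a common option is inverse-frequency weighting, which the linked example is named after: class c gets the weight N/(K*n_c), where N is the total number of samples, K the number of classes and n_c the number of samples in class c. The sketch below computes these weights from the post-preprocessing counts in the question. As an assumption about how you might use them outside a deep learning network, it also expands them into one weight per observation (obsWeights, a hypothetical name) for the 'Weights' name-value argument of functions such as fitctree or fitcensemble.

```matlab
% Inverse-frequency class weights from the class counts after preprocessing
counts = [1337 135 24];                    % rows in classes 1, 2, 3
numClasses = numel(counts);
classWeights = sum(counts) ./ (numClasses * counts);
disp(classWeights)                         % approximately 0.373  3.694  20.778

% The same idea as per-observation weights, e.g. for the 'Weights'
% name-value argument of fitctree/fitcensemble (y is a hypothetical label vector)
y = repelem((1:numClasses)', counts);
obsWeights = reshape(classWeights(y), [], 1);   % weight of each row's class
```

With these weights every class contributes the same total weight (N/K) to training. You could pass classWeights to classificationLayer together with categorical class names, or pass obsWeights as 'Weights' when fitting a tree or ensemble.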

4. Evaluation metrics:

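Evaluate the model with per-class metrics such as precision, recall, F1 score and AUC-ROC rather than overall accuracy, so that the class imbalance does not hide poor performance on the minority classes. For example, on your pre-processed data a model that always predicts class 1 would already be about 89% accurate (1337 of 1496 samples) while never detecting class 3.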
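To illustrate why accuracy alone can mislead, the sketch below builds a confusion matrix and computes per-class precision, recall and F1 score using only base MATLAB (“confusionmat” from Statistics and Machine Learning Toolbox returns the same matrix for these labels). The ten true and predicted labels are made up for illustration.

```matlab
% Per-class precision, recall and F1 from true and predicted labels
% (yTrue and yPred are hypothetical example vectors with classes 1, 2, 3).
yTrue = [1 1 1 1 1 1 2 2 3 3]';
yPred = [1 1 1 1 1 1 1 2 1 3]';
numClasses = 3;

% Confusion matrix: rows = true class, columns = predicted class
C = accumarray([yTrue yPred], 1, [numClasses numClasses]);

tp        = diag(C)';              % correctly classified rows per class
precision = tp ./ sum(C, 1);       % column sums: rows predicted as each class
recall    = tp ./ sum(C, 2)';      % row sums: rows truly in each class
f1        = 2 * precision .* recall ./ (precision + recall);
accuracy  = sum(tp) / sum(C(:));

for c = 1:numClasses
    fprintf('Class %d: precision %.2f, recall %.2f, F1 %.2f\n', ...
        c, precision(c), recall(c), f1(c));
end
fprintf('Overall accuracy: %.2f\n', accuracy);
```

For these labels the overall accuracy is 0.80, yet the recall for classes 2 and 3 is only 0.50. If a class is never predicted, its precision becomes 0/0 = NaN, which is itself a warning sign.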

Hope this helps.
