How to split a dataset into 3 sets with splitEachLabel, using percentages, so that each class appears in all 3 sets?

Question

Muhammad Faisal on 24 Jun 2020

https://au.mathworks.com/matlabcentral/answers/554182-how-to-split-a-dataset-in-3-sets-using-spliteachlabel-using-percentage-such-that-each-class-appears

Commented: Muhammad Faisal on 7 Jul 2020

I have an image dataset with around 100 classes; the largest class has 59 images and the smallest has 5. I tried to split the data into training, validation and test sets using the following statement:

[imdsTrain,imdsValidation, imdsTest] = splitEachLabel(imds,0.75,0.15,'randomize');

I got an error saying that the training and validation data must have the same labels.

I checked imds and found that for classes with few images, such as 5, it puts 4 in the training set and the remaining 1 sometimes in the validation set and sometimes in the test set. As a result, not every class in the training set also appears in the validation or test set.
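A quick way to see why the small classes cause trouble is to compute the expected number of images per split. The snippet below only does the arithmetic for the smallest (5) and largest (59) classes mentioned above; it does not reproduce how splitEachLabel rounds fractional counts. For the 5-image class the expected counts are 3.75, 0.75 and 0.5, so the validation and test sets each receive less than one image on average.

```matlab
% Expected (fractional) number of images per split for the smallest and
% largest classes in the question: 5 and 59 images per class.
nPerClass = [5; 59];                 % images in each class
props     = [0.75 0.15 0.10];        % training, validation, remainder (test)
expected  = nPerClass * props        % 2-by-3: rows = classes, columns = splits
```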

I worked around it by increasing the validation proportion from 0.15 to 0.2, but that does not seem like a good solution.

Is there a way to split the dataset so that every class is present in all 3 sets? Preferably I would like to use percentages rather than integer counts, because an integer count always puts exactly 1 image per class in the validation and test sets.
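One possible workaround, sketched here under stated assumptions, is to compute the split indices class by class yourself instead of relying on splitEachLabel: keep the percentages, but raise the validation and test counts to a minimum (here 1 image) for any class where the percentage would round to zero. This is not a built-in MATLAB option; the variable names and the minPerSet rule are hypothetical, and the demo labels stand in for imds.Labels.

```matlab
% Hypothetical demo labels (classes of 5, 59 and 12 items).
% In practice, use: labels = imds.Labels;
rng(0);                                            % reproducible shuffle
labels    = categorical(repelem({'A';'B';'C'}, [5; 59; 12]));
pTrain    = 0.75;                                  % training share
pVal      = 0.15;                                  % validation share (test gets the rest)
minPerSet = 1;                                     % assumed minimum per class per set

[classes, ~, classId] = unique(labels(:));         % classId(i) = class index of item i
idxTrain = []; idxVal = []; idxTest = [];
for k = 1:numel(classes)
    idx = find(classId == k);                      % all items of class k
    idx = idx(randperm(numel(idx)));               % shuffle within the class
    n   = numel(idx);
    % Percentage-based counts, raised to minPerSet where they would be smaller.
    nVal   = max(minPerSet, round(pVal * n));
    nTest  = max(minPerSet, round((1 - pTrain - pVal) * n));
    nTrain = n - nVal - nTest;                     % training gets the remainder
    if nTrain < minPerSet
        error('Class %s has too few items (%d) for three sets.', ...
              char(classes(k)), n);
    end
    idxTrain = [idxTrain; idx(1:nTrain)];                  %#ok<AGROW>
    idxVal   = [idxVal;   idx(nTrain+1 : nTrain+nVal)];    %#ok<AGROW>
    idxTest  = [idxTest;  idx(nTrain+nVal+1 : end)];       %#ok<AGROW>
end
```

With the demo data this gives 56, 12 and 8 items for training, validation and test, and all three classes appear in every set. To apply the indices to your own data, set labels = imds.Labels before the loop and then, assuming your MATLAB release provides the subset method for ImageDatastore, call imdsTrain = subset(imds, idxTrain); (likewise for idxVal and idxTest). The trade-off is that very small classes end up slightly over-represented in validation and test compared with the requested percentages.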


Answer 1

Anmol Dhiman on 3 Jul 2020

https://au.mathworks.com/matlabcentral/answers/554182-how-to-split-a-dataset-in-3-sets-using-spliteachlabel-using-percentage-such-that-each-class-appears#answer_460469

Edited: Anmol Dhiman on 3 Jul 2020

Hi Faisal,

The second argument of splitEachLabel (0.75 in your call) specifies the proportion of files to split off for each label, given either as a scalar in the interval (0,1) or as a positive integer scalar (a number of files). You can change its value to suit your problem.

Regards,

Anmol Dhiman

Muhammad Faisal on 7 Jul 2020

I already knew this and applied it. Let me explain the problem with a simple example.

Suppose there are 5 classes, A, B, C, D and E, and each class has some images in its own folder (an unbalanced dataset). After calling the function, the training set contains all 5 classes, but only 3 or 4 classes appear in the validation set, say A, B, C and D. Similarly, only a few classes appear in the test set, say A, B and E.

This causes a problem when I call trainNetwork with the ValidationData option, because it requires the training and validation labels to be the same. I need every class to be present in all partitions.
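Before calling trainNetwork, it can help to check which classes are missing from each partition. Below is a minimal sketch using setdiff on the label arrays; the example labels are made up to match the A to E scenario above, and with real datastores you would use imdsTrain.Labels, imdsValidation.Labels and imdsTest.Labels instead.

```matlab
% Hypothetical labels mirroring the example: train has A-E,
% validation has A-D, test has A, B and E.
trainLabels = categorical({'A';'B';'C';'D';'E'});
valLabels   = categorical({'A';'B';'C';'D'});
testLabels  = categorical({'A';'B';'E'});

% Classes present in training but absent from each other partition.
missingFromVal  = setdiff(trainLabels, valLabels)
missingFromTest = setdiff(trainLabels, testLabels)
```

If either result is non-empty, the split should be redone (for example with a per-class split that enforces a minimum count) before passing the validation set to trainNetwork.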
