MATLAB Answers

How to split a dataset in 3 sets using splitEachLabel using percentage such that each class appears in all 3 sets?

Muhammad Faisal on 24 Jun 2020
Commented: Muhammad Faisal on 7 Jul 2020
I have an image dataset with around 100 classes; the largest class has 59 images and the smallest has 5. I tried to split the data into training, validation, and test sets with the following statement:
[imdsTrain, imdsValidation, imdsTest] = splitEachLabel(imds, 0.75, 0.15, 'randomized');
I got an error saying that the training and validation data must have the same labels.
I checked the datastores and found that for classes with few images, such as 5, the function puts 4 images in the training set and the remaining 1 in either the validation set or the test set, depending on the class. As a result, not every class in the training set appears in the validation and test sets.
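A check of this kind can be sketched as follows. The three label vectors are small hypothetical stand-ins for imdsTrain.Labels, imdsValidation.Labels, and imdsTest.Labels, so the snippet runs without any image files; the variable names are assumptions for illustration only.

```matlab
% Hypothetical label vectors standing in for the Labels property of each datastore.
trainLabels = categorical({'A';'A';'B';'B';'C';'C'});
valLabels   = categorical({'A';'B'});
testLabels  = categorical({'A';'C'});

% unique() returns only the classes that actually occur in each set.
trainClasses = cellstr(unique(trainLabels));

% Classes present in training but absent from the other two sets.
missingInVal  = setdiff(trainClasses, cellstr(unique(valLabels)));
missingInTest = setdiff(trainClasses, cellstr(unique(testLabels)));

disp(missingInVal)    % class C has no validation images
disp(missingInTest)   % class B has no test images
```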
I worked around it by increasing the validation proportion from 0.15 to 0.2, but that doesn't seem like a good solution.
Is there a way to split the dataset so that every class is present in all 3 sets? Preferably I want to do this with percentages rather than integer counts, because an integer would always put the same fixed number of images (for example 1) per class in the validation and test sets.


Answers (1)

Anmol Dhiman on 3 Jul 2020
Edited: Anmol Dhiman on 3 Jul 2020
Hi Faisal,
The second argument (0.75) in splitEachLabel is the proportion of files from each label to assign to the first output, specified either as a scalar in the interval (0,1) or as a positive integer scalar giving a number of files per label. You can change its value to suit your problem.
Regards,
Anmol Dhiman

  1 Comment

Muhammad Faisal on 7 Jul 2020
I already knew this and have applied it. I'll try to explain the problem below with a simple example.
Suppose there are 5 classes: A, B, C, D, and E. Each class has some images in its own folder (the dataset is unbalanced). After using the function, the training data has all 5 classes, but only 3 or 4 classes appear in the validation set, say A, B, C, D. Similarly, only a few classes appear in the test set, say A, B, E.
This causes a problem when I use trainNetwork with ValidationData, which requires the training and validation labels to be the same. I need all classes in all partitions.
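One possible workaround, sketched here under stated assumptions rather than as a built-in feature of splitEachLabel, is to compute the split indices per class by hand: apply the percentages, but force at least one image of every class into the validation and test sets. The class sizes, the variable names, and the synthetic labels vector (standing in for imds.Labels) are all hypothetical.

```matlab
% Synthetic stand-in for imds.Labels: five unbalanced classes (assumed sizes).
rng(0);                                   % reproducible shuffling
classSizes = [59 20 5 7 12];
labels = categorical(repelem(1:5, classSizes)', 1:5, {'A','B','C','D','E'});

fracTrain = 0.75;                         % target proportions; the rest goes to test
fracVal   = 0.15;

trainIdx = []; valIdx = []; testIdx = [];
classes = categories(labels);
for k = 1:numel(classes)
    idx = find(labels == classes{k});     % positions of this class
    idx = idx(randperm(numel(idx)));      % shuffle within the class
    n = numel(idx);
    assert(n >= 3, 'Each class needs at least 3 images for a 3-way split.');

    % Apply the percentages, but never allow an empty validation or test share.
    nVal   = max(1, round(fracVal * n));
    nTest  = max(1, round((1 - fracTrain - fracVal) * n));
    nTrain = n - nVal - nTest;            % training takes what is left

    trainIdx = [trainIdx; idx(1:nTrain)];                 %#ok<AGROW>
    valIdx   = [valIdx;   idx(nTrain+1:nTrain+nVal)];     %#ok<AGROW>
    testIdx  = [testIdx;  idx(nTrain+nVal+1:end)];        %#ok<AGROW>
end
```

With a real datastore, labels would be imds.Labels, and the three sets could then be built with something like subset(imds, trainIdx), assuming a release in which imageDatastore supports subset. Small classes end up with a larger validation and test share than the nominal percentages, which is the price of guaranteeing that every class appears in every set.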
