How to divide a dataset into a train/test split format?

Hello,
I'm trying to split my dataset into the format X_train, X_test, y_train and y_test, in a similar fashion to scikit-learn's train_test_split in Python, but I'm struggling to find a method to do so. Is this possible in MATLAB?
I've tried doing the following
seed = 42;
rng(seed);
cv = cvpartition(size(dataset,1), "HoldOut", 0.2);
idx = cv.test;
X_train = subsample(~idx,:);
y_test = subsample(idx,:);
but I'm not entirely sure how to go about deriving X_test and y_train.
Does anybody have a good solution to this? Apologies, as I'm fairly new to MATLAB!
Thank you!

 Accepted Answer

Does the variable subsample contain both the 'X' and 'y' values? If yes, then you don't need to create separate variables for 'X' and 'y'. Just use
subsample_train = subsample(cv.training, :)
subsample_test = subsample(cv.test, :)
However, if subsample contains only the 'X' values and another variable (say, 'y') contains the y values, then you can do something like this
X_train = subsample(cv.training, :);
y_train = y(cv.training, :);
X_test = subsample(cv.test, :);
y_test = y(cv.test, :);
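As a rough sketch of why this works: for a holdout partition, training(cv) and test(cv) return complementary logical masks over the rows, so indexing X and y with the same mask keeps features and labels aligned. The data below is a made-up stand-in (10 rows, 3 features, binary labels), not the asker's dataset, and cvpartition requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic placeholder data (assumption): 10 observations, 3 features each.
rng(42);                         % fix the seed so the split is repeatable
X = rand(10, 3);                 % feature matrix
y = randi([0 1], 10, 1);         % label vector, one label per row of X

% Hold out roughly 20% of the rows for testing.
cv = cvpartition(size(X, 1), "HoldOut", 0.2);
trainIdx = training(cv);         % logical mask, true for training rows
testIdx  = test(cv);             % logical mask, true for test rows

% Apply the same masks to X and y so rows stay paired.
X_train = X(trainIdx, :);
y_train = y(trainIdx, :);
X_test  = X(testIdx, :);
y_test  = y(testIdx, :);

% Report the split sizes taken from the partition object itself.
fprintf("train rows: %d, test rows: %d\n", cv.TrainSize, cv.TestSize);
```

Because the masks are complementary, every row ends up in exactly one of the two sets.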

6 Comments

Hey Ameer,
thanks for the reply!
I noticed my code had a typo, the correct code is:
seed = 42;
rng(seed);
cv = cvpartition(size(subsample,1), "HoldOut", 0.2);
idx = cv.test;
X_train = subsample(~idx,:);
y_test = subsample(idx,:);
Could you elaborate more on your second point
However, if subsample contains 'X' values and another variable (say, 'y') contain y values then you can do something like this
as I'm not sure I understood it entirely.
Thank you!
To explain the point, can you specify what data is stored in 'subsample'?
Yes of course,
My subsample variable contains my scaled, cleaned dataset, with the features and classes that I'll train my models on. It's a 664x31 double.
If the first 30 columns are features and the last column is the label, then you can do this
X_train = subsample(cv.training, 1:30);
y_train = subsample(cv.training, 31);
X_test = subsample(cv.test, 1:30);
y_test = subsample(cv.test, 31);
X_train and X_test are feature matrices, and y_train and y_test are label vectors.
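Putting the thread together, a minimal end-to-end sketch might look like the following. The matrix named data is a random placeholder with the same 664x31 shape as the asker's subsample, and the assumption that the label sits in the last column comes from the reply above. Indexing with end rather than hard-coded column numbers is a design choice here, so the same lines work if the number of features changes.

```matlab
% Placeholder for the asker's 'subsample' (assumption): 30 features + 1 label column.
rng(42);                                        % reproducible split
data = [rand(664, 30), randi([0 1], 664, 1)];

% Hold out roughly 20% of the rows as the test set.
cv = cvpartition(size(data, 1), "HoldOut", 0.2);
trainIdx = training(cv);                        % logical mask for training rows
testIdx  = test(cv);                            % logical mask for test rows

% All columns except the last are features; the last column is the label.
X_train = data(trainIdx, 1:end-1);
y_train = data(trainIdx, end);
X_test  = data(testIdx, 1:end-1);
y_test  = data(testIdx, end);

% Display the resulting shapes.
disp(size(X_train));
disp(size(X_test));
```

With the real data, replace the placeholder line with the existing subsample variable and leave the rest unchanged.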
Thank you very much, Ameer!
I am glad to be of help!

Asked: 4 Nov 2020

Commented: 6 Nov 2020
