How to specify a portion of dataset for cross-validation with fitrgp?

I am using fitrgp and would like to do cross-validation using a predetermined dataset as the validation data (I have one dataset for training and another for validation). I've read the documentation below and similar questions on this forum, but I haven't seen a way to do this. Alternatively, is there a way to specify the indices of one dataset to indicate the training portion and the validation portion?
Any help is appreciated, thanks!

Accepted Answer

Katy
Katy on 29 Sep 2023
It turns out that custom cross-validation partitioning is a feature available in R2023b. I was able to specify the test indices in a way similar to this example.
Thanks to the MathWorks Technical Support team as well for the help!
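For reference, here is a minimal sketch of what that looks like, assuming R2023b or later. X and y stand in for my predictors and response, the validation rows (101:150) are made up, and the 'CustomPartition' syntax is my reading of the R2023b cvpartition documentation, so please check it against your release:
testIdx = false(size(X,1),1);                 % logical vector over all observations
testIdx(101:150) = true;                      % made-up rows to hold out as the validation set
c = cvpartition('CustomPartition',testIdx);   % custom partition built from the chosen test indices (R2023b+)
cvMdl = fitrgp(X,y,'CVPartition',c);          % cross-validated GP model that honors the custom partition
valLoss = kfoldLoss(cvMdl)                    % loss (MSE by default) on the held-out rows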

More Answers (1)

Maneet Kaur Bagga
Maneet Kaur Bagga on 26 Sep 2023
Hi Katy,
  • As per my understanding, to perform cross-validation with "fitrgp" using a predetermined dataset as the validation data, the "cvpartition" function can be used to create a custom partition object. This allows you to specify the indices of the training and validation portions.
  • For instance, "cvpartition" can be used to create a hold-out validation partition object: the first input is set to the number of observations in the training dataset, the "HoldOut" option is used, and the size of the validation dataset (X_val) is specified.
  • The training and test methods of the partition object can then be used to obtain the indices for the training and validation portions, respectively. These indices are used to select the corresponding data from the training dataset (X_train and Y_train).
  • Finally, the "fitrgp" function can be used to train the GP model on the training data, and the "predict" function can be used to obtain predictions on the validation data (X_val_cv). Performance metrics such as mean squared error or R-squared can then be calculated from the predicted values (Y_val_pred) and the actual validation targets (Y_val_cv). A minimal sketch of this workflow follows this list.
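A minimal sketch of this workflow, assuming X_train, Y_train, and X_val are already in the workspace (the variable names follow the steps above; everything else is illustrative):
n = size(X_train,1);                          % number of observations in the training dataset
cvp = cvpartition(n,'HoldOut',size(X_val,1)); % hold out as many rows as the separate validation set has
trainIdx = training(cvp);                     % logical indices of the training portion
valIdx = test(cvp);                           % logical indices of the validation portion
X_train_cv = X_train(trainIdx,:);
Y_train_cv = Y_train(trainIdx);
X_val_cv = X_train(valIdx,:);
Y_val_cv = Y_train(valIdx);
gprMdl = fitrgp(X_train_cv,Y_train_cv);       % train the GP regression model on the training portion
Y_val_pred = predict(gprMdl,X_val_cv);        % predict on the held-out validation portion
mseVal = mean((Y_val_cv - Y_val_pred).^2)     % validation mean squared error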
Please refer to the following documentation for a better understanding of the functions:
fitrgp
cvpartition
predict
Hope this helps!
Thank You
Maneet Bagga
  1 Comment
Katy
Katy on 27 Sep 2023
Hi Maneet,
Thank you for this really detailed response! Just to follow up on this point:
  • The training and test methods of the partition object can then be used to obtain the indices for the training and validation portions, respectively. These indices are used to select the corresponding data from the training dataset (X_train and Y_train).
Based on my understanding, with this cvpartition holdout method the indices are still selected randomly by the cvpartition object, even when the holdout size is given as a number of observations rather than a fraction.
I referred to this example:
openExample('stats/EstimateNewDataClassificationUsingCrossValidationErrorExample')
and experimented with changing this line:
hpartition = cvpartition(n,'Holdout',0.3)
to an integer (5 in the example below):
hpartition = cvpartition(n,'Holdout',5)
From this, it seems that the indices in the 'idxTrain' and 'idxNew' variables are randomly selected.
I'm hoping to find a way to manually indicate which indices to select as the training set and which to select as the validation set (e.g. idxTrain = tbl(1:50, :) and idxTest = tbl(1:15, :)).
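Something like this minimal sketch is what I have in mind (the row ranges and the response-variable name 'Y' are just placeholders):
idxTrain = 1:50;                              % rows I want to use for training
idxVal = 51:65;                               % rows I want to use for validation
gprMdl = fitrgp(tbl(idxTrain,:),'Y');         % train only on the chosen training rows ('Y' is a placeholder response name)
Y_val_pred = predict(gprMdl,tbl(idxVal,:));   % predict on the chosen validation rows
valMSE = mean((tbl.Y(idxVal) - Y_val_pred).^2)  % validation mean squared error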
Thank you again for your response!
