Applying a K-Fold cross validated model to predict the response variable for new data

I have trained a k-fold cross-validated model using the fitctree classification function. Here is my code:
% Extract predictors and response
predictors = featTable(:, 1:end-1);
response = featTable.ActivityID;
% For reproducible results
rng default
% Partition the data into 5 folds, stratified by ActivityID
cvp = cvpartition(response,'KFold',5); % setting KFold to 5 gives the lowest loss
% Train one tree per fold (returns a cross-validated, partitioned model)
KFoldMdl = fitctree(featTable,'ActivityID', "CVPartition",cvp);
% Estimate the classification error across the 5 folds
kfLoss = kfoldLoss(KFoldMdl) % model loss: 0.0128
I now want to apply my newly trained k-fold cross-validated model to classify new data that the model hasn't seen before. I used the following code:
Predictions = predict(KFoldMdl, predictorsNew)
When I run the code on my new unclassified predictors, I get the following error message:
Error using predict
No valid dataset found for the "predict" command. Specify a dataset using an iddata object, a timetable object, idfrd
object, or numeric matrices.
I'm not sure why this is not working, as the same code works when the model is trained using the holdout option in cvpartition. I've attached my new predictors file. Could you please advise how I can apply a k-fold cross-validated model to new data? I tried using kfoldPredict, but that doesn't work either.
Any help would be most appreciated.
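The error most likely comes from the kind of object fitctree returns when you pass "CVPartition": a ClassificationPartitionedModel. As far as I know, that class provides kfold* functions (kfoldPredict, kfoldLoss, and so on) but no predict method of its own, so MATLAB appears to fall through to another predict on the path; the mention of iddata and idfrd suggests the System Identification Toolbox version. A minimal check, using the fisheriris sample data as a stand-in for featTable (an assumption, so that the snippet runs on its own):

```matlab
% Stand-in data: fisheriris replaces featTable; Species replaces ActivityID
load fisheriris
tbl = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});
tbl.Species = categorical(species);

% Same workflow as in the question: stratified 5-fold partition, then fitctree
rng default
cvp = cvpartition(tbl.Species, 'KFold', 5);
cvMdl = fitctree(tbl, 'Species', 'CVPartition', cvp);

% The result is a partitioned model holding one tree per fold
disp(class(cvMdl))

% Check whether this class defines its own predict method
hasPredict = ismethod(cvMdl, 'predict');
fprintf('ismethod(cvMdl, ''predict''): %d\n', hasPredict);

% List every predict function on the path (output depends on installed toolboxes)
which predict -all
```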
the cyclist on 14 Jun 2023
It would be helpful if you uploaded all the data needed to run the above code, which looks like it would be
  • featTable, or both predictors and response
  • predictorsNew
Uploading a subset of the data that gives the same behavior would be fine.
I don't think this is likely a data-specific problem, but it's still easier if we can just replicate what you are doing, rather than create an example of our own for debugging.


Accepted Answer

Aakash on 14 Jun 2023
You can use the predict function in MATLAB to predict responses for the new data predictorsNew using one of the trees stored in the cross-validated model KFoldMdl.
The code would look like this:
y_predict = predict(KFoldMdl.Trained{1}, predictorsNew);
Note that KFoldMdl.Trained{1} is the model trained for the first fold of the cross-validation (that is, on all data except the first fold's test set). You can use any fold that you think has the best performance.
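If you follow the single-fold route, kfoldLoss with 'Mode','individual' reports one loss per fold, which gives one way to pick the fold. The helper below is a hypothetical sketch (predictWithBestFold is not a MATLAB function). Keep in mind that each per-fold loss is measured on a small test set, so the "best" fold may simply be the luckiest one.

```matlab
function [labels, bestFold] = predictWithBestFold(cvMdl, Xnew)
% PREDICTWITHBESTFOLD  Hypothetical helper: predict new data with the fold
% model of a ClassificationPartitionedModel that has the lowest test loss.
%   cvMdl - output of fitctree(..., 'CVPartition', cvp) or 'KFold', k
%   Xnew  - new predictors in the same form used for training (a table
%           with the same variable names, or a numeric matrix)

    % One loss value per fold, each computed on that fold's held-out data
    foldLoss = kfoldLoss(cvMdl, 'Mode', 'individual');

    % Index of the fold model with the smallest held-out loss
    [~, bestFold] = min(foldLoss);

    % cvMdl.Trained{k} is the compact tree fitted without fold k's test data
    labels = predict(cvMdl.Trained{bestFold}, Xnew);
end
```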
Impala on 14 Jun 2023
I see your point about the predict function not being the best to give an accurate prediction.
So what is the syntax if I want to combine the predictions from all folds? Can I still use kfoldPredict?
Aakash on 15 Jun 2023
No. kfoldPredict cannot predict labels for new data; it only returns predicted labels for the data you passed to fitctree, with each observation predicted by the fold model that was trained without it.
To check the performance of each fold, you can write your own function that predicts with each fold model using the method I suggested above and compares the predictions with the true labels.
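One way to combine the fold models, in the spirit of the function suggested above, is a simple majority vote over the k trees. This is not a built-in option of kfoldPredict; the sketch below is a hypothetical helper (kfoldVotePredict) that calls predict on every element of Trained and takes the row-wise mode. It assumes numeric or categorical class labels, since mode does not accept cell arrays of character vectors.

```matlab
function labels = kfoldVotePredict(cvMdl, Xnew)
% KFOLDVOTEPREDICT  Hypothetical helper: classify new data by majority vote
% over all fold models stored in a ClassificationPartitionedModel.
%   Assumes numeric or categorical class labels.

    k = numel(cvMdl.Trained);

    % Column i holds the labels predicted by the i-th fold model
    votes = predict(cvMdl.Trained{1}, Xnew);
    for i = 2:k
        votes = [votes, predict(cvMdl.Trained{i}, Xnew)]; %#ok<AGROW>
    end

    % Most frequent label in each row; ties are broken by mode's own rule
    labels = mode(votes, 2);
end
```

Usage with the variables from the question would be labels = kfoldVotePredict(KFoldMdl, predictorsNew), provided ActivityID is numeric or categorical.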


More Answers (1)

the cyclist on 14 Jun 2023
Immediately after writing my comment above (which is still true), I think I see the answer without the additional info.
For k-fold models, you need to use kfoldPredict (for classification tree models).
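For reference, kfoldPredict takes only the partitioned model because it returns out-of-fold predictions for the training data itself: each observation is labelled by the fold model that did not see it. A minimal sketch, again with fisheriris standing in for featTable (an assumption):

```matlab
% Stand-in data: fisheriris replaces featTable; Species replaces ActivityID
load fisheriris
tbl = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});
tbl.Species = categorical(species);

rng default
cvMdl = fitctree(tbl, 'Species', 'KFold', 5);

% One out-of-fold label per row of the training table (not for new data)
oofLabels = kfoldPredict(cvMdl);

% Share of misclassified training rows; compare with kfoldLoss(cvMdl)
oofError = mean(oofLabels ~= tbl.Species);
fprintf('Out-of-fold error: %.4f\n', oofError);
```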
Impala on 14 Jun 2023
Thanks for your feedback.
Apologies for not including the featTable data (which I've now attached). I thought it wasn't required, as that was the data I had used to train the model, and the code for that worked fine.
I was having trouble applying the model to new data (the NewPredictors file). I had tried to use the kfoldPredict function, but it only takes the model as an input, and I couldn't see how to include the new predictors. Could you kindly help with this, or shall I just stick to using the predict function?
Thanks in advance!
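A common alternative to reusing fold models is to treat k-fold cross-validation purely as an estimate of performance, then fit one final tree on all of the labelled data and call predict on that. The sketch below uses fisheriris as a stand-in for featTable (an assumption). It also assumes the new predictors are a table whose variable names match the training predictors, since a model trained on a table expects a table at prediction time.

```matlab
% Stand-in data: fisheriris replaces featTable; Species replaces ActivityID
load fisheriris
tbl = array2table(meas, 'VariableNames', {'SL', 'SW', 'PL', 'PW'});
tbl.Species = categorical(species);

% Step 1: estimate the generalization error with 5-fold cross-validation
rng default
cvMdl = fitctree(tbl, 'Species', 'KFold', 5);
cvError = kfoldLoss(cvMdl);

% Step 2: fit the final model on all available labelled data
finalMdl = fitctree(tbl, 'Species');

% Step 3: predict new observations (a few training rows reused as stand-ins)
newPredictors = tbl(1:5, {'SL', 'SW', 'PL', 'PW'});
newLabels = predict(finalMdl, newPredictors);

fprintf('Estimated error: %.4f\n', cvError);
disp(newLabels)
```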




