knnimpute in training/testing sets

Salim Al-Wasity on 16 Dec 2020
Answered: Aditya Patil on 24 Dec 2020
Dear support,
I am planning to convert my machine learning code from R to MATLAB. In that code I impute missing values using KNN. In the R version, I impute the missing data after I split it into training and testing sets, to prevent double dipping. The R code is simply as follows:
  • Impute missing values in the training dataset (mltrain) only:
  • mltrain2 <- DMwR::knnImputation(mltrain)
  • Impute missing values in the testing dataset (mltest), passing the training dataset as distData so that the neighbours are searched for in the training data:
  • mltest <- DMwR::knnImputation(mltest, distData = mltrain)
In MATLAB, I tried to use knnimpute on the training and testing datasets separately, in the same way as the R code above. However, there is no option to pass the training data when imputing the missing values of the testing dataset.
Any suggestions on how to solve this issue?
Sincerely
Salim AL-Wasity

Answers (1)

Aditya Patil on 24 Dec 2020
Currently this functionality is not available in knnimpute. I have brought this request to the attention of the concerned developers, and it might be considered for a future release.
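Until such an option exists, the effect of distData can be approximated by hand: for each test row, find its nearest training rows using only the columns that are observed in that test row, then fill the gaps from those neighbours. The following is a minimal sketch of that idea, not the knnimpute or DMwR algorithm. The toy matrices, the variable names, and k = 2 are assumptions made for illustration. It uses an unweighted mean and unscaled Euclidean distance, so it does not standardise the columns or weight neighbours by distance, which the R function may do depending on its options. It also assumes the training matrix has no NaNs left.

```matlab
% Toy data (hypothetical): rows are observations, columns are features.
Xtrain = [1 2 3; 2 3 4; 3 4 5; 10 11 12];   % assumed complete (already imputed)
Xtest  = [1.5 NaN 3.5; NaN 11 12];          % NaN marks a missing value
k = 2;                                      % number of neighbours (assumption)

XtestImputed = Xtest;
for i = 1:size(Xtest, 1)
    miss = isnan(Xtest(i, :));
    if ~any(miss) || all(miss)
        continue   % nothing to impute, or no observed feature to match on
    end
    % Euclidean distance to every training row, over observed columns only
    diffs = Xtrain(:, ~miss) - Xtest(i, ~miss);
    d = sqrt(sum(diffs.^2, 2));
    % Indices of the k closest training rows
    [~, order] = sort(d);
    nn = order(1:min(k, numel(order)));
    % Fill each missing entry with the mean of the neighbours' values
    XtestImputed(i, miss) = mean(Xtrain(nn, miss), 1);
end

disp(XtestImputed)
```

Because only Xtrain is searched for neighbours, no information from other test rows leaks into the imputed values.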
A model-based workaround is to train regression models on the training data and use them to predict the missing values in the test dataset. Multiple models might be required if data is missing in multiple columns.
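The regression workaround could look something like the sketch below. It fits one ordinary least-squares model per column that has gaps in the test set, using training data only, and then predicts the missing test entries. The toy data and variable names are assumptions made for illustration. A plain linear fit with the backslash operator stands in for whatever regression model suits the real data, and test rows whose predictors are also missing are simply left as NaN.

```matlab
% Toy data (hypothetical): the third column equals col1 + col2.
Xtrain = [1 2 3; 2 1 3; 3 5 8; 4 4 8; 5 1 6];   % assumed complete
Xtest  = [2 2 NaN; NaN 3 7];                    % NaN marks a missing value

XtestImputed = Xtest;
p = size(Xtrain, 2);
for j = 1:p
    rows = find(isnan(Xtest(:, j)));
    if isempty(rows), continue; end     % column j has no gaps in the test set
    others = setdiff(1:p, j);
    % Fit column j on the remaining columns (with intercept), training data only
    A = [ones(size(Xtrain, 1), 1), Xtrain(:, others)];
    beta = A \ Xtrain(:, j);
    for r = rows'
        x = Xtest(r, others);
        if any(isnan(x)), continue; end % predictors missing too: left as NaN
        XtestImputed(r, j) = [1, x] * beta;
    end
end

disp(XtestImputed)
```

One model is fitted per affected column, which matches the note above that multiple models might be required when several columns have missing data.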

Release

R2019a
