Evaluation Criteria for Missing Data Imputation Techniques

Question

Tiago Dias on 28 Jun 2018

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/407885-evaluation-criteria-for-missing-data-imputation-techniques

Answered: Tiago Dias on 5 Jul 2018

Hello,

I have 5 methods for missing data imputation, because my original data set has missing values (it is industrial data). To perform a PCA and obtain positive eigenvalues, I need the covariance matrix to be positive definite.

I used the 5 methods to impute the missing data, so now I have 5 new matrices X_imputed.

Question: How can I measure the performance of each one? What criteria should I use?

I read about calculating the RMSE, but the formula uses the square root of the mean of (Xi_obs - Xi_imputed)^2. The authors can calculate it because their initial X is complete and they introduce a percentage of missing data (MD) themselves. My problem is that I already start with missing data.

Answer 1

Jeff Miller on 4 Jul 2018

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/407885-evaluation-criteria-for-missing-data-imputation-techniques#answer_327372

You can't evaluate the performance of the different imputation methods with respect to your actual data set, for exactly the reason you mention. You can only compare their performance across simulations where you know the values of each of the missing points (i.e., your simulation pretends that some simulated points are missing). Such a simulation would require very detailed assumptions about the multivariate situation that your data came from, including the reasons why some points are missing.
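As a rough illustration of the simulation idea, the sketch below generates complete data, hides some entries on purpose, imputes them, and computes the RMSE on the hidden entries only. Everything in it is an assumption made for the example: the simulated data model, the 30% missing-completely-at-random mechanism, and the two toy imputers (column mean and column median), which merely stand in for the five real methods.

```matlab
% Sketch: compare imputation methods on entries that are hidden on purpose.
% All variable names and the data model are illustrative assumptions.
rng(0);                                   % reproducible example
n = 200; p = 5;
Xtrue = randn(n, 2) * randn(2, p) + 0.1 * randn(n, p);  % simulated complete data

% Hide about 30% of the entries completely at random (an assumption;
% real industrial data may be missing for other reasons).
hidden = rand(n, p) < 0.30;
Xmiss = Xtrue;
Xmiss(hidden) = NaN;

% Two toy imputation methods standing in for the real ones.
colMean   = mean(Xmiss, 1, 'omitnan');
colMedian = median(Xmiss, 1, 'omitnan');
XimpMean = Xmiss;
XimpMedian = Xmiss;
for j = 1:p
    XimpMean(hidden(:, j), j)   = colMean(j);
    XimpMedian(hidden(:, j), j) = colMedian(j);
end

% RMSE over the hidden entries only, where the true values are known.
rmseMean   = sqrt(mean((Xtrue(hidden) - XimpMean(hidden)).^2));
rmseMedian = sqrt(mean((Xtrue(hidden) - XimpMedian(hidden)).^2));
fprintf('RMSE (mean): %.3f, RMSE (median): %.3f\n', rmseMean, rmseMedian);
```

The ranking obtained this way is only as trustworthy as the assumptions behind the simulated data and the missingness mechanism.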

It might be better to perform the PCA without imputing any missing data (check the pca documentation). Did you try the following?

coeff = pca(X,'Rows','pairwise');

This essentially computes each entry in the covariance matrix using whichever of your original data rows/cases have values for both relevant variables.
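To make the pairwise idea concrete, here is a small sketch that builds a pairwise-complete covariance matrix directly with cov and its 'partialrows' option, then inspects the eigenvalues. The data matrix is made up for the example, and this is not claimed to be what pca does internally. Because each entry can be computed from a different set of rows, such a matrix is not guaranteed to be positive definite, which matters for the eigenvalue requirement in the question.

```matlab
% Sketch: pairwise-complete covariance and a check of its eigenvalues.
% The data below are invented for illustration.
X = [1.0  2.1  NaN;
     2.0  NaN  1.5;
     3.0  6.2  2.9;
     4.0  7.9  NaN;
     5.0  9.8  5.1;
     NaN 12.1  6.0];

% 'partialrows' computes each entry C(i,j) from the rows in which both
% column i and column j are observed.
C = cov(X, 'partialrows');

% Each entry may rely on a different subset of rows, so C is not
% guaranteed to be positive definite. Inspect the eigenvalues.
ev = eig((C + C.') / 2);      % symmetrize to guard against round-off
isPosDef = all(ev > 0);

disp(C);
disp(ev.');
fprintf('Positive definite: %d\n', isPosDef);
```

If isPosDef comes out false on the real data, that may be a sign that the pairwise route will not work for this data set either.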

2 Comments
Tiago Dias on 4 Jul 2018

Thanks for your input, but I need to impute the missing data. Since I have missing values (~30%, industrial data), I can calculate the covariance, but because the covariance contains NaNs, I can't calculate scores and loadings.

Since I have my matrix X and my matrix Ximputed (obtained with a PCA model, so all the entries are re-calculated, even the non-missing values), can I use

sum((X(i,j) - X_imp(i,j)).^2) as a criterion?

Jeff Miller on 5 Jul 2018

Sorry, I do not know whether your suggestion is reasonable or not.

If the data do not even allow the covariances to be estimated, then you probably don't have enough data to decide which is the best imputation method or to do PCA afterwards.

Can you select out a subset of the variables for which you can get a complete set of covariances? You might just do PCA on this subset.

Answer 2

Tiago Dias on 5 Jul 2018

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/407885-evaluation-criteria-for-missing-data-imputation-techniques#answer_327561

I can't really make a subset, because all variables have missing data. But I found an article where they compute the residuals X (with MD) - Ximputed only for the (i,j) entries that have values in X, so I will go that way.
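A minimal sketch of that residual calculation, restricted to the entries that have values in X, might look like the following. The matrices are invented for the example, and Ximputed is a hypothetical reconstruction in which the observed entries were re-estimated as well (as with the PCA model mentioned earlier in the thread).

```matlab
% Sketch: residuals between X and a model-based reconstruction,
% evaluated only on the entries that are observed in X.
X = [1.0  2.0  NaN;
     2.0  NaN  6.1;
     3.0  6.0  9.2;
     NaN  8.1 11.9];

% Hypothetical reconstruction in which every entry was re-calculated.
Ximputed = [1.1  2.1  3.0;
            1.9  4.0  6.0;
            3.0  6.1  9.0;
            4.0  8.0 12.0];

observed  = ~isnan(X);                          % entries with values in X
residuals = X(observed) - Ximputed(observed);   % column vector of residuals

sse          = sum(residuals.^2);               % sum of squared residuals
rmseObserved = sqrt(mean(residuals.^2));        % RMSE over observed entries
fprintf('SSE: %.4f, RMSE: %.4f\n', sse, rmseObserved);
```

Note that this measures how well each model reproduces the observed values, not how close the imputed values are to the unknown true ones, and it is uninformative for any method that leaves the observed entries unchanged (its residuals are all zero).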
