How to find out the distribution of missing data?

Question

Rita on 9 Sep 2016

Answered: John D'Errico on 9 Sep 2016

I have a vector with some missing data. I am not sure how I can determine the distribution of the missing data. I know that I can use the "distfit" function to find the distribution of data, but what about missing data? Thanks for any advice.

1 Comment

dpb on 9 Sep 2016

Pretty much, you can't know what you don't know. Unless there's some correlating variable that isn't missing, the trouble with missing data is that it is, well, "missing"...

Answer 1

John D'Errico on 9 Sep 2016

So, some of the elements of the vector are NaNs, or something like that?

If the elements of this vector are presumed to be just random numbers from some unknown distribution, then you can use fitdist (not distfit). In fact, fitdist ignores NaN values in the data anyway, so you do not need to remove them first.

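x = randn(100,1);          % 100 samples from a standard normal
x([2 3 5 7 11]) = NaN;     % mark a few elements as missing
fitdist(x,'normal')        % fit a normal distribution; the NaNs are ignored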
ans = 
  NormalDistribution
    Normal distribution
         mu = -0.114351   [-0.324369, 0.0956665]
      sigma =   1.03096   [0.902313, 1.20273]
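
To check the claim that the NaNs are ignored, here is a minimal sketch that counts the missing elements and compares the fit on the raw vector with a fit on the vector with NaNs removed by hand. It assumes the Statistics and Machine Learning Toolbox is installed; the variable names nMissing, pdAll and pdClean are made up for this illustration, and the seed is fixed only so the run is repeatable.

```matlab
rng(1);                                      % fixed seed so the run is repeatable
x = randn(100,1);                            % 100 samples from a standard normal
x([2 3 5 7 11]) = NaN;                       % mark a few elements as missing

nMissing = sum(isnan(x));                    % number of missing elements
pdAll    = fitdist(x, 'Normal');             % fit with the NaNs left in place
pdClean  = fitdist(x(~isnan(x)), 'Normal');  % fit after removing the NaNs by hand

fprintf('%d missing; mu = %.4f vs %.4f, sigma = %.4f vs %.4f\n', ...
    nMissing, pdAll.mu, pdClean.mu, pdAll.sigma, pdClean.sigma);
```

The two fits should agree, which only tells you about the distribution of the observed values. It says nothing about the values that are missing unless you are willing to assume they come from the same distribution.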

If by "missing" you mean that the missing elements are in some way related to their neighbors, then the vector is not simply a random sample from some distribution. In that case, you can only use some scheme to interpolate or approximate the missing values from their neighbors. If the vector also has noise in it, interpolation can be a noise-amplifying process. For example, a cubic spline interpolant applied to noisy data will actually be a worse estimator than a linear interpolant, in the sense that the variance of the interpolated values will be higher than what a linear interpolant would give. A smoothing spline of some ilk, applied to the non-missing elements, might then be a good choice.
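
The variance claim can be checked with a small simulation. This is only a sketch under stated assumptions: the underlying signal (a sine), the noise level, the grid size and the index of the missing sample are all made up for illustration, and only interp1 from base MATLAB is used. One sample is treated as missing, it is re-estimated from its neighbors many times under fresh noise, and the spread of the two estimates is compared.

```matlab
rng(0);                          % fixed seed so the run is repeatable
n      = 21;                     % number of grid points (assumed)
t      = (1:n)';                 % sample locations
kMiss  = 11;                     % index treated as missing (hypothetical choice)
known  = setdiff(1:n, kMiss);    % indices of the non-missing samples
nTrial = 2000;                   % number of noise realizations
sigma  = 0.1;                    % noise standard deviation (assumed)

estLin = zeros(nTrial,1);        % linear estimates of the missing value
estSpl = zeros(nTrial,1);        % cubic spline estimates of the missing value
for i = 1:nTrial
    y = sin(2*pi*t/n) + sigma*randn(n,1);   % smooth signal plus noise
    estLin(i) = interp1(t(known), y(known), t(kMiss), 'linear');
    estSpl(i) = interp1(t(known), y(known), t(kMiss), 'spline');
end

varLin = var(estLin);            % variance of the linear estimate
varSpl = var(estSpl);            % variance of the spline estimate
fprintf('variance, linear: %.4g   spline: %.4g\n', varLin, varSpl);
```

In this setup the spline estimate should show the larger variance, because it puts more weight on each neighboring noisy sample. Variance is not the whole story: on a strongly curved signal the linear interpolant can be more biased, which is one reason a smoothing spline fitted to the non-missing elements may be the better compromise.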

But without a clearer definition of your problem, it seems very difficult to provide a better answer.
