
How to compute variance-covariance matrix inv(X'X) when X'X is rank-deficient? Does it make sense to use the pseudo-inverse as an estimate?

Suppose the design matrix X is rank deficient. Regression coefficients can be found based on the minimum-norm solution using the pseudo-inverse pinv. To get confidence intervals, we need standard error estimates which are the diagonal entries of inv(X'X). How does one compute this if X is rank deficient? pinv(X'X) gives reasonable solutions, but are they meaningful?
  2 Comments
Ryan Povall on 24 Dec 2015
My understanding is that you are trying to extract some data by taking the inverse of the covariance matrix X'X, but it is non-invertible, and you want to know whether the pseudo-inverse provides an approximation. According to the following article, this seems to be the case. If I did not fully answer your question, I hope this at least helps you move in the right direction.
John D'Errico on 24 Dec 2015
As I point out in my answer, it does provide some information, essentially on the reduced problem. However, it also is a lie to attempt to use the pseudo-inverse result here if you wish to compute standard errors. They are simply not valid as standard errors.


Answers (1)

John D'Errico on 24 Dec 2015
Edited: John D'Errico on 24 Dec 2015
No. It does not give reasonable solutions!
As I said repeatedly to the last question where you asked this, when you have a rank deficient problem, the standard errors on those coefficients will generally be infinite.
Again. I'll go through this once more as an example to show why using the pseudo-inverse gives meaningless results. Consider the simple problem of estimating c1 and c2 from this model:
c1*x + c2*x = x
Yes, I know that the solution is a trivial one. The answer is
c1 = c2 = 0.5
Or, is it
c1 = 1, c2 = 0
Or, is it
c1 = 1e6, c2 = 1 - 1e6
The point is, ANY solution is equally valid. All that matters is that c1+c2=1. The pseudo-inverse is nice here, because it gives you a nice, comfortable solution. Small numbers, that seem reasonable. But perhaps you are now confused, because using pinv was such an easy replacement.
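As a quick illustration (a minimal sketch, not from the original thread), here is that minimum-norm behavior in MATLAB; any vector with c1+c2 = 1 fits exactly as well as the one pinv picks:
x = randn(100,1);
X = [x, x];              % two identical columns, so rank(X) = 1
c_pinv = pinv(X)*x       % minimum-norm solution: [0.5; 0.5]
c_other = [1; 0];        % an equally valid solution
norm(X*c_pinv - x)       % essentially zero
norm(X*c_other - x)      % also essentially zero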
The actual values are COMPLETELY unknown. As such, you would expect that the predicted standard errors on those parameters are essentially infinite. That infinite predicted variance is a reflection of the complete uncertainty on those parameters, because the estimator does not have sufficient information to give an unambiguous result.
That pinv will give you the minimum-norm solution, with c1 = c2 = 0.5, is not relevant. The statistical estimator that uses the diagonal elements of inv(X'*X) assumes that the parameters are not perfectly correlated. However, in the case above, they are. PERIOD. Replacing inv with pinv does not make a nonsense estimator suddenly a valid one in this case. I think you do not fully understand the formulas that you are trying to use, and are thinking you can arbitrarily change them by replacing inv with pinv.
Yes, if the design matrix is non-singular, then pinv and inv will give you the same result. This has you confused into thinking that you can use pinv when the matrix is singular. YOU CANNOT DO SO HERE. Just throwing pinv at this problem does not make for a meaningful result.
Think about it like this. inv(A'*A) yields what you can think of as a covariance matrix around the parameters, WHEN A is non-singular. (There is some stuff in front that scales it properly, but that is irrelevant to this conversation.)
When A is singular, it yields essentially infinite standard errors. This is good, since the uncertainty on those parameters is infinitely wide, so they must have huge standard errors. If you use pinv here instead, it essentially lies to you, because the statistically correct prediction for that standard error really was infinitely large.
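For reference, here is the standard formula with that scaling included, in a hedged sketch (full-rank A; the response z and noise level are made up for illustration):
A = randn(100,2);                  % full-rank design matrix
z = A*[1; 2] + 0.1*randn(100,1);   % assumed response, for illustration only
b = A \ z;
[n,p] = size(A);
s2 = norm(z - A*b)^2/(n - p);      % residual variance: the scaling "in front"
Cov_b = s2*inv(A'*A);              % parameter covariance matrix
se = sqrt(diag(Cov_b))             % standard errors of the coefficients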
Of course, if your goal really is to fudge your results to look good (I presume it is not) then feel free to use pinv. But then, why not just make up the standard error that you want to see? I recommend using 17 here, mainly because I like the number 17. 0.17 might be more accurate though. Magic only works for Harry Potter. For the rest of us muggles, sorry, but mathematics rules. And, NO, I am not suggesting that you really want to fudge your results. I think you are just praying that there is some trivial solution to your problem. But if you do use pinv as you wish to do, you would be essentially fudging the results, yielding a completely incorrect prediction of the standard errors.
You have a singular matrix. That it is a singular matrix means you have insufficient information to estimate the parameters uniquely. You could make an effort to get better data, which is ALWAYS the very best solution for insufficient information.
If you do not do that, then at best you can gain information about the uncertainty around the sum, c1+c2. (In the example problem I posed.) You CAN indeed compute a non-infinite estimate of the standard error of that sum. In fact, this computation would involve use of the singular value decomposition (which is at the core of pinv.) What I do not know is if that information would be of any value at all to you.
The idea behind this is that you can re-write the problem I gave above as:
d1*x = x
where d1 = c1 + c2 is a single, still-unknown parameter. This recognizes that we really don't have two independent parameters to estimate; only one piece of information is available from our data. We CAN estimate the standard error of d1, and it will be finite. In fact, this computation would involve use of the singular value decomposition (which is at the core of pinv). For any problem with a singular matrix, we can in theory always reduce it to one with a non-singular design matrix, eliminating one (or more) linear combinations of the parameters so that the reduced problem does have a unique solution. Again, I have no idea if this might be of any value.
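In MATLAB terms, that reduction might look like this (a sketch under assumed data; d1 stands for c1 + c2):
x = randn(100,1);
z = x + 0.1*randn(100,1);        % assumed data, consistent with c1 + c2 = 1
Xr = x;                          % reduced, full-rank design matrix
d1 = Xr \ z                      % unique estimate of c1 + c2
res = z - Xr*d1;
s2 = (res'*res)/(numel(z) - 1);  % residual variance, one parameter fitted
se_d1 = sqrt(s2*inv(Xr'*Xr))     % finite standard error for d1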
  3 Comments
John D'Errico on 24 Dec 2015
Edited: John D'Errico on 24 Dec 2015
I'm not being antagonistic, merely frustrated. In fact, I even up-voted your last question, a rare thing for me, since I like to see when someone is interested. I've also invested a LOT of time answering your questions, because you ARE interested. However, you simply won't accept that pinv is not a good choice here. You want to get a magic result from something that is not there. What I do NOT want you to do in any way, shape or form, is to somehow wander off from here thinking that you can just use pinv as you wish to do. If you then used it to report standard errors from that result, they would be wildly inappropriate - flat out invalid.
By the way, pinv(A'*A) is a bad thing to do numerically, since there are better ways to do that computation. If we compute the singular value decomposition for A...
[U,S,V] = svd(A);
then we can write A'*A as
A'*A = V*S'*U'*U*S*V'
But U is an orthogonal matrix, so U'*U is the identity matrix, and S is a diagonal matrix.
A'*A = V*S^2*V'
The inverse of this matrix is easy to compute when S has non-zero elements on the diagonal. When S has zero elements, the pseudo-inverse simply drops them from the "inverse", effectively killing off the corresponding singular vectors. So instead of treating 1/0 as Inf, it just zeros those entries out. This is exactly where the problem arises: when you have a singular matrix, those essentially infinite standard errors just magically go away.
But the point is, you COULD compute the result of that pseudo-inverse much more accurately by computing the SVD of A directly, not by forming A'*A and THEN applying pinv. All of that is moot, but let's see what we CAN do, what we can learn.
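To make that concrete, here is a sketch of computing what pinv(A'*A) returns straight from the SVD of A, without ever forming A'*A (the tolerance below mimics, but is not identical to, the one pinv uses internally):
A = randn(100,3);
A(:,2) = A(:,1);               % two identical columns, so rank(A) = 2
[U,S,V] = svd(A,0);
s = diag(S);
tol = max(size(A))*eps(max(s));
sinv2 = zeros(size(s));
k = s > tol;
sinv2(k) = 1./s(k).^2;         % drop the "zero" singular values
P1 = V*diag(sinv2)*V';         % SVD-based pseudo-inverse of A'*A
P2 = pinv(A'*A);
norm(P1 - P2)                  % typically small; P1 never squares the conditioning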
One of the things that the SVD can teach you on this problem is where the singularity lies. I'll return to that basic example from before, and maybe add one extra variable to make it slightly closer to what a real life problem might look like.
x = randn(100,1);
y = randn(100,1);
A = [x,x,y];
Clearly this will result in a singular system. The first two columns of A are identical. So the coefficients of those terms will be hopelessly and irretrievably confounded. Again, better data would help, but if we cannot, then what can we learn?
I'm ignoring the right hand side of the problem here, since this is just a question of information content in the independent variables. Let's compute the SVD of A. We can ignore U in this too.
[U,S,V] = svd(A,0);
S
S =
13.543 0 0
0 9.8118 0
0 0 3.9144e-15
V
V =
0.7053 -0.050523 -0.70711
0.7053 -0.050523 0.70711
0.07145 0.99744 -2.0817e-17
Ok, so what does this tell us? The essentially zero singular value, S(3,3) tells us that there is a linear combination of the variables that has NO information provided to us. So, look at V(:,3), the corresponding singular vector. We can arbitrarily scale it so the first element is 1.
V(:,3)/V(1,3)
ans =
1
-1
2.9439e-17
Since it corresponds to a "zero" singular value, it tells us that the linear combination c1-c2 has no information provided in the data. (Ignoring that essentially insignificant zero element.) In the analysis, we have provided information about the values of the coefficients, but we can simply never learn anything about c1-c2 from this data. Essentially, if we implicitly rewrote the model in some form like this:
((c1+c2)/2)*(x + x) + ((c1-c2)/2)*(x - x) + c3*y = z
then we could in theory estimate a standard error for (c1+c2) and for c3. But the (c1-c2) combination multiplies x - x, a column of zeros, so we have no information about it at all. It would essentially have an infinite standard error.
All of this often gets a bit less easy to appreciate for a real life model, but it is no less true. So, for example, suppose your design matrix actually comes from trying to fit a 20th order regression polynomial. There would be one (or more) intellectually meaningless linear combination(s) of the coefficients with no meaningful standard error associated with it (them). This is also why I'm not sure whether this analysis is leading anywhere that would actually be of benefit to you.
At most, we can learn from the matrices S and V that a rational model for this problem is actually of the form
d1*x + d2*y = z
but that was also clear from the way we constructed the data.
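One way to act on this (my illustration, not something from the thread): rotate into the basis of the right singular vectors, where the reduced design matrix is full rank and the estimable quantities are the combinations V(:,1:r)'*[c1; c2; c3]:
x = randn(100,1);
y = randn(100,1);
A = [x, x, y];
[U,S,V] = svd(A,0);
r = 2;                         % numerical rank: two nonzero singular values
B = A*V(:,1:r);                % full-rank reduced design matrix
CovB = inv(B'*B)               % finite; scale by the residual variance in practice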
You should also note, in this last example of a matrix with three columns, that the first column of V did not appear as a perfect multiple of [1;1;0]. That is because in my random sample, x and y were not perfectly independent in terms of the sample correlation.
V(:,1)/V(1,1)
ans =
1
1
0.1013
corrcoef([x,y])
ans =
1 0.054518
0.054518 1
John D'Errico on 25 Dec 2015
Edited: John D'Errico on 25 Dec 2015
Perhaps I've been trying to explain this the wrong way. Maybe I need to show what happens as a limit.
Back to the simple example, with one twist.
x = randn(100,1);
delta = randn(100,1);
yfun = @(sigma) x + delta*sigma;
Afun = @(sigma) [x,yfun(sigma)];
So, now, we have a repeatable example, where we can create a matrix A that is arbitrarily poorly conditioned. When sigma is large, then the two columns of A will be quite different. When sigma approaches zero as a limit, then the columns become identical, and the conditioning of A becomes terrible.
cond(Afun(5))
ans =
6.085
cond(Afun(1))
ans =
2.3103
cond(Afun(.01))
ans =
166
cond(Afun(.0001))
ans =
16612
cond(Afun(.0000000001))
ans =
1.6612e+10
cond(Afun(.00000000000001))
ans =
1.6584e+14
The nice thing is that Afun here is completely repeatable. Choose the same value of sigma, and I will get a repeatable result. (Your result will be slightly different if your random seed was different from mine of course.)
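For compactness, the same sweep can be written as a loop over sigma (just a restatement of the calls above):
for sigma = [5 1 1e-2 1e-4 1e-10 1e-14]
    fprintf('sigma = %8.1e   cond(A) = %10.4e\n', sigma, cond(Afun(sigma)));
end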
Now, let's look at what happens with pinv(A'*A) in the computation as you wish to do it.
A = Afun(1);
inv(A'*A)
ans =
0.018521 -0.0074295
-0.0074295 0.0080437
pinv(A'*A)
ans =
0.018521 -0.0074295
-0.0074295 0.0080437
As you can see, both pinv and inv produce the same results. The columns of A were quite different here. So I'll just report the results of inv and pinv in one array.
[inv(A'*A),pinv(A'*A)]
ans =
0.018521 -0.0074295 0.018521 -0.0074295
-0.0074295 0.0080437 -0.0074295 0.0080437
See what happens as we make sigma smaller: pinv and inv still agree. The standard errors from those diagonal elements start to grow, but this is EXACTLY what we should expect.
A = Afun(.1);
[inv(A'*A),pinv(A'*A)]
ans =
0.80379 -0.79822 0.80379 -0.79822
-0.79822 0.80437 -0.79822 0.80437
A = Afun(.001);
[inv(A'*A),pinv(A'*A)]
ans =
8042.5 -8043.1 8042.5 -8043.1
-8043.1 8043.7 -8043.1 8043.7
A = Afun(.00001);
[inv(A'*A),pinv(A'*A)]
ans =
8.0437e+07 -8.0437e+07 8.0437e+07 -8.0437e+07
-8.0437e+07 8.0437e+07 -8.0437e+07 8.0437e+07
A = Afun(.0000001);
[inv(A'*A),pinv(A'*A)]
ans =
8.4782e+11 -8.4782e+11 8.4336e+11 -8.4336e+11
-8.4782e+11 8.4782e+11 -8.4336e+11 8.4336e+11
As you can see, both inv and pinv are agreeing here. Things are going to hell as far as our ability to gain any useful information.
Finally, it gets to the point where inv gives up the ghost. But what happens here?
A = Afun(.00000001);
[inv(A'*A),pinv(A'*A)]
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 4.142154e-17.
ans =
-7.0369e+13 7.0369e+13 0.0029148 0.0029148
7.0369e+13 -7.0369e+13 0.0029148 0.0029148
A = Afun(.0000000000001);
[inv(A'*A),pinv(A'*A)]
Warning: Matrix is singular to working precision.
ans =
Inf Inf 0.0029148 0.0029148
Inf Inf 0.0029148 0.0029148
See that the inverse computation has finally gone completely to hell, but suddenly, as far as pinv is concerned, everything is rosy! All is good in the world. Just before that point, pinv was telling you that those diagonal elements were massive, but with a tiny change to sigma, making it just a bit smaller, pinv changed its mind! In fact, we know that pinv has produced the completely wrong result. inv was giving you valid results all the way, in the sense that those diagonal elements were exploding. Even up to the very end, it said your ability to estimate a standard error here was completely in the crapper.
Again, this is why you CANNOT use pinv as you so fervently desire. It produces what is essentially a lie at the end. Just when you think you need to use pinv, it will lie! Pinv is a GREAT tool when used to solve a singular or near-singular linear system. It is a TERRIBLE tool when used in the wrong place with no understanding of why that replacement for inv is a bad thing.
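If you want to see exactly where pinv flips, note that pinv accepts an explicit tolerance as a second argument; singular values below it are zeroed instead of inverted. A sketch (the tolerance values here are assumptions, and roundoff in forming A'*A limits how small the computed eigenvalue can actually be):
A = Afun(1e-13);
M = A'*A;
pinv(M)            % default tolerance: the tiny eigenvalue is dropped, entries look small
pinv(M, 1e-30)     % force the tiny eigenvalue to be inverted: huge entries return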
