Quadratic Programming (quadprog):
10 views (last 30 days)
I'm currently trying to get my head around how quadprog works to apply it to a problem I'm working on. In the current problem I'm looking to minimise a ridge regression problem, such:
H = 2(A'*A + lambda.*eye(size(A'*A));
where A is an overdetermined matrix of dimensions [m,n], m > n. H is non-symettric and has negative values however is a positive definite as tested by:
%H Must be Positive Definite to use quadprog check p = 0:
[~,p] = chol(H);
and f = -2*A'*b where b is an [mx1] matrix of measurements.
My confusion arises when reading the description of how quadprog works:
"x = quadprog(H,f) returns a vector x that minimizes1/2*x'*H*x + f'*x. The input H must be positive definite for the problem to have a finite minimum. If H is positive definite, then the solution x = H\(-f)."
Given H is positive definite, my expectation is then that x1 = quadprog(H,f) and x2 = H\(-f) would give identical solutions.
For my problem quadprog converges on a solution (exit flag =1) after 95580 iterations. However the solution is very different from x2 = H\(-f). For my current setup, I know the target norm for the solution, norm(x2) is very close to this (5.0703e6 compared to true norm = 5e6) whilst the norm(x1) = 2.1072e5 which is significantly smaller than required for a correct solution.
What's going on here? Am I misinterpreting the description for quadprog somehow?
Matt J on 16 Aug 2019
Edited: Matt J on 16 Aug 2019
Using your posted A matrix I find that
For numerical purposes, this is a singular matrix and shouldn't be treated as positive definite (but rather positive semi-definite). The test you did with chol() isn't reproducible,
>> [~,p] = chol(H); p
but even if p had been 0, it would be irrelevant. chol's judgement on whether a matrix is sufficiently positive definite to compute a Cholesky decomposition does not mean it is sufficiently positive definite for everything.
For quadprog, the only thing that matters here is that, due to the ill-conditioning of H, the equations H*x+f=0 will have a non-unique set of solutions that all approximately minimize the quadprog objective. You simply cannot be sure which of these solutions different algorithms will produce, with such an ill-conditioned H.
As for why the residual is so much worse for quadprog, that is likely because quadprog is an iterative solver and cannot use the residual to decide when to stop iterating (because it doesn't have access to A and B). In fact, it almost certainly uses something like a stopping tolerance on the gradient norm,
gradnorm = norm(H*x+f)^2 < tolerance
Since H*x+f=2*A.'*(A*x-B), we can see that the gradient norm can be significantly smaller than the residual vector r= A*x-B depending on the values of A.'