ADSW121365 on 16 Aug 2019
Edited: Bruno Luong on 16 Aug 2019
Hey everyone,
I'm currently trying to get my head around how quadprog works so that I can apply it to a problem I'm working on. In the current problem I'm looking to minimise a ridge regression problem, such that:
H = 2*(A'*A + lambda.*eye(size(A'*A)));
where A is an overdetermined matrix of dimensions [m,n], m > n. H is non-symmetric and has negative values, but it is positive definite, as tested by:
% H must be positive definite to use quadprog; check that p == 0:
[~,p] = chol(H);
and f = -2*A'*b where b is an [mx1] matrix of measurements.
My confusion arises when reading the description of how quadprog works:
"x = quadprog(H,f) returns a vector x that minimizes 1/2*x'*H*x + f'*x. The input H must be positive definite for the problem to have a finite minimum. If H is positive definite, then the solution x = H\(-f)."
Given H is positive definite, my expectation is then that x1 = quadprog(H,f) and x2 = H\(-f) would give identical solutions.
For my problem quadprog converges on a solution (exit flag =1) after 95580 iterations. However the solution is very different from x2 = H\(-f). For my current setup, I know the target norm for the solution, norm(x2) is very close to this (5.0703e6 compared to true norm = 5e6) whilst the norm(x1) = 2.1072e5 which is significantly smaller than required for a correct solution.
What's going on here? Am I misinterpreting the description for quadprog somehow?
Bruno Luong on 16 Aug 2019
Edited: Bruno Luong on 16 Aug 2019
Your A is practically singular (you may have forgotten to add a boundary condition or similar when you discretized your problem).
Your regularization is too small to have any real effect.
The two combined lead to the "strange" behaviour that you observe.
Remember: the condition number plays two roles:
• sensitivity of the solution to noise, including numerical round-off noise
• poor convergence of the iterative (conjugate gradient) method on which quadprog is based
Any system with a condition number >= 1e6 is practically singular.
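To make the role of the condition number concrete, here is a minimal sketch on a made-up, nearly rank-deficient matrix (the matrix and the names A_demo, kA and kH are illustrative assumptions, not data from this thread). It suggests why the normal equations are more fragile than working with A directly: forming A'*A roughly squares the condition number.

```matlab
% Illustrative 3x2 matrix whose two columns are almost parallel (assumed example).
A_demo = [1 1; 1 1+1e-4; 1 1-1e-4];
kA = cond(A_demo);             % condition number of A itself
kH = cond(A_demo'*A_demo);     % condition number of the normal-equations matrix
fprintf('cond(A) = %.3e, cond(A''*A) = %.3e, cond(A)^2 = %.3e\n', kA, kH, kA^2);
```

If cond(A) is already around 1e9, as can happen with a badly discretized problem, cond(A'*A) ends up beyond what double precision can resolve.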
clear
caseid = 1; % or 2
switch caseid
case 1
testobject = zeros(size(A,2),1);
testobject(1:25,1) = 1;
case 2
A = rand(50,10);
testobject = rand(size(A,2),1);
end
B = A*testobject(:);
lambda = norm(A)*1e-6;
f = -A'*B;
resfun = @(x) norm(A*x - B)^2 + lambda * norm(x)^2;
%% backslash on A
% Note: stacking lambda*I penalizes lambda^2*norm(x)^2; sqrt(lambda) would match resfun exactly
backslash_x = [A; lambda*speye(size(A,2))] \ [B; zeros(size(A,2),1)];
backslash_residu = resfun(backslash_x);
%% backslash on A'*A
H = (A'*A + lambda.*eye(size(A'*A)));
Hdivided_x = H\(-f);
Hdivided_residu = resfun(Hdivided_x);
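%% quadprog
options = optimset('Display', 'final-detailed','LargeScale', 'off','MaxIter',1000000);
% The quadprog call is missing from the post as captured; qp_x is an assumed name
qp_x = quadprog(H,f,[],[],[],[],[],[],[],options);
qp_residu = resfun(qp_x);
close all
h1=plot(testobject);
hold on
h2=plot(backslash_x);
h3=plot(Hdivided_x);
h4=plot(qp_x);
legend([h1 h2 h3 h4],'exact','backslash','Hdivided','quadprog');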
resid0 = resfun(testobject) % 3.0871e-11
backslash_residu % 2.5272e-11
Hdivided_residu % 3.3919e-14
qp_residu % 7.5741e-14
Matt J on 16 Aug 2019
Edited: Matt J on 16 Aug 2019
Using your posted A matrix I find that
>> cond(H)
ans =
7.5703e+18
For numerical purposes, this is a singular matrix and shouldn't be treated as positive definite (but rather positive semi-definite). The test you did with chol() isn't reproducible,
>> [~,p] = chol(H); p
p =
1
but even if p had been 0, it would be irrelevant. chol's judgement on whether a matrix is sufficiently positive definite to compute a Cholesky decomposition does not mean it is sufficiently positive definite for everything.
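As a rough illustration of that point (using a Hilbert matrix as a stand-in, not the H from this thread), chol can succeed on a matrix whose condition number is far above anything you would want to trust numerically. The name H_demo is an assumption made for this sketch.

```matlab
% Hilbert matrices are symmetric positive definite in exact arithmetic,
% but notoriously ill-conditioned (stand-in example, not the poster's H).
H_demo = hilb(8);
[~, p] = chol(H_demo);         % p == 0 means the Cholesky factorization succeeded
fprintf('chol flag p = %d, cond(H_demo) = %.2e\n', p, cond(H_demo));
```

So a zero flag from chol only tells you that the factorization went through, not that H\(-f) or an iterative solver will behave well.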
For quadprog, the only thing that matters here is that, due to the ill-conditioning of H, the equations H*x+f=0 will have a non-unique set of solutions that all approximately minimize the quadprog objective. You simply cannot be sure which of these solutions different algorithms will produce, with such an ill-conditioned H.
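A toy sketch of this non-uniqueness, assuming a hypothetical 2x2 diagonal H with condition number 1e12 (all names ending in _demo, and x_a, x_b, are made up for illustration): two points that are far apart give almost the same objective value.

```matlab
% Hypothetical ill-conditioned quadratic: cond(H_demo) = 1e12.
H_demo = diag([1, 1e-12]);
f_demo = -H_demo*[1; 1];                % chosen so the exact minimizer is [1; 1]
q = @(x) 0.5*x'*H_demo*x + f_demo'*x;   % quadprog-style objective
x_a = [1; 1];                           % exact minimizer
x_b = [1; 1000];                        % shifted along the weakly curved direction
fprintf('objective gap = %.2e, distance = %.2e\n', q(x_b) - q(x_a), norm(x_b - x_a));
```

Any solver that stops once the objective or gradient looks "flat enough" could reasonably return either point, which is the kind of discrepancy in norm(x) described in the question.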
As for why the residual is so much worse for quadprog, that is likely because quadprog is an iterative solver and cannot use the residual to decide when to stop iterating (because it doesn't have access to A and B). In fact, it almost certainly uses something like a stopping tolerance on the gradient norm,
gradnorm = norm(H*x+f)^2 < tolerance
Since H*x+f = 2*A.'*(A*x-B) + 2*lambda*x, which for small lambda is approximately 2*A.'*(A*x-B), we can see that the gradient norm can be significantly smaller than the norm of the residual vector r = A*x-B, depending on the values of A.'.
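A small sketch of that effect, assuming a hypothetical 2x2 A with one badly scaled column and lambda = 0 for simplicity (A_demo, B_demo and x_demo are illustrative names, not the data from this thread):

```matlab
% Hypothetical 2x2 example: the second column of A is scaled down by 1e-4.
A_demo = [1 0; 0 1e-4];
B_demo = [1; 1];
x_demo = [1; 0];                        % far from the exact solution [1; 1e4]
r = A_demo*x_demo - B_demo;             % residual vector
g = 2*A_demo.'*r;                       % gradient of norm(A*x - B)^2
fprintf('norm(r) = %.1e, norm(g) = %.1e\n', norm(r), norm(g));
```

Here the gradient norm is about four orders of magnitude smaller than the residual norm, so a tolerance on the gradient could be satisfied long before the residual is small.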
