How to fit a quadratic function using the "minimizing the volume-weighted mean squared error"?

Say I have a function y = a + b*x + c*x.^2 and I got matrixes x and y, and the weight w.
Estimate the parameters ( a, b and c) by minimizing the volume-weighted mean squared error?
i.e. to minimize [ sum (w. * ( y - yhat ).^2) / (sum w) ]

 Accepted Answer

u = sqrt(w(:));
b1 = u;
b2 = u.*x(:);
b3 = u.*x(:).^2;
f = u.*y(:);
A = [dot(b1,b1) dot(b1,b2) dot(b1,b3);
dot(b1,b2) dot(b2,b2) dot(b2,b3);
dot(b1,b3) dot(b2,b3) dot(b3,b3)];
v = [dot(b1,f); dot(b2,f); dot(b3,f)];
p = A\v; % p(1) = a, p(2) = b, p(3) = c
Hope I haven't made mistakes. :)

8 Comments

Thank you for answering so fast! But to my opinion there should be some kind of loops to minimize the formula while the answer you gave is like OLS to me.
According to your description, it is a basic linear least squares problem, and it can reduced to the solving of the linear equations Ap = v. No iterations needed.
Correct, in that no loops are required. The solution shown here is numerically poor though.
Instead, you can solve the entire weighted OLS problem with far less effort.
As you have written u,b1,b2,b3, do this:
p = [b1,b2,b3]\(u.*y(:));
This is far more robust to numerical problems. Use of backslash directly is the proper way to solve the problem. Do not use the equation as Siyu has shown though.
Yeah, I think John's right, though I haven't yet encountered numerical problems caused by the textbook solution I provided. It is likely that dot(bi,bj) give numerically poor results, and by John's statement, MATLAB must have provided a proper treatment for these potential drawbacks.
The case where there will be problems is when the system of equations to be solved are ill-conditioned. It is not difficult for that to happen, and when it does, the form you are solving squares the condition number of the system. So even a system that is somewhat ill-conditioned now becomes singular. Admittedly, low order polynomials ( linear or quadratic, for example) are usually pretty stable.
As far as what causes the problem, it is NOT the use of dot!!!!!!!!! It is the way the linear system was formulated in the first place. Just because you might call this a textbook solution, does not mean that every textbook was well written. In fact, if you found that in a textbook, I'd recommend throwing that text out, as it gives you poor advice. Sigh. Let me try to explain. (The use and inclusion of weights or not is irrelevant here, so I won't bother with them.)
x = [1;1.0000001;2;2.0000001];
y = 1 + 2*x + 3*x.^2;
I will now try to fit a quadratic polynomial to this data using two different methods. What matters here is the vector x. I chose it so that there are essentially TWO data points, two near replicates. So really, there are two pieces of distinct information.
The linear system we need to solve is the over determined system given by:
A = [ones(size(x)),x,x.^2];
thus we need to solve the problem
A*coef = y
but A has 4 rows, and only three columns. It is not square.
So the poor "textbook" solution is to formulate the problem using the normal equations, as
(A'*A)*coef = A'*y
solving then for coef as
coef = (A'*A)\(A'*y);
That is what you wrote, if we ignore the dot products. A matrix multiply is just a whole slew of dot products after all.
Anyway, the problem is that while A is poorly conditioned, thus it has a large condition number,
cond(A)
ans =
2.5598e+08
cond(A'*A)
ans =
1.5827e+16
we see that A'*A is very poorly conditioned, and essentially singular.
coef = A\y
coef =
1
2
3
coefbad = (A'*A)\(A'*y)
Warning: Matrix is singular to working precision.
coefbad =
-Inf
Inf
NaN
As you can see, direct use of backslash on the matrix A yields reasonable results, while the form you used squares that condition number, making that poorly conditioned problem into one that has no solution at all.
Admittedly, the example I chose was kind of an extreme one, although entirely valid. But people do things as bad all the time without even knowing they have done so. Had I used a higher order polynomial, it would have been far easier to cause the linear algebra to have problems. A simple quadratic is at least moderately robust, which explains why you have not seen obvious problems in your own work.
That is, you have not seen problems so far. Give it time. And since there are far better solutions available no farther away than the direct use of backslash, why would you tempt fate?
I do not tempt fate. I simply didn't know this use of \. :)
I admit that I often seem to be ranting about the use of a few numerical methods, taught by textbooks, by teachers, in courses, etc. The problem is that numerical analysis has changed relatively rapidly over the last 50 years. We have learned much in that time. But there are still bad memes that propagate, and never seem to die out. The problem is that students are taught a bad numerical method. They are taught that by a teacher who learned the same thing, from a textbook or paper written by someone who did not know any better. And then the student grows up, into a teacher, a mentor, etc. What do they tell their own students? Of course, they teach what they know as "truth". That it is provably poor is irrelevant. But these memes propagate forever. Taught from one person to another by word of mouth, by text. etc.
It never stops unless someone is out there, trying to intercept the bad ideas from propagating, explaining why they are actively bad, and explaining that there is a good solution.
So I tilt at windmills...
I'm looking forward to a numeric methods textbook written by you, :)

Sign in to comment.

More Answers (1)

Asked:

on 30 Apr 2018

Commented:

on 1 May 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!