## How do I determine the goodness of fit (for any curve) when using the least squares method?

### Raju

on 10 Oct 2019
I am trying to fit my data as,
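```matlab
% Curve fit
fun = @(p,xdata) p(1).*(1-exp(-xdata./p(2))) + p(3).*(1-exp(-xdata./p(4)));
[pb1,resnorm,residual,exitflag,output,lambda,jacobian] = lsqcurvefit(fun,[500 500 500 500],x,y);
xm = 0:max(dose);      % dose is not defined in this snippet; presumably the same data as x
yFit = fun(pb1,xm);    % separate name, so the measured y is not overwritten
plot(xm,yFit,'-r');
hold on;
```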
How do I find the goodness of the fit?

### Image Analyst

on 10 Oct 2019

How about simply taking the sum of the absolute values of the residuals?

```matlab
residualSum = sum(abs(yFitted - yTraining));
```

### Raju

on 10 Oct 2019
Did you mean 'resnorm' in the above fit? Could you explain a bit more about what you mean? Thank you.

### John D'Errico

on 10 Oct 2019
Edited by John D'Errico

There are many measures of goodness of fit for a least squares problem. NONE of them are perfect. As well, there are entire graduate-level statistics classes that discuss these things, with large textbooks for side reading. You will benefit from a text like Draper & Smith.
A measure of goodness of fit is often something like RMSE. Think norm of the residuals. Image Analyst mentions the sum of absolute values of the errors, but that is just one norm (the 1-norm). Perhaps more common is the 2-norm of the residuals, which weights the large residuals more heavily. Large residuals are bad, so the 2-norm MAY be a better measure than the sum of absolute values. (However, if your data is corrupted with outliers, then the sum of absolute values gives relatively less weight to the corrupted data, so it may be better. Only you know what is the case for you.)
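As a rough sketch of those norms, assuming two made-up vectors `y` (measured values) and `yFit` (model values at the same points); neither the names nor the numbers come from the question:

```matlab
% Hypothetical data: measured values and model predictions at the same x.
y    = [1.0 2.1 2.9 4.2 5.0];
yFit = [1.1 2.0 3.0 4.0 5.1];

res = y - yFit;                 % residuals

sumAbs = norm(res, 1);          % 1-norm: sum of absolute residuals
norm2  = norm(res, 2);          % 2-norm: square root of the sum of squares
sse    = sum(res.^2);           % sum of squared errors, equal to norm2^2
rmse   = sqrt(mean(res.^2));    % root mean squared error

fprintf('1-norm %.4f, 2-norm %.4f, SSE %.4f, RMSE %.4f\n', sumAbs, norm2, sse, rmse);
```

For the fit in the question, the `resnorm` output of lsqcurvefit is the squared 2-norm of the residuals (the `sse` above), so `sqrt(resnorm/numel(y))` would give the RMSE.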
Next are measures like R^2. That tells you how much better your model fits the data, compared to a simple constant model. It is a simple measure that can easily mislead, just like any other one-number measure. For example, R^2 should never be negative, or greater than 1, right? Well, it can be if your model lacks a constant term. (Computed as 1 - SSE/SST, it goes negative whenever the model fits worse than the mean of the data does.) There are also adjusted R^2 measures you can use, which account for the number of parameters in the model.
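A minimal sketch of R^2 and one common adjusted form, on made-up data. The vectors and the parameter count `p` are assumptions for illustration, and `p` here counts every estimated parameter, including any constant term:

```matlab
% Hypothetical data; yFit would normally come from your fitted model.
y    = [1.0 2.1 2.9 4.2 5.0];
yFit = [1.1 2.0 3.0 4.0 5.1];
p    = 2;                             % assumed number of fitted parameters
n    = numel(y);

sse = sum((y - yFit).^2);             % error left over after the model
sst = sum((y - mean(y)).^2);          % error of the constant (mean) model

R2    = 1 - sse/sst;
adjR2 = 1 - (1 - R2)*(n - 1)/(n - p); % penalizes extra parameters

% A "model" that does worse than the mean drives R^2 below zero.
yBad  = fliplr(yFit);
R2bad = 1 - sum((y - yBad).^2)/sst;

fprintf('R2 = %.4f, adjusted R2 = %.4f, bad-model R2 = %.4f\n', R2, adjR2, R2bad);
```

The last lines show the failure mode: nothing forces `1 - sse/sst` to stay in [0, 1] unless the model can at least reproduce the mean of the data.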
There are tools you can use to determine such measures of fit. For example, compute the residuals, then use the norm function. Or compute the mean of the squares of the errors, then take the square root. If you are comparing models with various numbers of parameters, then consider the number of degrees of freedom in your model: when you compute that mean of the squared errors, divide by the number of data points less the number of parameters estimated.
Tools like regress from the Statistics Toolbox can be a big help, of course. But many tools (polyfit, or those in the Curve Fitting Toolbox) also provide at least some such measures of fit for you. And there are other measures of goodness of fit, of course.
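The degrees-of-freedom correction might look like this. The data are invented, and `p = 4` is chosen only because the model in the question has four parameters:

```matlab
% Hypothetical data: eight measurements and the corresponding model values.
y    = [1.0 2.1 2.9 4.2 5.0 5.8 7.1 8.0];
yFit = [1.1 2.0 3.0 4.0 5.1 6.0 7.0 7.9];
p    = 4;                       % number of estimated parameters (assumed)
n    = numel(y);

sse = sum((y - yFit).^2);       % sum of squared errors

rmsePlain = sqrt(sse/n);        % divides by the number of data points
rmseDof   = sqrt(sse/(n - p));  % divides by the residual degrees of freedom

fprintf('plain RMSE %.4f, dof-corrected RMSE %.4f\n', rmsePlain, rmseDof);
```

The corrected value is always the larger of the two, and the gap grows as the parameter count approaches the number of data points, which is what makes it the fairer number when comparing models of different sizes.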
Anyway, as I said, ALL such measures are flawed. If you look only at a number to tell you if the fit is ok, then you are making a MISTAKE.
My point is, instead of worrying about a measure of goodness of fit, the only measure that matters is inside your own brain. Plot the data. Plot the curve fit on top. If it looks good enough for your purposes, then who gives a tinker's damn about any measure? Look at the residuals. Are they as small as you need them to be? If not, then you need either a better model or better data, and measures be damned. Note that a better model may be impossible to find if the fundamental problem is too much noise.
Do the residuals have a pattern to them, suggesting lack of fit? If so, then you may want to look for a better model. Is the noise homogeneous in structure? (If not, then why in the name of god and little green apples are you using least squares for the fit, which implicitly presumes a normally distributed, homoscedastic error structure? At the least you need to be using a weighted model, but other fitting schemes may be appropriate. You may need a robust scheme to reduce the influence of outliers, etc.) But only you know if any deviations are important to you.
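A sketch of those two plots, using the model function from the question. The parameter values in `pFit` and the data are made up (a deterministic wiggle stands in for noise); in practice `pFit` would be the vector returned by lsqcurvefit and `x`, `y` your measurements:

```matlab
% Model from the question; the parameter values here are invented.
fun  = @(p,xdata) p(1).*(1-exp(-xdata./p(2))) + p(3).*(1-exp(-xdata./p(4)));
pFit = [300 20 200 150];

% Stand-in data: the model plus a small deterministic wiggle in place of noise.
x = linspace(0, 400, 25);
y = fun(pFit, x) + 3*cos(1:numel(x));

res = y - fun(pFit, x);             % residuals at the data points
xm  = linspace(0, max(x), 200);     % fine grid for a smooth curve

subplot(2,1,1);                     % data with the fitted curve on top
plot(x, y, 'ob', xm, fun(pFit, xm), '-r');
legend('data', 'fit', 'Location', 'southeast');
ylabel('y');

subplot(2,1,2);                     % residuals scattered around zero
plot(x, res, 'ok', [min(x) max(x)], [0 0], '-k');
xlabel('x'); ylabel('residual');
```

In the lower panel, look for runs of same-signed residuals, curvature, or a spread that grows with x. Those are the patterns that point to lack of fit or non-constant noise.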
Do you have any idea of what noise you expect to see? If you replicated a data point, how much noise would there be in it? This can help you to decide if you are under-fitting the data or overfitting it. Again, look for patterned residuals, to help you decide if there is significant lack of fit.
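One rough way to put a number on that, assuming you have a handful of replicate measurements at a single x value. All values below are invented, including the `rmse` that would come from your own fit:

```matlab
% Hypothetical replicate measurements taken at one and the same x value.
yRep = [10.2 9.8 10.1 9.9 10.0];
sigmaRep = std(yRep);       % sample standard deviation of the replicates

rmse  = 0.45;               % hypothetical RMSE of the fit residuals
ratio = rmse/sigmaRep;      % residual size relative to the replicate noise

fprintf('replicate noise %.4f, fit RMSE %.4f, ratio %.2f\n', sigmaRep, rmse, ratio);
```

A ratio well above 1 hints that the model misses structure the noise cannot explain, while a ratio well below 1 hints that the model is chasing noise. Treat it as a prompt to go look at the residuals, not as a verdict.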
So what matters in the end are your goals for the modeling process. What will you do with it in the end? How good does it need to be?
If you learn one thing from my response, it is that no single numeric measure is as good as the one inside your brain.