# How to perform linear curve fitting based on distance from line instead of residuals

9 views (last 30 days)

Show older comments

Damon Aboud
on 5 Jul 2022

Answered: Image Analyst
on 18 Jul 2022

##### 0 Comments

### Accepted Answer

John D'Errico
on 5 Jul 2022

Edited: John D'Errico
on 5 Jul 2022

This is the classic total least square problem. Sometimes it is called the errors in variables problem, sometimes known as orthogonal regression.

Lacking your data, I can't show how to do it using your data. But it is not difficult to do. Assume you have data in the form of vectors x and y. I'll make some up, here:

t = rand(100,1);

x = t/10 + randn(size(t))/10;

y = 5*t + randn(size(t))/10;

plot(x,y,'o')

axis equal

The actual slope of the curve should have been 50. What happens when we use polyfit?

P1 = polyfit(x,y,1)

The problem is, polyfit always produces biased estimates in these problems, where the slope will be biased towards zero. Do you see the polyfit extimated slope is very small? Instead of 50, we got a number around 2.9081 from polyfit.

The trick is to use total least squares. We can get the coefficients of the line from:

mux = mean(x);

muy = mean(y);

[~,~,V] = svd([x(:)-mux,y(:)-muy],0);

coef = V(:,end)

AEst = -coef(1)/coef(2)

So despite the rather high presence of noise on our data, we get a slope estimate 56.7846, instead of the 2.9 value that polyfit predicted. That seems pretty reasonable.

The constant term in a regression model will be:

BEst = muy - AEst*mux

The line of best orthogonal fit is:

y = 0.0527 + 56.7846*x

### More Answers (5)

Sulaymon Eshkabilov
on 5 Jul 2022

You can try a scaling factor here. Your data seems to be within [355 365]; thus, take out 355 from x and then perform the linear fit.

You can also try fitlm() that is very similar to polyfit().

##### 0 Comments

Torsten
on 5 Jul 2022

Edited: Torsten
on 5 Jul 2022

The residuals are the sum of the vertical distances of the points from the line.

If you want to consider the orthogonal distances of the points from the line, you can proceed as follows:

Calculate the distance d_i of a point P = (x_i,y_i) from a line y = a*x+b. If you don't know how to do this: google is your friend.

Once you have this distance, you can formulate the optimization problem as

min_{a,b} sum_i d_i

##### 0 Comments

Torsten
on 6 Jul 2022

Edited: Torsten
on 6 Jul 2022

rng("default")

t = rand(100,1);

x = t/10 + randn(size(t))/10;

y = 5*t + randn(size(t))/10;

plot(x,y,'o')

sol = fminsearch(@(p)fun(p,x,y),[1 1])

fun(sol,x,y)

hold on

plot(x,sol(1)*x+sol(2))

function res = fun(p,x,y)

a = p(1);

b = p(2);

for i = 1:numel(x)

xproj = (x(i)+a*(y(i)-b))/(1+a^2);

yproj = a*xproj+b;

dsquared(i) = (xproj-x(i))^2+(yproj-y(i))^2;

end

res = sum(dsquared);

end

##### 1 Comment

Image Analyst
on 18 Jul 2022

##### 0 Comments

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!