
When undertaking a linear regression evaluating y as a function of x and x^3, is there a specific function within MATLAB that takes account of the mutual dependence of the independent variables (x and x^3)?

I've tried searching the documentation but haven't found anything that specifically addresses this issue.

Regards, Brian

John D'Errico
on 24 Aug 2017

Edited: John D'Errico
on 24 Aug 2017

Um, yes. ANY regression computation takes into account the relation between the variables.

So backslash, regress, lscov, lsqr, fit, etc., all take that into account.

I think your issue is that you don't understand how regression works. For that, there are entire courses that are taught.

Yes, it is true that x and x^3 are correlated with each other. Note that mathematical linear independence is not the same as saying the two variables are unrelated. That is, no linear combination a*x + b*x^3 is identically zero EXCEPT in the case where a = b = 0. So x and x^3 provide different information to the problem. Yet at the same time, it is not true that x and x^3 are orthogonal. There is essentially some overlap in what they do.
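To make that distinction concrete, here is a small sketch in Python/NumPy (an illustration of the same point, not part of the original MATLAB discussion; the variable names are mine): the design matrix [x, x^3] has full column rank, yet its two columns are strongly correlated.

```python
import numpy as np

# Sketch of the point above: x and x^3 are linearly independent as
# columns of the design matrix, yet far from orthogonal as data vectors.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
A = np.column_stack([x, x**3])

# Full column rank: no nonzero (a, b) gives a*x + b*x^3 == 0 on this data.
rank = np.linalg.matrix_rank(A)   # 2

# ...but the columns overlap substantially.
r = np.corrcoef(x, x**3)[0, 1]    # strong positive correlation
print(rank, r)
```

(For a standard normal x, the population correlation of x and x^3 works out to 3/sqrt(15), about 0.77, so the overlap is substantial even though linear independence holds.)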

Yes, it is also true that there may be numerical issues. But that relationship between the variables is factored in when the regression is done. I really cannot say much more without specifics, or without writing a complete text on linear regression myself. Better that you read one, since many have been published. Perhaps a classic like that by Draper and Smith would be a good choice.

John D'Errico
on 24 Aug 2017

I think you have gotten confused about regression, probably by a comment from a colleague. I seem to recall you saying that a colleague had said something about x and x^3, and now you seem to be worried.

While you should always beware of problems, odds are the inter-relationship between x and x^3 is not going to be an issue, at least if there are only two variables, and if you see no warning messages.

One test is to compute

cond([x(:),x(:).^3])

If you have other terms in the problem, they need to be included in there too. The best possible value here is 1. If you were seeing large numbers, REALLY large, on the order of 1e15 or so, you would start to get quite worried. Even 1e8 would be pretty bad. But for example, let's try it on some sample data.

x = randn(100,1);

cond([x(:),x(:).^3])

ans =

5.9516

So only 5.9. On the scale of how worried I would get here, 5.9 is laughably small.

Compare that to a different problem, with a much more complex model. Here, one with 16 polynomial terms in it.

x = rand(100,1);

cond([x.^[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]])

ans =

2.9659e+11

That is large enough that the coefficients may have little real value. Any polynomial coefficients you estimate from that model would arguably be almost useless.
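To see why a condition number near 1e11 ruins the coefficients, here is a hedged sketch in Python/NumPy (my own illustration, with an arbitrary target function, not part of the original answer): perturbing y at the 1e-8 level moves the fitted coefficients by many orders of magnitude more than that.

```python
import numpy as np

# Illustration of what a condition number ~1e11 does to estimated
# coefficients. The sin() target here is arbitrary; any smooth data works.
rng = np.random.default_rng(1)
x = rng.random(100)
A = x[:, None] ** np.arange(16)      # the 16-term polynomial design matrix
y = np.sin(2 * np.pi * x)            # some smooth data to fit

# Fit once, then fit again after a tiny (1e-8) perturbation of y.
c1, *_ = np.linalg.lstsq(A, y, rcond=None)
c2, *_ = np.linalg.lstsq(A, y + 1e-8 * rng.standard_normal(100), rcond=None)

cond_A = np.linalg.cond(A)           # comparable to the 2.9659e+11 above
dc = np.max(np.abs(c1 - c2))         # vastly larger than the 1e-8 noise
print(cond_A, dc)
```

The ill-conditioning amplifies that invisible perturbation into coefficient changes that swamp the noise itself, which is why the fitted coefficients carry little meaning even though the fitted curve may still look fine.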

But for the model you have described? There is probably no big issue.
