MATLAB Answers

Multivariate analysis advice needed

I have next to no knowledge of statistics, so please bear with me.
I have a large dataset of three physical parameters, A, B and C. Established convention based on multiple empirical studies is that A is a linear function of B. When I plot my data as a scatter plot of A against B and shade each point according to C, the linear relationship between A and B is clear, but it also appears that A is dependent on C (or at least that higher values of A correlate with higher values of C for the same value of B).
What I now need to do is: (a) quantify the empirical relationship of A as a function of both B and C; (b) provide some indication of confidence that C genuinely influences A, e.g. by demonstrating a very low probability that A is fully independent of C; and (c) quantify the improvement in the accuracy of the predicted value of A when it is described as a function of B and C rather than solely as a function of B.
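For reference, objectives (a) to (c) map onto a standard multiple linear regression. This formulation assumes that A depends linearly and additively on B and C; the coefficient symbols β and γ are introduced here only for illustration:

$$A_i = \beta_0 + \beta_1 B_i + \beta_2 C_i + \varepsilon_i, \qquad i = 1, \dots, n$$

(a) Estimate $\beta_0, \beta_1, \beta_2$ by least squares.
(b) Test $H_0\colon \beta_2 = 0$ against $H_1\colon \beta_2 \neq 0$. A confidence interval for $\beta_2$ that excludes zero (equivalently, a small p-value) is evidence that C affects A once B has been accounted for.
(c) Compare the residual error (or $R^2$) of the full model with that of the reduced model $A_i = \gamma_0 + \gamma_1 B_i + \varepsilon_i$.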
My knowledge of statistics is woefully lacking for this task. I have heard of multiple linear regression and ANOVA, but have never attempted either and don't really know what they are or how they differ.
I'm hoping that by defining the objectives as clearly as I can, someone will be able to point me in the right direction as to which tools to use and how to apply them.
One final bit of information that may be relevant: my sample sizes for A, B and C run into tens of thousands of measurements, and the observations are (roughly) coincident in space and time, so there's no shortage of data. I may want to sub-sample to explore whether the relationship is affected by a fourth environmental factor, but even then I will have thousands of observations in each sub-sample.
All thoughts / comments / suggestions welcome.
Regards, Brian


Accepted Answer

Star Strider on 5 Jul 2016
I would use the Statistics and Machine Learning Toolbox regress function. It should give you everything you need.
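I would ask for at least the first two outputs. Here y is the column vector of A values and X is the design matrix [ones(size(B)) B C]. Because regress does not add a constant term itself, include the column of ones: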
[b,bint] = regress(y,X);
The 'bint' matrix gives the confidence intervals (95% by default) for each parameter. This tells you whether both parameters are needed in the regression. If a parameter's confidence interval includes zero (a quick check is that the two confidence limits have opposite signs), that parameter is not significantly different from zero and can be dropped from the regression. Otherwise, the estimated parameter is significantly different from zero and should be kept.
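A minimal sketch of the regress approach covering all three objectives. It uses hypothetical synthetic data (the coefficients 2, 1.5 and 0.4 are arbitrary) in place of the real measurements. It also assumes that the fifth regress output, stats, holds [R^2, F statistic, p-value of F, error variance estimate], which is how the Statistics and Machine Learning Toolbox documents it.

```matlab
% Hypothetical synthetic data standing in for the real measurements:
% A depends on B and, more weakly, on C (coefficients chosen arbitrarily).
rng(0);                                   % reproducible random numbers
n = 10000;                                % number of observations
B = 10*rand(n,1);
C = 5*rand(n,1);
A = 2 + 1.5*B + 0.4*C + randn(n,1);       % unit-variance noise

% regress does not add an intercept, so include a column of ones in X.
X = [ones(n,1) B C];

% b:     coefficients [intercept; slope on B; slope on C]
% bint:  95% confidence intervals, one row per coefficient
% stats: [R^2, F statistic, p-value of F, error variance estimate]
[b, bint, ~, ~, stats] = regress(A, X);

% (b) C is significant at the 5% level if its interval excludes zero,
%     i.e. both confidence limits have the same sign.
cIsSignificant = prod(bint(3,:)) > 0;
fprintf('Slope on C: %.3f, 95%% CI [%.3f, %.3f]\n', b(3), bint(3,1), bint(3,2));

% (c) Fit the reduced model (B only) and compare goodness of fit.
[~, ~, ~, ~, statsB] = regress(A, [ones(n,1) B]);
fprintf('R^2:         B only = %.4f, B and C = %.4f\n', statsB(1), stats(1));
fprintf('Residual SD: B only = %.4f, B and C = %.4f\n', sqrt(statsB(4)), sqrt(stats(4)));
```

In-sample R^2 never decreases when a predictor is added, so the confidence interval on C is what shows that the improvement is not just chance. With tens of thousands of observations, even a very small coefficient on C can have an interval that excludes zero. The change in residual standard deviation is therefore a useful check on whether the effect is practically important.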
Alternatively (or in addition), if you want to test which of the parameters are needed in the linear regression, use the stepwisefit function.
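A minimal sketch of the stepwisefit route. It uses hypothetical synthetic data with arbitrary coefficients. Unlike regress, stepwisefit adds the constant term itself, so X holds only the predictors. The sketch assumes the toolbox defaults for the entry and removal thresholds ('PEnter' 0.05, 'PRemove' 0.10).

```matlab
% Hypothetical synthetic data standing in for the real measurements.
rng(0);                                   % reproducible random numbers
n = 10000;
B = 10*rand(n,1);
C = 5*rand(n,1);
A = 2 + 1.5*B + 0.4*C + randn(n,1);

% stepwisefit includes a constant term automatically:
% do NOT add a column of ones to X.
X = [B C];
[b, se, pval, finalmodel, stats] = stepwisefit(X, A, 'Display', 'off');

% finalmodel(k) is true if predictor k was retained by the stepwise procedure.
names = {'B', 'C'};
for k = 1:numel(names)
    fprintf('%s: coef = %.3f (SE %.3f), p = %.3g, in model = %d\n', ...
        names{k}, b(k), se(k), pval(k), finalmodel(k));
end
fprintf('Intercept = %.3f, residual SD = %.3f\n', stats.intercept, stats.rmse);
```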
I would review the documentation for both, and see which one best fits your needs.

Brian Scannell on 6 Jul 2016
Thanks, Star Strider. Much appreciated.
