I have next to no knowledge of statistics, so please bear with me.
I have a large dataset of three physical parameters, A, B and C. Established convention based on multiple empirical studies is that A is a linear function of B. When I plot my data as a scatter plot of A against B and shade each point according to C, the linear relationship between A and B is clear, but it also appears that A is dependent on C (or at least that higher values of A correlate with higher values of C for the same value of B).
What I now need to do is: (a) quantify the empirical relationship of A as a function of both B and C; (b) provide some indication of confidence that C genuinely is influencing A, e.g. by demonstrating a very low probability that A is fully independent of C; and (c) quantify the improvement in the accuracy of the predicted value of A when it is described as a function of B and C rather than solely as a function of B.
My knowledge of statistics is woefully lacking for this task. I have heard of multiple linear regression and ANOVA, but have never attempted either and don't really know what they are or how they differ.
I'm hoping that by defining the objectives as clearly as I can, someone will be able to point me in the right direction as to which tools to use and how to apply them.
One final bit of information that may be relevant: A, B and C each run into tens of thousands of measurements, and the observations are (roughly) coincident in space and time, so there's no shortage of data. I may want to sub-sample to explore whether the relationship is affected by a fourth environmental factor, but even then I will have thousands of observations in each sub-sample.
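In case it helps frame answers, here is a rough sketch of what I understand a multiple linear regression approach to (a) to (c) might look like. It runs on synthetic data, not my measurements: the coefficients, noise level, sample size and the helper name `ols` are all made up for illustration. It also assumes ordinary least squares with independent, roughly constant-variance errors, which may not hold for observations clustered in space and time, and it uses a normal approximation for the p-value on the grounds that the sample is large.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # hypothetical sample size, similar in scale to the real data

# Synthetic stand-ins for the measurements (made-up coefficients and noise).
B = rng.uniform(0.0, 10.0, n)
C = rng.uniform(0.0, 5.0, n)
A = 2.0 + 1.5 * B + 0.4 * C + rng.normal(0.0, 1.0, n)


def ols(X, y):
    """Least-squares fit: coefficients, standard errors,
    residual standard error and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof       # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, math.sqrt(sigma2), r2


ones = np.ones(n)
# Conventional model: A = a0 + a1*B
beta_b, se_b, rse_b, r2_b = ols(np.column_stack([ones, B]), A)
# (a) Extended model: A = a0 + a1*B + a2*C
beta_bc, se_bc, rse_bc, r2_bc = ols(np.column_stack([ones, B, C]), A)

# (b) Test "coefficient of C is zero": t-statistic with a two-sided
# p-value from the normal approximation (reasonable only for large n).
t_c = beta_bc[2] / se_bc[2]
p_c = math.erfc(abs(t_c) / math.sqrt(2.0))

# (c) Compare the two fits: residual standard error and R^2.
print("coefficients (intercept, B, C):", beta_bc)
print("t-statistic for C:", t_c, "approx. p-value:", p_c)
print("residual std. error, B only vs B and C:", rse_b, rse_bc)
print("R^2, B only vs B and C:", r2_b, r2_bc)
```

With real data, `A`, `B` and `C` would be replaced by the measured arrays. Whether the independence assumption is acceptable for my observations is exactly the kind of thing I am unsure about.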
All thoughts / comments / suggestions welcome.
Regards, Brian