# Extrapolation from correlated data

Fan Hu on 2 Dec 2019
Answered: Image Analyst on 3 Dec 2019
Hi,
I've got a bit of a problem.
I have two data sets that are very highly correlated with one another. Knowing the future values of one set, I would like to extrapolate the other. How would I go about doing this?

Adam Danz on 2 Dec 2019
Hint:
If your data are linear, the regression slope is the covariance between X and Y divided by the variance of X:
m = cov(x,y)/var(x);
Because cov(x,y) returns the 2-by-2 covariance matrix, the slope is the off-diagonal element m(2,1), and a point x0 would fall along the line at
y0 = m(2,1)*x0
assuming the y-intercept is at 0.
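As a quick check of the hint, here is a minimal MATLAB sketch on made-up data (the names x, y, x0, and the line y = 2x + 3 are hypothetical). It computes the slope from the covariance matrix, recovers the intercept from the means for the case where it is not 0, and cross-checks the result against polyfit. cov([x y]) is used here because it returns the same 2-by-2 matrix as cov(x,y).

```matlab
% Hypothetical example data lying exactly on the line y = 2x + 3
x = (1:10)';
y = 2*x + 3;

C = cov([x y]);            % 2-by-2 covariance matrix: [var(x) cov(x,y); cov(x,y) var(y)]
slope = C(2,1) / var(x);   % regression slope = cov(x,y) / var(x)

% If the intercept is not known to be 0, it follows from the means
intercept = mean(y) - slope*mean(x);

% Cross-check against a first-order polynomial fit
p = polyfit(x, y, 1);      % p(1) is the slope, p(2) the intercept

x0 = 12;                   % hypothetical new x value
y0 = slope*x0 + intercept; % predicted y at x0 (27 for this line)
```

For real, noisy data the covariance slope and p(1) still agree; they are two routes to the same least-squares line.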

Adam Danz on 3 Dec 2019
These look like they could be very noisy linear data with nearly flat slopes and a vertical offset between the two data sets. I'm not even sure if those data are rising and falling in synchrony.
If the orange and blue lines are variables y1 and y2, could you share the results of this plot:
plot(diff(y1), 'o')
hold on
plot(diff(y2),'-s')
grid on
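If the diff plot suggests the two series move together, the correlation of their step-to-step changes puts a number on it. The sketch below assumes hypothetical series y1 and y2 built from the same sine pattern with different offsets and amplitudes; with real data you would substitute your own vectors of equal length.

```matlab
% Hypothetical series: y2 tracks y1 with a larger amplitude and a lower level
t  = (1:50)';
y1 = 1000 + 10*sin(t/3);
y2 =  700 + 15*sin(t/3);

d1 = diff(y1);           % step-to-step changes in y1
d2 = diff(y2);           % step-to-step changes in y2

R = corrcoef(d1, d2);    % 2-by-2 correlation matrix of the differences
rDiff = R(1,2);          % near +1 if the two series rise and fall together
```

A value of rDiff well below 1 would mean the series are not moving in synchrony, even if their raw levels look correlated.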
Fan Hu on 3 Dec 2019
Adam Danz on 3 Dec 2019
Without knowing anything else about the data, I see three trends, none of which I'm certain of:
1. Both data sets (orange and blue) seem to be noise varying about a flat line (slope = 0).
2. The difference between the two trends is a vertical offset.
3. The blue seems to have a slightly larger modulation amplitude.
To estimate the vertical offset, I'd average the difference between the orange and blue data:
vOffset = mean(orange - blue)
which should be a positive number (maybe around 300).
To estimate the gain in modulation, you could try
gain = mean(diff(blue)./diff(orange))
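assuming the blue and orange data have the same number of data points and that no two consecutive orange values are equal (a zero in diff(orange) would make the ratio Inf or NaN). This should also be a positive value.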
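Given a new orange value, the blue value would be something like
% b0 is the new blue value
% g0 is the given orange value
% Scale g0's deviation from the orange mean by the gain, then shift down by the offset
% (mean(orange) - vOffset is the mean of the blue data)
b0 = mean(orange) - vOffset + gain*(g0 - mean(orange))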
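Putting the offset and gain together, here is a minimal runnable sketch on made-up data (the modulation pattern m and the values in g0 are hypothetical). It applies the gain to the deviation of orange from its mean rather than to the raw orange value, so that a gain other than 1 does not also shift the predicted level, and it skips steps where orange does not change before taking the ratio.

```matlab
% Hypothetical data: zero-mean modulation pattern, orange about 300 above blue,
% and blue modulating 1.2 times as much as orange
m      = repmat([1; -2; 3; -2], 10, 1);
orange = 1000 + 10*m;
blue   =  700 + 12*m;

% Vertical offset: average of orange minus blue (positive here)
vOffset = mean(orange - blue);

% Gain: average ratio of step-to-step changes, skipping steps where orange is flat
dOrange = diff(orange);
dBlue   = diff(blue);
ok      = dOrange ~= 0;
gain    = mean(dBlue(ok) ./ dOrange(ok));

% Predict blue for new orange values: scale the deviation from the orange mean,
% then shift to the blue mean (mean(orange) - vOffset equals mean(blue))
g0 = [1010; 980; 1030];
b0 = mean(orange) - vOffset + gain*(g0 - mean(orange));
```

For these made-up data the prediction for g0 = 1010 is 712, which is exactly the blue value paired with an orange value of 1010. With noisy data, median in place of mean for the gain may be less sensitive to steps where diff(orange) is close to zero.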
Of course, none of this has been tested, so you might have to play around with it.
You may also want to add noise to the estimate. Otherwise, all of your estimated values will have a correlation of 1 with the orange values, whereas your real data are obviously not that highly correlated.
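One way to add that noise, sketched under assumptions: fit the historical relationship (here with polyfit, swapped in for the offset and gain estimates for simplicity), measure the spread of the residuals, and add Gaussian noise of that spread to each prediction. The data are made up, the names g0, sigma, and b0Noisy are hypothetical, and the noise is random, so the numbers change on each run.

```matlab
% Hypothetical history: blue follows orange with slope 1.2 plus random scatter
n      = 100;
orange = 1000 + 20*randn(n, 1);
blue   = 700 + 1.2*(orange - 1000) + 5*randn(n, 1);

% Straight-line fit of blue vs. orange
p     = polyfit(orange, blue, 1);
resid = blue - polyval(p, orange);   % scatter the line does not explain
sigma = std(resid);                  % typical size of that scatter

% Noise-free predictions, then the same predictions with matching scatter added
g0      = [1010; 980; 1030];         % hypothetical new orange values
b0      = polyval(p, g0);
b0Noisy = b0 + sigma*randn(size(b0));
```

Assuming the residuals are roughly Gaussian and do not grow with orange, b0Noisy should scatter around the line about as much as the historical blue values do.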

Image Analyst on 3 Dec 2019
Can you attach some data?
If the values are correlated, how about plotting one against the other in a scatterplot and fitting a line through the points? If you have the future values of set #1, y1, then you must also have the future times, tFuture, so why not just extrapolate them? You can add some noise if you want.
coefficients = polyfit(y1, y2, 1); % Fit a line through the scatterplot of y2 vs. y1 (use only time points where both are known).
y1Future = y1(tFuture); % Get the y1 values at the future times (tFuture must be valid indices into y1).
y2Future = polyval(coefficients, y1Future); % Predict the y2 values when y1 are these values.
plot(tFuture, y2Future, 'ms-'); % Plot the y2Future values vs. time.
That's just off the top of my head. Can you attach some data so we can see if it seems reasonable?
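A self-contained version of that idea with made-up data, assuming y1 is known at all times (past and future) while y2 is known only for the past. The names t, tPast, and y2All are hypothetical, and tFuture holds integer indices into y1 as in the snippet above. The line is fitted only on the overlapping past, since polyfit needs y1 and y2 inputs of the same length, and the plotting call is left out so the sketch runs without a figure window.

```matlab
% Hypothetical data: y1 is known for t = 1..60, y2 only for t = 1..50
t       = (1:60)';
y1      = 500 + 0.5*t + 10*sin(t/5);
y2All   = 3*y1 - 200;                % exact linear relationship, for illustration only
tPast   = (1:50)';
tFuture = (51:60)';
y2      = y2All(tPast);              % the y2 values actually observed

% Fit y2 vs. y1 on the overlapping past (polyfit needs equal-length inputs)
coefficients = polyfit(y1(tPast), y2, 1);

% Evaluate the fitted line at the future y1 values
y1Future = y1(tFuture);
y2Future = polyval(coefficients, y1Future);
```

Because y2All was built from an exact line, y2Future reproduces y2All(tFuture) here; with real data the match would only be as good as the correlation between the two sets.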