# Too high correlation value from xcorr and corrcoef for uncorrelated sequences

Sepp on 26 Aug 2018
Commented: dpb on 5 Apr 2019
Hello
I have two time series both of length 5604. The data is in the attachment (data.csv). The first column is the first time series, the second column is the second time series.
Now I have calculated the correlation of the series in the following way:
a = in(:,1);
b = in(:,2);
[corr, pval] = corrcoef(a,b);
corr= min(corr(:));
pval = min(pval(:));
[corr2, ~] = xcorr(a,b,'coeff');
corr2 = max(corr2);
For corr and pval I'm getting 0.5 and 0, respectively, and for corr2 I'm getting 0.92. But when I'm looking at the plot of both time series there should be no correlation at all (see plot below).
Why am I getting such high correlation values (and such a low p-value)?

dpb on 26 Aug 2018
Edited: dpb on 26 Aug 2018
"Visually they do not correlate."
But they do if you scale them such that you can see it...
"...the blue one has much more variations."
Again, look at the definition of the correlation coefficient; it's an OVERALL measure for the entire series, NOT a local measure of structure within the series. The absolute magnitude of the variation is almost completely immaterial, as both A and B are standardized; each term in the summation is a product of standardized values of the form (Xi-mu)/std(X).
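To make the standardization point concrete, here is a minimal sketch. It uses synthetic data (an assumption, not the poster's data.csv), and the variable names are made up for illustration. It builds the coefficient by hand from standardized values and checks that rescaling one series does not change it.

```matlab
% Synthetic data: an assumption for illustration, not the data.csv from the question
rng(0);                          % fixed seed so the run is repeatable
n = 1000;
x = randn(n,1);
y = 0.5*x + randn(n,1);          % y partly follows x

% Standardize each series: subtract the mean, divide by the standard deviation
zx = (x - mean(x)) / std(x);
zy = (y - mean(y)) / std(y);

% Pearson r is the sum of products of standardized values, divided by n-1
rManual = sum(zx .* zy) / (n - 1);

% Built-in result for comparison
R = corrcoef(x, y);
rBuiltin = R(1,2);

% Rescaling and shifting one series leaves r unchanged
R2 = corrcoef(1000*x + 7, y);
rScaled = R2(1,2);

fprintf('manual %.4f, corrcoef %.4f, rescaled %.4f\n', rManual, rBuiltin, rScaled);
```

All three printed values should agree, which is why one series having "much more variation" than the other says nothing by itself about the correlation.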
I illustrated the general correlation by arbitrarily scaling below; the actual standardization can be shown with
plot((dat-mean(dat))./std(dat))
which makes it very clear. Like the mean shift I did earlier, it amplifies the details in the red trace and shows that its spikes coincide with those in the blue trace a fair amount of the time, so there will be some correlation (albeit small) even after correcting for the trend. But you can clearly see why the original series do have moderate correlation.
As for the last question, see the Answer posted earlier on removing the linear trend. There is some indication of a higher-order trend as well, but I didn't pursue whether it is statistically significant.

dpb on 26 Aug 2018
Ah, but there is quite a lot of correlation; remember it's not the magnitude of the values that matters, it's only whether they tend to "move together". To see this, try
figure, subplot(2,1,1)
yyaxis left, plot(dat(:,1))
yyaxis right,plot(dat(:,2))
ylim([0.35 1.5])
to overlay the two data series on top of each other at a scale factor that makes them roughly match each other in mean amplitude. What this shows is pretty much the same overall trend, and even some of the substructure is similar; in particular, both drop towards the right-hand end around 5000. But mostly what the correlation is measuring is the overall trend.
To see this,
subplot(2,1,2)
dtr=detrend(dat);
plot(dtr)
corr=corrcoef(dtr(:,1),dtr(:,2))
corr =
1.0000 0.1071
0.1071 1.0000
which is probably much more like what you were expecting.
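The same effect can be reproduced on made-up data. The sketch below assumes two series that share only a linear ramp and otherwise carry independent noise; the amplitudes and noise levels are arbitrary choices, not estimates from the posted data.

```matlab
% Synthetic series: assumed shapes, chosen only to illustrate the trend effect
rng(1);                               % fixed seed so the run is repeatable
n = 5000;
t = (1:n)';
trend = t / n;                        % common linear ramp from 0 to 1

a = 2.0*trend + 0.20*randn(n,1);      % large amplitude, noisy
b = 0.5*trend + 0.05*randn(n,1);      % small amplitude, independent noise

% Correlation of the raw series is dominated by the shared ramp
Rraw = corrcoef(a, b);

% Removing the best-fit line from each series leaves only the independent noise
Rdet = corrcoef(detrend(a), detrend(b));

fprintf('raw r = %.3f, detrended r = %.3f\n', Rraw(1,2), Rdet(1,2));
```

With these settings the raw coefficient should come out high and the detrended one close to zero, even though the two noise sequences never had anything to do with each other.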
dpb on 5 Apr 2019
"By default, xcorr computes raw correlations with no normalization:"
Use one of the optional 'scaleopt' arguments; 'coeff' is probably the one you're thinking of, which gives a maximum of 1 at zero lag for an autocorrelation.
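A rough sketch of the difference between the scaling options, again on synthetic data (two independent noise series with large mean offsets, which is an assumption made for illustration). One point worth keeping in mind is that xcorr does not subtract the means, so 'coeff' on series with a nonzero mean can still give a value near 1; removing the means first makes the zero-lag value match corrcoef.

```matlab
% Synthetic data: independent noise on top of large offsets (illustrative assumption)
rng(2);                               % fixed seed so the run is repeatable
n = 2000;
a = 5 + randn(n,1);
b = 3 + randn(n,1);

% Default scaling: raw sums of products, not bounded by 1
[cRaw, lags] = xcorr(a, b);

% 'coeff' scaling: normalized, but the means are still in the data
cCoeff = xcorr(a, b, 'coeff');

% 'coeff' scaling after removing the means
cZero = xcorr(a - mean(a), b - mean(b), 'coeff');

i0 = find(lags == 0);                 % index of zero lag
R = corrcoef(a, b);

fprintf('raw at lag 0:            %.1f\n', cRaw(i0));
fprintf('coeff at lag 0:          %.3f\n', cCoeff(i0));
fprintf('coeff, means removed:    %.3f\n', cZero(i0));
fprintf('corrcoef:                %.3f\n', R(1,2));
```

The raw values are far above 1, the 'coeff' value is high purely because of the offsets, and only the mean-removed version agrees with corrcoef.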

Hi!
I am writing MATLAB code to check the similarity between two ECG signals. I have filtered the ECG signals by removing the high-frequency noise and the baseline wander.
I have used xcorr to check the similarity between the two ECG signals. On the y axis I am getting some values greater than 1.
Can someone please let me know what the values on y axis are?
I have attached my correlated output signal. 