Do I have to correct correlation values for two series originating from overlapped analysis?

Dear community,
I want to look at the correlation between two parameters. Since those parameters come from an overlapped analysis (e.g. a 100 ms window with 75 ms overlap), I guess I have to correct for the fact that they are based on twice (or, in this example, four times) the amount of data?
Am I mistaken? If not, how do I correct the correlation estimate?
best regards
Jonas on 11 Jan 2024
Google Bard proposes a correction by w^2, with w being the overlap fraction (here 0.75). Is this a thing? I could not find any source discussing it outside the AI model. It seems wrong to me, since in the extreme cases the correlation would either be unaffected or always be 0.
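For reference, a standard large-sample result (Bartlett's approximation, assuming both series are stationary and independent of each other) suggests that autocorrelation, including the autocorrelation created by overlap, enters the variance of r and hence the effective sample size, not the value of r itself. Here rho_x(k) and rho_y(k) are the autocorrelations of the two parameter series at lag k (in windows):

```latex
\operatorname{Var}(r) \approx \frac{1}{n}\Bigl(1 + 2\sum_{k \ge 1} \rho_x(k)\,\rho_y(k)\Bigr)
\qquad\Longrightarrow\qquad
n_{\mathrm{eff}} = \frac{n}{1 + 2\sum_{k \ge 1} \rho_x(k)\,\rho_y(k)}

% Simplest case (assumption): the parameter is the window mean of white noise,
% window length W, hop H, so \rho(k) = \max(0,\, 1 - kH/W).
% W = 100 ms, H = 25 ms:  \rho(1) = 0.75,\ \rho(2) = 0.5,\ \rho(3) = 0.25
1 + 2\,(0.75^2 + 0.5^2 + 0.25^2) = 1 + 2 \cdot 0.875 = 2.75
\qquad\Longrightarrow\qquad n_{\mathrm{eff}} \approx n / 2.75
```

In this idealized case the overlapped analysis is worth roughly n/2.75 independent samples rather than n, and also rather than the n/4 you would get from non-overlapping windows. Real features (RMS, spectral power, ...) will have different autocorrelations, so they need to be estimated from the data.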


Accepted Answer

Hassaan on 11 Jan 2024
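Here's a conceptual approach to adjusting the degrees of freedom in MATLAB. corr has no option for this, so the idea is to compute r as usual and then recompute the p-value from an effective sample size estimated from the autocorrelation of both series: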
% Assuming x and y are your per-window parameters, computed with a
% 100 ms window and 75 ms overlap (i.e. a 25 ms hop)
x = x(:); y = y(:);
n = numel(x);                        % number of (overlapping) windows
win_ms = 100;                        % window length in ms
hop_ms = 25;                         % hop = window - overlap, in ms
maxLag = ceil(win_ms / hop_ms) - 1;  % lags at which windows still share samples (here 3)
% Sample autocorrelations at lags 0..maxLag (Econometrics Toolbox)
acf_x = autocorr(x, 'NumLags', maxLag);
acf_y = autocorr(y, 'NumLags', maxLag);
% Effective sample size (Bartlett-type approximation, lags 1..maxLag only;
% increase maxLag if the underlying signal is itself autocorrelated)
k = (1:maxLag)';
n_eff = n / (1 + 2 * sum((1 - k/n) .* acf_x(2:end) .* acf_y(2:end)));
n_eff = min(n_eff, n);               % never credit more samples than you have
% The correlation coefficient itself needs no correction
r = corr(x, y);
% Recompute the p-value with the effective degrees of freedom
t_val = r * sqrt((n_eff - 2) / (1 - r^2));
p_eff = 2 * tcdf(-abs(t_val), n_eff - 2);
% Display the (unchanged) correlation and the adjusted p-value
disp(['Correlation: ', num2str(r)]);
disp(['Effective sample size: ', num2str(n_eff), ' of ', num2str(n)]);
disp(['Adjusted p-value: ', num2str(p_eff)]);
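This code snippet is a simple example of adjusting the degrees of freedom for overlapping data: r is reported unchanged, and only its p-value is recomputed from the smaller effective sample size. The sum is truncated at the lags where windows overlap, which ignores any autocorrelation of the signal itself; for more accurate autocorrelation estimation, you might need more sophisticated time series analysis methods.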
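If the Econometrics and Statistics toolboxes are not available, the same correction can be sketched in base MATLAB. This is a minimal sketch under stated assumptions: the example data (window means of two independent white-noise signals, 100-sample windows, 25-sample hop) are hypothetical and should be replaced by your own x and y, acfAt is a hypothetical helper implementing the standard sample autocorrelation estimator, and autocorrelation beyond maxLag is assumed negligible. The p-value uses the identity p = I_{1-r^2}(df/2, 1/2) (regularized incomplete beta function, betainc), which is equivalent to the usual two-sided t-test for Pearson r with df degrees of freedom and also works for non-integer df.

```matlab
% Hypothetical example data: window means of two independent white-noise
% signals (100-sample windows, 25-sample hop). Replace with your own x, y.
rng(1);
N = 10000; win = 100; hop = 25;
starts = (1:hop:N-win+1)';                 % first sample of each window
ca = [0; cumsum(randn(N, 1))];  cb = [0; cumsum(randn(N, 1))];
x = (ca(starts + win) - ca(starts)) / win; % window means via cumulative sums
y = (cb(starts + win) - cb(starts)) / win;

x = x(:); y = y(:);
n = numel(x);
maxLag = ceil(win / hop) - 1;              % lags at which windows overlap (here 3)
% Hypothetical helper: standard (biased) sample autocorrelation at lag k
acfAt = @(s, k) sum((s(1:end-k) - mean(s)) .* (s(1+k:end) - mean(s))) ...
               / sum((s - mean(s)).^2);
w = 0;
for k = 1:maxLag
    w = w + (1 - k/n) * acfAt(x, k) * acfAt(y, k);
end
% Treat a negative net sum as zero, i.e. never credit more than n samples
n_eff = n / max(1 + 2*w, 1);               % must stay > 2 for the p-value below

R = corrcoef(x, y);
r = R(1, 2);                               % r itself is left unchanged
% Two-sided p-values for Pearson r: p = I_{1-r^2}(df/2, 1/2)
p_naive = betainc(1 - r^2, (n - 2)/2, 0.5);
p_eff   = betainc(1 - r^2, (n_eff - 2)/2, 0.5);
fprintf('r = %.3f, n = %d, n_eff = %.1f, p_naive = %.3g, p_eff = %.3g\n', ...
        r, n, n_eff, p_naive, p_eff);
```

Because n_eff is smaller than n, p_eff is never smaller than p_naive; r is the same in both cases.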
For time series analysis using ARIMA models, MATLAB's Econometrics Toolbox provides the arima function. Here's an example of how you might fit an ARIMA model to each series, use it to prewhiten the data, and then calculate the correlation:
% Fit an ARIMA model to each time series (Econometrics Toolbox)
model_x = arima(1,0,1); % ARIMA(1,0,1) model, for example
model_y = arima(1,0,1); % ARIMA(1,0,1) model, for example
estModel_x = estimate(model_x, x(:), 'Display', 'off');
estModel_y = estimate(model_y, y(:), 'Display', 'off');
% Prewhiten: infer returns the residuals (one-step-ahead prediction errors)
residuals_x = infer(estModel_x, x(:));
residuals_y = infer(estModel_y, y(:));
% Correlate the residuals; if the models fit well, the residuals are
% approximately serially uncorrelated, so the ordinary p-value from corr applies
[r_pw, p_pw] = corr(residuals_x, residuals_y);
% Display the prewhitened correlation and p-value
disp(['Prewhitened correlation: ', num2str(r_pw)]);
disp(['Prewhitened p-value: ', num2str(p_pw)]);
The advantage of prewhitening with an ARIMA model (solution 3) over adjusting the degrees of freedom (solution 1) is that the model captures the whole autocorrelation structure of each series (and, with differencing or seasonal terms, trends and seasonality), not just the few lags affected by the overlap. The disadvantages are that you have to choose an appropriate model order, which is not always straightforward and might require statistical expertise, and that r_pw is the correlation between the residuals (innovations) rather than between the original parameters, so it can differ from r and answers a slightly different question.
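Without the Econometrics Toolbox, the prewhitening idea can be sketched with an AR(p) model fitted by ordinary least squares in base MATLAB. This swaps in a simpler technique (no MA terms, order chosen by hand) and is not equivalent to the ARIMA fit above; the example series are hypothetical AR(1) data and arOrder is an assumption to adapt to your parameters. It is worth checking that the residuals are actually close to white before trusting the p-value.

```matlab
% Hypothetical example data: two independent AR(1) series. Replace with x, y.
rng(7);
n = 500;
x = filter(1, [1 -0.8], randn(n, 1));
y = filter(1, [1 -0.8], randn(n, 1));

arOrder = 1;                               % AR order (assumption; choose per data)
series = {x(:), y(:)};
res = cell(1, 2);
for j = 1:2
    s = series{j};
    m = numel(s);
    X = ones(m - arOrder, arOrder + 1);    % design matrix: intercept + lags
    for i = 1:arOrder
        X(:, i + 1) = s(arOrder + 1 - i : m - i);  % lag-i values
    end
    target = s(arOrder + 1 : m);
    coef = X \ target;                     % OLS estimate of the AR(p) coefficients
    res{j} = target - X * coef;            % prewhitened series (residuals)
end

% Correlate the residuals and use the ordinary p-value, p = I_{1-r^2}(df/2, 1/2)
R = corrcoef(res{1}, res{2});
r_pw = R(1, 2);
df = numel(res{1}) - 2;
p_pw = betainc(1 - r_pw^2, df/2, 0.5);
fprintf('Prewhitened correlation: %.3f, p-value: %.3g\n', r_pw, p_pw);
```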
---------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Leaving an upvote and a comment are also wonderful ways to provide feedback.
Professional Interests
  • Technical Services and Consulting
  • Embedded Systems | Firmware Development | Simulations
  • Electrical and Electronics Engineering
Feel free to contact me.

More Answers (1)

Hassaan on 11 Jan 2024
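When you calculate the correlation between two parameters computed from overlapping windows, the overlap makes consecutive samples of each parameter autocorrelated (with 75% overlap, neighboring windows share 75% of their data). This does not systematically inflate the correlation coefficient itself, but it reduces the number of effectively independent samples. The standard p-value and confidence interval therefore treat the data as more informative than they are, and a spuriously large |r| becomes more likely. To account for this: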
  1. Adjust Degrees of Freedom: Estimate an effective sample size that is smaller than the number of windows. It depends on the number of windows and on the autocorrelation of both parameters at all lags where windows overlap (lags 1 to 3 for a 100 ms window with a 25 ms hop), not just at a single lag.
  2. Use Non-Overlapping Data: Calculate the correlation using only non-overlapping windows (e.g. every 4th window at 75% overlap). This removes the dependence caused by shared samples, although any autocorrelation inherent in the signal itself remains.
  3. Time Series Analysis Techniques: Employ methods like prewhitening or use time series models (like ARIMA) that account for autocorrelation within the data.
  4. Signal Processing Techniques: In some cases, using techniques such as wavelet coherence can be appropriate for dealing with overlapping time-frequency data.
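A small simulation can illustrate what the overlap does to a correlation analysis. This is a sketch under hypothetical settings (two independent white-noise signals, 100-sample windows, the window mean as the per-window parameter), using only base MATLAB: corrcoef for r and betainc for the naive two-sided p-value, via the identity p = I_{1-r^2}((n-2)/2, 1/2), which is equivalent to the usual t-test for r. Since the signals are independent, every "significant" correlation is a false positive.

```matlab
% Monte Carlo sketch (hypothetical setup): two INDEPENDENT white-noise signals,
% reduced to one parameter per window (the window mean), with and without overlap.
rng(42);                                   % reproducible
N      = 10000;                            % samples per signal (e.g. 10 s at 1 kHz)
win    = 100;                              % window length in samples
hops   = [25 100];                         % 75% overlap vs. no overlap
nTrial = 500;
alpha  = 0.05;
r      = zeros(nTrial, numel(hops));
reject = false(nTrial, numel(hops));
for h = 1:numel(hops)
    starts = (1:hops(h):N-win+1)';         % first sample of each window
    n = numel(starts);                     % number of windows = parameter samples
    for t = 1:nTrial
        a = randn(N, 1);  b = randn(N, 1); % independent signals: true correlation 0
        ca = [0; cumsum(a)];  cb = [0; cumsum(b)];
        fa = (ca(starts + win) - ca(starts)) / win;   % window means of a
        fb = (cb(starts + win) - cb(starts)) / win;   % window means of b
        R = corrcoef(fa, fb);
        r(t, h) = R(1, 2);
        % Naive two-sided p-value that assumes n independent samples
        p = betainc(1 - r(t, h)^2, (n - 2)/2, 0.5);
        reject(t, h) = p < alpha;
    end
end
meanR        = mean(r);                    % average r per hop setting
falsePosRate = mean(reject);               % share of naive "significant" results
fprintf('hop %3d: mean r = %+.3f, false-positive rate = %.3f\n', ...
        [hops; meanR; falsePosRate]);
```

In this setup the mean of r stays close to 0 with and without overlap, while the false-positive rate stays near 0.05 without overlap but is clearly higher with 75% overlap. That pattern is what you would expect if overlap leaves r unbiased but makes the naive p-value too optimistic.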
It’s important to be transparent about the methods you use to correct for overlap when analyzing the data, especially if the results will be published or used for critical decision-making. Consulting a statistician can also be beneficial to ensure the chosen method is appropriate for the dataset and analysis goals.
Jonas on 11 Jan 2024
thx for your response. Actually, at the moment I am sticking to "solution" 2, since it avoids the whole problem completely. However, with overlap the time resolution is of course better. Can you tell me how to adjust the degrees of freedom for the MATLAB corr function? A more specific answer would totally help me out here.
Your third point sounds definitely interesting, can you give me short code in MATLAB for that too? What's the advantage/disadvantage of 1 versus 3?
thank you
