Correlation computation using a window of 3

Hello
I have two columns, x and y:
x = 5, 8, 9, 4, 9, 6, 0, 7, 8, 5, 4
y = 6, 4, 8, 7, 3, 7, 8, 7, 6, 4, 7
I want to compute the correlation over windows of size 3. For instance, the first window is x = 5, 8, 9 and y = 6, 4, 8.
If the last window has fewer than 3 numbers, the correlation is computed from the numbers that are present. In this case, that is the correlation for x = 5, 4 and y = 4, 7.
The result is a new column with 4 rows, one correlation per window.
I need a single correlation value for each window, rather than the matrix that the corrcoef function returns.
Thanks for your help in advance.

 Accepted Answer

Dana on 17 Aug 2020
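I don't entirely understand what you're trying to do, but you may want to use corr(a,b) instead of corrcoef(a,b) or corrcoef(C). (Note that corr is part of the Statistics and Machine Learning Toolbox, while corrcoef is in base MATLAB.)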
For a matrix C, corrcoef(C) returns a correlation matrix, i.e., a matrix whose (i,j) element is the correlation coefficient between the i-th and j-th columns of C. For column vectors a and b, the syntax corrcoef(a,b) is the same thing as corrcoef([a,b]) (i.e., MATLAB just puts the two vectors together into a single matrix, and then finds the correlation matrix).
On the other hand, corr(a,b) simply returns the correlation coefficient between the vectors a and b. Note, however, that corr([a,b]) = corrcoef(a,b) = corrcoef([a,b]), i.e., that syntax will also return the correlation matrix. So if you just want the one correlation coefficient, use corr(a,b), or compute R = corrcoef(a,b) and take its off-diagonal element R(1,2).
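If corr is not available, a minimal sketch is to call corrcoef and keep the off-diagonal element R(1,2), which is the single coefficient the question asks for. The explicit Pearson formula is included only as a cross-check; the names r and r_check are illustrative.

```matlab
% Data from the question, stored as column vectors
x = [5 8 9 4 9 6 0 7 8 5 4].';
y = [6 4 8 7 3 7 8 7 6 4 7].';

R = corrcoef(x, y);   % 2x2 correlation matrix of [x y]
r = R(1,2);           % off-diagonal element: the single coefficient

% Cross-check against the Pearson formula written out explicitly
xc = x - mean(x);
yc = y - mean(y);
r_check = sum(xc.*yc) / sqrt(sum(xc.^2) * sum(yc.^2));
fprintf('corrcoef: %.4f, formula: %.4f\n', r, r_check);
```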


Thanks, Dana, for your valuable comment. What if I intend to find the correlation over a window, as explained in my question? What do I do?
Your explanation of this "window" thing is not very clear, so I can't really help much with that. You can compute correlations of sub-vectors by indexing, e.g., corr(x(1:3),y(1:3)) (with x and y stored as column vectors). You can also use variables as indices, e.g., firsti=1, lasti=3, then corr(x(firsti:lasti),y(firsti:lasti)). Is that what you're after?
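A minimal sketch of the indexing idea, using corrcoef so that no toolbox is needed. The vectors are stored as columns here; corr in particular treats columns as variables, so passing row vectors to it would not give the intended single coefficient.

```matlab
% Data from the question, stored as column vectors
x = [5 8 9 4 9 6 0 7 8 5 4].';
y = [6 4 8 7 3 7 8 7 6 4 7].';

firsti = 1;   % first index of the window
lasti  = 3;   % last index of the window
R = corrcoef(x(firsti:lasti), y(firsti:lasti));
r = R(1,2);   % correlation of x = 5, 8, 9 with y = 6, 4, 8
disp(r)
```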
Tino on 17 Aug 2020
Edited: Tino on 17 Aug 2020
Hi Dana,
The window is a portion of the data in x and y. For instance, I want the correlation between the first 3 values of x and y, then between the next 3 values of x and y, then the next 3, and so on until the end of the data. When fewer than three values remain, the correlation is computed for the remaining values of x and y. The new values form a new column.
I hope this is clear.
Thanks
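The windowing described in this comment can be sketched as a pair of index vectors, one start and one end per window, with the last end clipped to the data length. The names starts and ends are illustrative, not from the thread.

```matlab
% Split n observations into consecutive, non-overlapping windows of
% size winsz; the last window keeps whatever is left over.
n = 11;                              % number of observations in x and y
winsz = 3;                           % window size
starts = 1:winsz:n;                  % first index of each window
ends = min(starts + winsz - 1, n);   % last index, clipped to n
disp([starts; ends])                 % one column per window
```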
I see now what you're trying to do. There are any number of different approaches you could take. Here's one:
x = [5, 8, 9, 4, 9, 6, 0 ,7 ,8 , 5, 4];
y = [6, 4, 8, 7, 3, 7, 8 ,7 ,6 , 4, 7];
winsz = 3; % window size
xy = [x;y]; % combine data
nxy = size(xy,2); % number of observations
ngr = ceil(nxy/winsz); % number of groups of size winsz
pdsz = ngr*winsz; % we will pad the data with extra elements so that the
% total # of elements is evenly divisible by winsz;
% pdsz is the size of the padded array
xy(:,nxy+1:pdsz) = NaN; % pad to desired size with NaN
xy = reshape(xy,2,winsz,ngr); % reshape into a 3-D array, where 1st and 2nd row correspond
% to x and y, columns to winsz observations, and the 3rd
% dimension to different groupings of size winsz
% dv is a 1x1xngr array whose j-th element will be the number of observations in the j-th
% group; this will be equal to winsz in all but the last group
dv = winsz*ones(1,1,ngr);
dv(ngr) = winsz-(pdsz-nxy);
xymeans = sum(xy,2,'omitnan')./dv; % compute means of x and y for each group
xyc = xy - xymeans; % de-mean the observations
xystds = sqrt(sum(xyc.^2,2,'omitnan')./dv); % compute s.d.'s of x and y for each group
xycovs = sum(prod(xyc,1),2,'omitnan')./dv; % compute covariances of x and y for each group
                                           % (padded NaNs survive prod and are dropped by sum)
xycorr = reshape(xycovs./prod(xystds,1),1,ngr); % get correlation coefficients, and then
% reshape 3-D result to a row vector
EDIT to say: the above uses the sample mean from each group of 3 as the mean estimate for that group. This is what would be done if you just ran a loop and called the corr function for each grouping of 3. You could substitute some other mean estimate if you wanted, though, e.g., use the same mean from the entire vectors x and y for each group. To do that, you'd instead use xyc = xy-mean([x;y],2).
Also, in hindsight, it's probably an easier option to just run a loop here. That would be noticeably slower for large arrays, but in this case it won't make an appreciable difference. So:
x = [5, 8, 9, 4, 9, 6, 0 ,7 ,8 , 5, 4].';
y = [6, 4, 8, 7, 3, 7, 8 ,7 ,6 , 4, 7].';
winsz = 3; % window size
nxy = numel(x); % number of observations
ngr = ceil(nxy/winsz); % number of groups of size winsz
xycorr = zeros(ngr,1);
for j = 1:ngr
    indsj = ((j-1)*winsz+1:min(j*winsz,nxy)).'; % indices of the j-th group
    xycorr(j) = corr(x(indsj),y(indsj));         % corr requires the Statistics and Machine Learning Toolbox
end
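If the toolbox that provides corr is not installed, the same loop can be sketched with corrcoef, keeping only the off-diagonal element of each 2x2 result. Apart from that substitution, the loop is unchanged.

```matlab
% Same loop as above, but using corrcoef instead of corr so that the
% Statistics and Machine Learning Toolbox is not required.
x = [5 8 9 4 9 6 0 7 8 5 4].';
y = [6 4 8 7 3 7 8 7 6 4 7].';
winsz = 3;                 % window size
nxy = numel(x);            % number of observations
ngr = ceil(nxy/winsz);     % number of windows
xycorr = zeros(ngr,1);
for j = 1:ngr
    indsj = (j-1)*winsz+1 : min(j*winsz,nxy);   % indices of window j
    R = corrcoef(x(indsj), y(indsj));           % 2x2 matrix for window j
    xycorr(j) = R(1,2);                         % keep the scalar coefficient
end
disp(xycorr)
```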
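As a last note, the last group only has two observations, and in that scenario you need to be careful in calculating the correlation coefficient. With only two observations, +/- 1 are the only possible correlations (here the last group gives -1). The loop method delivers that answer. The vectorized method delivers it too, but only if the NaN padding is excluded after the products of the de-meaned x and y are formed: omitting NaNs inside prod, as in prod(xyc,1,'omitnan'), treats a padded NaN pair as an empty product (i.e., 1) and adds a spurious term to the last group's covariance. Taking the product first and omitting NaNs in the sum, as in sum(prod(xyc,1),2,'omitnan'), avoids this.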
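A quick check of the two-observation case. Here pearson is a hypothetical anonymous helper written out from the Pearson formula, not a built-in function. With two points, each de-meaned vector has the form [d; -d], so the ratio reduces to the sign of the product of the two deviations.

```matlab
% Hypothetical helper: Pearson correlation written out explicitly.
% With two observations the de-meaned data are [d; -d] and [e; -e], so
% the result is d*e/(|d|*|e|), i.e. +1 or -1 (NaN if a variable is constant).
pearson = @(a,b) sum((a-mean(a)).*(b-mean(b))) / ...
                 sqrt(sum((a-mean(a)).^2) * sum((b-mean(b)).^2));
r_last = pearson([5;4], [4;7]);    % last window of the question
r_up   = pearson([1;2], [3;10]);   % an increasing pair
fprintf('%g %g\n', r_last, r_up);
```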
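Furthermore, if you were to apply either of these methods in a situation where the last group has only 1 observation, it's not going to work at all: a single observation has zero spread, so no meaningful correlation exists (the vectorized formula evaluates 0/0, i.e., NaN).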
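To see concretely what happens when the final window holds a single observation, here is a sketch that keeps only the first 10 data points (so the last window has one element) and computes every window with accumarray instead of an explicit loop. The window labels g and the anonymous helper pearson are illustrative assumptions, not code from the thread.

```matlab
% Window labels are ceil((1:n)'/winsz); with n = 10 and winsz = 3 the
% fourth window holds a single observation. For that window both
% de-meaned vectors are zero, so the formula evaluates 0/0 = NaN.
x = [5 8 9 4 9 6 0 7 8 5].';          % first 10 values from the question
y = [6 4 8 7 3 7 8 7 6 4].';
winsz = 3;
g = ceil((1:numel(x)).'/winsz);        % window label for each observation
pearson = @(a,b) sum((a-mean(a)).*(b-mean(b))) / ...
                 sqrt(sum((a-mean(a)).^2) * sum((b-mean(b)).^2));
% accumarray passes the row indices of each window to the function
xycorr = accumarray(g, (1:numel(x)).', [], @(i) pearson(x(i), y(i)));
disp(xycorr)   % last entry is NaN
```

Because the correlation only depends on which rows belong to a window, not on their order, the result does not rely on the order in which accumarray passes the indices.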
Thank you very much, I am really grateful.



Asked: 17 Aug 2020
Commented: 18 Aug 2020
