How to generate random numbers correlated to a given dataset in MATLAB

I have a matrix x with 10,000 rows and 20 columns. I want to generate a new matrix of random numbers, y, that is correlated with x with correlation coefficient q.
Note that x is not normally distributed; it follows a power-law distribution.

Answers (2)

Ah. Every once in a while, I see a question come up that is interesting. In this case, it should not be difficult to do. In fact, I can see at least one solution, and maybe a second way to do so. (I'll post an answer later today. Must run out now. Sorry, but at least you can know to expect an answer if nobody else gives you one.)
A question first though. Since the mean of a variable has no impact on the correlation, do you care what the mean of y will be? Or can it simply have mean 0?
Next, I assume you mean the traditional correlation coefficient, thus the Pearson version?
Hmm, as I think about it, a more interesting question is this: given a set of n variables, with their own set of inter-correlations, can we choose a new variable that has a given set of n specified correlations with each of those n variables? And I think the answer is yes, of course we can do so, as long as we have sufficient degrees of freedom.
Before I go for now though, here is a fun paper on the subject.
Later... (m-file solution attached to this answer.)
The basic idea is for a variable x, find a new vector y0, such that y0 is orthogonal to x. Then choose some linear combination of x and y0 that has the desired correlation.
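The idea can be sketched in a few lines. This is not the attached randwithcorr file, only a minimal illustration of the same orthogonalize-then-mix approach; the test data, the seed, and all variable names are assumptions made for the example. It uses corrcoef so that no toolbox is needed.

```matlab
% Sketch only: NOT the attached randwithcorr, just the same idea in brief.
rng(1);                          % fixed seed so the run is repeatable
x   = rand(100,1).^(-1/3);       % heavy-tailed test data (assumed example)
rho = 0.5;                       % desired Pearson correlation

xc = x - mean(x);                % centered copy of x
z  = randn(size(x));             % arbitrary random starting vector
zc = z - mean(z);                % center it as well
y0 = zc - (xc'*zc)/(xc'*xc)*xc;  % strip the component along xc, so y0 is orthogonal to xc

% Scale both to unit length, then mix so the cosine of the angle is rho
xn = xc/norm(xc);
yn = y0/norm(y0);
y  = rho*xn + sqrt(1 - rho^2)*yn;

C = corrcoef(x,y);
disp(C(1,2))                     % equals rho up to rounding error
```

Because xn and yn are orthogonal, centered, and of unit length, the sample correlation of x and y is exactly rho (to within floating point), whatever the distribution of x. The resulting y has mean 0 and unit norm; it can be shifted and rescaled freely without changing the correlation.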

7 Comments

The mean of y does not matter. Since X has a power-law distribution, I suppose Y should have the same distribution. I found the following solution on this site:
But the problem is that it assumes that the data is normally distributed. For me this is not the case.
----------sample solution for normal distribution------
1: Generating two sequences of correlated random numbers
Generating two sequences of random numbers with a given correlation is done in two simple steps:
1. Generate two sequences of uncorrelated, normally distributed random numbers X1 and X2.
2. Define a new sequence Y1 = q*X1 + sqrt(1 - q^2)*X2.
This new Y1 sequence will have a correlation of q with the X1 sequence.
-------------------------------------------------------
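For reference, the quoted recipe looks like this in MATLAB. This is a sketch with an assumed sample size, seed, and value of q. Note that the sample correlation comes out close to q, not exactly q, because X1 and X2 are only uncorrelated in expectation.

```matlab
rng(2);                          % fixed seed (assumed) for repeatability
n  = 1e5;                        % sample size (assumed)
q  = 0.6;                        % target correlation (assumed)

X1 = randn(n,1);                 % two independent standard normal sequences
X2 = randn(n,1);
Y1 = q*X1 + sqrt(1 - q^2)*X2;    % the mixing step from the quoted recipe

C = corrcoef(X1,Y1);
disp(C(1,2))                     % close to q, with sampling error
```

The formula itself only needs X1 and X2 to be uncorrelated with equal variance. The catch for non-normal data is that Y1 is then a mixture of two variables and no longer has the same distribution as X1.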
It is an interesting paper you link to. It's always interesting and useful to think a little more deeply about concepts we take for granted.
Here is another PDF version that doesn't have the title and some equations cut off.
I decided to write the solution as a function. It has a lot of internal comments. The basic idea that I chose for the solution was to find a second vector y0, that has ZERO correlation with x. Then find some linear combination of x and y0 that has exactly the desired correlation. I've attached my m-file solution to this comment.
x = rand(100,1);
tic,y = randwithcorr(x,.5);toc
Elapsed time is 0.007392 seconds.
corr(x,y)
ans =
0.5
Note that randwithcorr has ABSOLUTELY NO requirements about the distribution of x. x may be a vector or an array of any shape. Ok, two requirements, but they are small and very logical ones.
1. x must have at least 3 elements. Otherwise, it makes no sense to talk about a correlation with some other vector.
2. x must not be a constant vector. Again, it makes no sense to talk about correlation then.
I am quite confident that I could do some optimization in this code, but it is pretty fast as it is, and I am feeling lazy right now. Too hot today to actually think. For a vector of length 1e6, it still takes only 0.4 seconds to run.
x = rand(1000000,1);
tic,y = randwithcorr(x,-.75);toc
Elapsed time is 0.412036 seconds.
corr(x,y)
ans =
-0.75
I've attached the solution m-file to my answer above, as well as to this comment.
[SL: Edited formatting of numbered list so you don't have to scroll to see the contents of each item.]
Oh, I just saw your comment that y should also follow the same distribution as x. This would make the problem very difficult if x has some completely arbitrary distribution. For example, suppose you had not told me at all what the distribution of x was? Almost as bad, even for simple distributions, it is often quite difficult to generate correlated random variables for other than normal distributions, where you specify things like correlations and covariances. Really, those parameters make the most sense in context of a Gaussian random variate. I've honestly never really seen any good treatment for generating correlated variates for something like Weibull, or exponential or gamma random variables.
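One workaround, offered only as a sketch and not as something randwithcorr does: build a helper vector with the desired correlation, then hand out the sorted values of x in the rank order of that helper. The result is a reordering of x, so it has exactly the same empirical distribution, but only the rank ordering is inherited from the helper. The Pearson correlation actually achieved will generally differ from the target and has to be checked. The data, seed, and names below are assumptions.

```matlab
rng(3);                          % fixed seed (assumed)
n   = 10000;
x   = rand(n,1).^(-1/3);         % assumed power-law (Pareto-type) sample
rho = 0.7;                       % target used for the helper vector

% Step 1: helper vector t with correlation rho to x (orthogonalization)
xc = x - mean(x);
z  = randn(n,1);
zc = z - mean(z);
t0 = zc - (xc'*zc)/(xc'*xc)*xc;
t  = rho*xc/norm(xc) + sqrt(1 - rho^2)*t0/norm(t0);

% Step 2: place the sorted values of x according to the ranks of t
[~, idx] = sort(t);
y = zeros(n,1);
y(idx) = sort(x);                % y is a permutation of x

C = corrcoef(x,y);
fprintf('target %.2f, achieved %.4f\n', rho, C(1,2));
```

If the achieved value is too far from the target, one can adjust rho and repeat. For heavy-tailed data the few largest values dominate the Pearson coefficient, so expect the result to be sensitive to where those values land.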
Dear John,
Could this function be extended to generate multi-variables with a fixed correlation coefficient between any pairs of these variables?
Thanks, Ruiyang
Thanks for the great answer, John.
I am working on the exact problem you mention in your answer.
"Hmm, as I think, a more interesting question is, given a set of n variables, with their own set of inter-correlations, can we choose a new variable that has a given set of n specified correlations with each of those n variables?"
Can this be done? Please help! The correlation is the cosine of the angle between two (mean-centered) vectors of length n. Can we construct a new vector that has specified correlations with the existing vectors? All the vectors $x_1, x_2, \ldots, x_m$ are n-dimensional, and n > m.
John, does the file only work for rho = 1 or -1? It is strange that any other value of rho gives errors.
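A sketch of one way to do this, under the assumption that the m existing vectors are the columns of an n-by-m matrix X and the desired correlations are in a column vector c (all names and data here are made up for illustration). Write R for the correlation matrix of the columns. A solution exists only if c'*inv(R)*c <= 1; otherwise the requested correlations are mutually inconsistent.

```matlab
rng(4);                                   % fixed seed (assumed)
n = 200;  m = 3;
X = rand(n,m);                            % m existing variables as columns (assumed data)
c = [0.3; -0.2; 0.4];                     % desired correlation with each column

% Center each column and scale it to unit length
Xc = X - repmat(mean(X,1), n, 1);
Xn = Xc ./ repmat(sqrt(sum(Xc.^2,1)), n, 1);
R  = Xn'*Xn;                              % correlation matrix of the columns

a = R\c;                                  % weights on the existing variables
s = c'*a;                                 % squared length of the part inside span(X)
if s > 1
    error('Requested correlations are not jointly achievable.');
end

% Random unit vector, centered and orthogonal to every column of Xn
z  = randn(n,1);
zc = z - mean(z);
u  = zc - Xn*(R\(Xn'*zc));
u  = u/norm(u);

y = Xn*a + sqrt(1 - s)*u;                 % unit-length vector with Xn'*y = c

C = corrcoef([X y]);
disp(C(1:m,end).')                        % matches c up to rounding error
```

The part Xn*a fixes the m correlations, and the orthogonal part u only pads y out to unit length, which is where the remaining degrees of freedom (n > m) are used.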

This is a great answer @John D'Errico, but I have a further question. Suppose I have a vector of 10 numbers, the integers from 1 to 10 in random order (for example 2 1 3 5 4 ... 10), which could be generated by
x=randperm(10).'
Now I want to generate another vector of 10 numbers, containing all numbers from 1 to 10 again, that has a correlation of at least p with my previous vector x. Any ideas? Your code works but of course produces real numbers between -1 and 1.
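One possible approach, sketched here under assumptions (the value of p, the seed, and the names are made up): build a real-valued helper vector with a target correlation, replace it by its ranks so that it becomes a permutation of 1 to 10, and raise the target until the correlation of the permutation reaches p. Since both vectors are permutations of 1..n with no ties, their Pearson correlation is the same as their Spearman rank correlation.

```matlab
rng(5);                                  % fixed seed (assumed)
n = 10;
x = randperm(n).';
p = 0.6;                                 % required minimum correlation (assumed value)

xc = x - mean(x);
for rho = linspace(p, 1, 50)             % raise the target until the bound is met
    z  = randn(n,1);
    zc = z - mean(z);
    t0 = zc - (xc'*zc)/(xc'*xc)*xc;      % random part orthogonal to xc
    t  = rho*xc/norm(xc) + sqrt(1 - rho^2)*t0/norm(t0);
    [~, idx] = sort(t);
    y = zeros(n,1);
    y(idx) = (1:n).';                    % ranks of t: a permutation of 1..n
    C = corrcoef(x,y);
    if C(1,2) >= p, break, end           % accept the first permutation that qualifies
end
disp([x y])
```

The loop always terminates: at rho = 1 the helper is just the centered x, so its ranks reproduce x itself and the correlation is 1. In practice a qualifying permutation is usually found well before that.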

Asked:

on 28 Jul 2015

Commented:

on 21 Apr 2020
