Random numbers with Zero mean (not the basics)

Hello, I have been searching for an alternative for my problem, but wasn't really able to get a convenient solution.
Issue with randn:
Although randn draws from a distribution with zero mean, it doesn't produce an array whose mean is exactly zero. Even if I generate 1 million random values from the standard normal, the mean is sometimes "far" from zero (e.g., 0.003). I really need zero-mean samples. I can't simply subtract the mean, because I use the generated numbers to simulate new stock prices, which brings me to my specific issue.
Issue with my case:
To simplify it to the maximum, please consider the following algorithm:
for i=1:j
Step 1 StockPrice %Given
Step 2 %Some calculations
Step 3 StockPrice = StockPrice*randn %Generating new stock price
Step 4 %Some calculations
Step 5 StockPrice = StockPrice*randn %Generating new stock price
end
I really don't want to complicate my code with if statements to achieve the zero mean. Kindly note that generating one vector of random numbers will be much faster than my current method.
My problem will be solved if there is a way to move in a vector in every iteration.
Thanks in advance

6 Comments

Would you be allowed to generate n/2 random positive numbers, then use the other half (n/2) of your set as their negatives? That would create zero mean, but may not be what you want.
Thanks A M, That is right, I don't really have a problem with generating the numbers, but with a method that allows me to use them efficiently in the loop.
Thanks for your contribution anyways.
Thanks for your question. I did not know about this strange behavior of randn function. Thanks for enlightening me. I think that A M is reporting a very good way of solving your problem which is both fast and efficient.
YourVector(Current_Start : Current_Start + Number_To_Use_Now - 1)
....
Current_Start = Current_Start + Number_To_Use_Now
would have the effect of moving in the vector.
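A minimal sketch of this indexing idea, applied to the loop from the question. The names maxDraws, pool, and next are made up for illustration, the starting price of 100 is a placeholder, and the multiplicative update is copied from the question's pseudocode rather than being a realistic price model.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
maxDraws = 10;                   % assumed upper bound on the number of draws
pool = randn(maxDraws, 1);       % generate all values in a single call
pool = pool - mean(pool);        % optional: force the pool's sample mean to 0
next = 1;                        % index of the next unused value

StockPrice = 100;                % placeholder starting price
for i = 1:5
    % ... some calculations ...
    StockPrice = StockPrice * pool(next);   % first update consumes one value
    next = next + 1;
    % ... some calculations ...
    StockPrice = StockPrice * pool(next);   % second update consumes the following value
    next = next + 1;
end
used = next - 1;                 % number of values consumed from the pool
```

Note that the values actually consumed only have a sample mean of zero if the whole pool is used; stopping early leaves a nonzero mean over the part that was drawn.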
If you generate one random number at a time, and your random values are required to have a mean of 0 and not just statistically, then your one random number would be forced to be 0. Likewise, if you generate 2 at a time, the two would be forced to be negatives of each other. This does not sound realistic.
I would suggest that stock price movements are not uniformly random distributed, that there is too much of a tendency for No Change for a significant number of stocks (they might not be market movers but the stock exchange includes them anyhow.)
Many thanks #Walter for your suggestion. I will give it a shot tomorrow, and let you know how it goes. Kindly note that I have added details about my issue in the comment I made in response to #Peter's answer.
I wonder if your method would be still a valid solution to my case.
Thanks & Best- AND


 Accepted Answer

AND, no offense intended, but this doesn't make a lot of sense to me.
I suspect you know the following: randn draws from a standard normal distribution, which has zero mean. Any finite set of independent values drawn from that distribution will, with probability 1, have a non-zero sample mean. But that's their sample mean, not the mean of the distribution that they are drawn from. This is exactly what you should expect when drawing independent values from a normal distribution. It is not in any sense a "strange behavior".
You probably also know most of the following: If you want a finite set of "normal-like" values that has zero mean, then the simplest thing is to generate however many values you need, and subtract their mean. Another possibility is what's called antithetic sampling, where for each value that you draw from randn, you also use its negative. You can get that using the Antithetic property of the global RandStream object. Both of these create what are most definitely not independent draws from a normal distribution. They are constrained dependent draws.
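As a rough illustration of the first option (subtracting the sample mean), here is a short sketch. The seed and the variable names are arbitrary choices made for this example only.

```matlab
rng(0);                     % fixed seed, only so the numbers are repeatable
n = 1e6;                    % sample size mentioned in the question
x = randn(n, 1);            % independent standard normal draws
rawMean = mean(x);          % small, but almost surely not exactly zero
z = x - rawMean;            % centred values: no longer independent draws
centredMean = mean(z);      % zero up to floating-point round-off
fprintf('raw mean = %g, centred mean = %g\n', rawMean, centredMean);
```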
You say you cannot do that, and that you need "a method that allows me to use them efficiently in the loop". I'm going to have to guess that the reason is one of two things:
1) You are generating your values one at a time. I won't get into the efficiency issues that that raises, but unless you're generating hundreds of millions of values, you should be able to call randn once to generate a vector of however many values you need, subtract off its mean, and draw from that vector in your loop. Equivalently, you could read and save the generator state, generate a large vector and save its mean, then reset the generator state and draw numbers one at a time, subtracting off the mean that you previously saved.
2) You don't know in advance how many loop iterations you will have. If that's the case, then I think you are asking for the impossible. The only way that you can expect to draw random values and maintain a sample mean of zero at all steps is to draw only zeros.
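The second variant in (1), saving and restoring the generator state, might look roughly like the sketch below. It assumes the total number of draws N is known in advance; N, blockMean, and total are illustrative names, not anything from the question.

```matlab
rng(0);                          % arbitrary seed for repeatability
N = 1000;                        % assumed total number of draws needed
s = rng;                         % save the current generator state
blockMean = mean(randn(N, 1));   % mean of the values that will be drawn
rng(s);                          % rewind the generator to the saved state

total = 0;
for k = 1:N
    r = randn - blockMean;       % same sequence as before, shifted by the saved mean
    total = total + r;           % stand-in for whatever the loop does with r
end
sampleMean = total / N;          % zero up to round-off once all N values are used
```

This avoids holding the whole vector in memory inside the loop, at the cost of generating every value twice.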
You have not said why you care about a zero mean. Perhaps that would make things more clear.

6 Comments

Another (probably not satisfactory) possibility:
If for any one random value, the mean is allowed to be non-zero, then it must be the case that there is a particular condition at which time the mean of the samples (perhaps long term, perhaps "recent") is 0. Generate normally but keep a running total over that timeframe. For the draw immediately before the condition will need to hold, instead of drawing randomly, put in the negative of the running total; the mean over the relevant stretch will then be 0, at least to within round-off error.
A difficulty with this is that the value at that one location will not follow the same distribution as the others (even if some of the previous values were unknown); a genuine independent draw would be expected to be much closer to 0.
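A small sketch of this running-total idea, assuming the length n of the stretch is known when the last value is needed. The variable names are illustrative.

```matlab
rng(0);                          % arbitrary seed for repeatability
n = 1000;                        % assumed length of the stretch that must average to 0
r = zeros(n, 1);
runningTotal = 0;
for k = 1:n-1
    r(k) = randn;                % ordinary independent draw
    runningTotal = runningTotal + r(k);
end
r(n) = -runningTotal;            % forced value that cancels the running total
blockMean = mean(r);             % zero up to round-off
```

The forced value is the negative of a sum of n-1 standard normal draws, so its standard deviation is sqrt(n-1) rather than 1, and it will usually stand out from the other values.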
Many thanks #Peter for your detailed answer.
You are right, but I wasn't the one who said it is a strange behavior. Actually, I am concerned with the value only when it is far from zero (e.g., 0.003). In normal cases, one would expect the mean to get closer to zero as the sample size becomes bigger. In most cases I am getting this, but there is no consistency. Since it is a random process, this could happen.
The reason for requiring this is that I am drawing a certain path, where continuity is needed. Since I am getting such values for the mean, I would expect even bigger/smaller values that would cause a real problem, because I am currently generating each random variable one at a time.
You are right, I don't know exactly how many iterations I would have in each loop, but I can determine a maximum. Efficiently speaking, it is much better to generate the whole vector at once, and use the numbers generated when needed. Suppose the process generates 3 prices, and the maximum possible number of full iterations is 5; then I can generate a vector of 15 random numbers, or 3 vectors of 5 random numbers, so that each price generating equation has its own vector. Please note that not all these numbers would be necessarily used.
I will give Walter's method a shot tomorrow, to see if it works.
Particularly, my goal now is to find a way to move across an array. Simply speaking, suppose I generate only one stock price in a while loop. It might iterate 5 times. Suppose I generate a vector of random variables {1 2 3 4 5}. What I need is to use the first generated random variable "1" in the first calculation. In the next, use "2", and so on.
I wonder if this is possible in the first place.
Again, many thanks #Peter - appreciate it. AND
Hi again #Peter
Just to add, If there is a method that would accomplish what you stated in (1), my problem would be solved, because what I need is a method that allows me to draw the next value in a vector (after the one lastly used).
Thanks again-
AND
Better would be to rewrite this using nested functions, but the below gives the idea:
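function populate_randn(N)
    % Fill the shared pool with N standard normal draws and reset the cursor.
    global randvec randidx
    randvec = randn(N,1);
    randidx = 0;
end
function r = draw_randn
    % Return the next unused value from the pool, or NaN once it is exhausted.
    global randvec randidx
    if randidx >= length(randvec)
        r = NaN;
        fprintf(2, 'Exhausted random pool\n');
    else
        randidx = randidx + 1;
        r = randvec(randidx);
    end
end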
Then each place you would draw one random number, use draw_randn() instead of randn()
Clearly this could be modified to deliver a number of values at a time.
AND, I remain puzzled. This code
ntrials = 100000;
N = [10 100 1000 10000 100000 1000000 10000000];
for n = N
xbar = zeros(ntrials,1);
for i = 1:ntrials
xbar(i) = mean(randn(n,1));
end
z05 = 1.96/sqrt(n);
sprintf('n = %d, z05 = %f: %f',n,z05,sum(abs(xbar) > z05)/ntrials)
end
results in this output:
ans =
n = 10, z05 = 0.619806: 0.050480
ans =
n = 100, z05 = 0.196000: 0.050930
ans =
n = 1000, z05 = 0.061981: 0.049540
ans =
n = 10000, z05 = 0.019600: 0.051760
ans =
n = 100000, z05 = 0.006198: 0.049380
ans =
n = 1000000, z05 = 0.001960: 0.050130
ans =
n = 10000000, z05 = 0.000620: 0.050250
which demonstrates that randn creates vectors with "large" sample means (outside the 95% probability limits) the correct proportion of times. A sample mean of .003 is not terribly unusual unless you have a very large vector of random values. Perhaps you do. But still, the above demonstrates that the sample means of vectors from randn converge in probability to 0 at just the rate you'd expect from independent draws from a standard normal.
Perhaps you have some other reason for wanting your normals to have zero mean. Unless you can figure out in advance the total number you need, I think you are done for.
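To put the 0.003 figure from the question in context, the sketch below works out how unusual it would be for 1 million draws. It relies only on the standard result that the mean of n independent standard normal values has standard deviation 1/sqrt(n); the variable names are illustrative.

```matlab
n = 1e6;                         % sample size from the question
observed = 0.003;                % sample mean reported in the question
se = 1 / sqrt(n);                % standard deviation of the sample mean
z = observed / se;               % how many standard errors away from 0
p = erfc(z / sqrt(2));           % two-sided tail probability P(|Z| > z)
fprintf('z = %.2f, two-sided p = %.4f\n', z, p);
```

For these inputs z is 3, so a mean that far from zero should turn up in roughly 0.27% of vectors of that size: uncommon, but not a sign that randn is misbehaving.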
Dear Walter & Peter,
Many thanks for your contribution. I really appreciate it. I apologize for not being able to reply previously.
Walter, thanks, but wouldn't calling this function take about as much time as calling randn?
In order to mark this question as solved, could either of you please give a yes or no answer to the following question in a separate answer? I am not really sure how MATLAB handles it.
"Would drawing 10 million separate randn values be equivalent, in terms of randomness, to drawing one vector of 10 million randn values?" That is, would the means of the two samples be as randomly close to each other as the means of two separately drawn vectors of 10 million randn values?
I believe that the answer is yes, but I just want to make sure.
Many thanks again,
AND


More Answers (3)

AND, the answer to your latest question (if I understand it correctly) is yes, it doesn't matter if you generate one value at a time, or one million all at once, or one thousand values a thousand times. You will get the same sequence of values:
>> rng default
>> randn, randn, randn, randn
ans =
0.53767
ans =
1.8339
ans =
-2.2588
ans =
0.86217
>> rng default
>> randn(2,1), randn(2,1)
ans =
0.53767
1.8339
ans =
-2.2588
0.86217
>> rng default
>> randn(4,1)
ans =
0.53767
1.8339
-2.2588
0.86217
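The same comparison can be made programmatically rather than by eye. This is only a sketch of the check shown in the session above, with illustrative variable names.

```matlab
rng default
a = randn(4, 1);                     % four values in one call
rng default
b = [randn; randn; randn; randn];    % four values, one call each
rng default
c = [randn(2, 1); randn(2, 1)];      % four values, two at a time
sameSequence = isequal(a, b) && isequal(a, c);
disp(sameSequence)
```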

1 Comment

Many thanks Peter ,
Perfect, this is what we refer to as a written proof. I just noticed that your first answer was chosen (by votes I think), but it is yours, that is the good thing.
Many thanks again Peter and all best


n=100; %number of random numbers you need
x=randn(n/2,1); %half random numbers
y=[x; -x]; %array of 'random' numbers with mean 0
mean(y)
the cyclist
the cyclist on 1 Sep 2013
Edited: the cyclist on 1 Sep 2013
I believe this File Exchange submission will do close to what you need:
However, these numbers are uniformly distributed, not normally.
I'm not sure how hard it is to modify it.

1 Comment

Many thanks #thecyclist, There are some alternatives to have the sample mean 0. But if I use such a method, I will need afterwards to move in the array to the next number after the last one used. That becomes the part I am worried about. Thanks anyways for the suggestion.
