Problem 44393. Testing for randomness: uniform distribution of integers
On Cody several problems have been set up asking players to generate one or more 'random' numbers. Usually they are asking for numbers from a uniform distribution function (UDF). However, Test Suites do not always test very carefully to see how 'random' the sequences generated by the submitted codes are. Indeed, rigorous testing of randomness is a sophisticated field of endeavour. (See e.g. TestU01 and the Diehard tests.)
MATLAB provides access to several very good pseudo-random number generators. Even these do not generate truly random number sequences — however, they are good enough for most purposes.
Here we will be considering only integer sequences. Your task will be to distinguish those that are 'practically' random — in this case being uniformly distributed — from those that aren't.
You must return:
- the probability of the given sequence being generated (a scalar float); and
- your assessment of whether the sequence could 'plausibly' have been produced by a true random number generator (a scalar Boolean).
Input sequences will be vectors of various lengths containing only the integers 1, 2, 3 and 4.
Here are a few cases to consider:
EXAMPLE ONE — Random sequence (UDF)
Input = [3 4 3 3 4 2 2 2 1 3 2 2 1 3 3 2 4 3 1 2 3 2 1 4 2 1 4 4 4 2 3 1 2 1 2 2 1] prob ~ 5.29E-23 isRandom = true
EXAMPLE TWO — Non-uniform distribution
Input = [1 3 3 1 1 1 1 3 1 1 4 1 2 1 1 1 1 1 3 1 3] % Notice that there are far too many ones, and too few twos and fours. prob ~ 9.09E-13 isRandom = false
EXAMPLE THREE — Blocked randomisation (or 'permuted block randomisation')
Input = [1 3 2 4, 1 4 2 3, 3 1 4 2, 2 1 3 4, 1 4 3 2, 3 2 4 1, 4 1 3] % Notice that each of the four digits appears in groups of four. In such cases the identity of neighbouring digits is often biased too, although it isn't apparent in the short sequence above. prob ~ 5.55E-17 isRandom = false
EXAMPLE FOUR — Repeated pattern
Input = [1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2 4 2 3, 1 3 1 4 2] % Notice the pattern of eight digits that is repeated. Notice also that the identity of neighbouring digits is biased, as e.g. 4 is only ever followed by 2, and 3 is only ever followed by 1. prob ~ 5.29E-23 isRandom = false
EXAMPLE FIVE — Partial repeated sequence, too few runs
Input = [1 2 3 4 2 3 4 1 3 4 1 2 4 1 2 3 1 2 3 1 2 4 1 3 4 2 3 4 1 2 3 4] % Notice that the sequence "1 2 3 4" is repeated in part or in full. Notice also that the neighbouring digits are biased — e.g. 2 most commonly follows 1, but 2 is never followed by 1 or 2. And finally notice that there are no runs (repeated occurrences of the same numeral). prob ~ 5.42E-20 isRandom = false
EXAMPLE SIX — Partially segregated sequence, runs too long
Input = [1 1 4 1 1 1 1 1 1 3 3 3 3 1 3 3 3 3 4 4 4 4 2 4 4 4 4 2 2 2 2 2 2 3 2 2] % Notice that the sequence unfolds as mostly 1, then mostly 3, then mostly 4, and mostly 2 at the end. Notice also that the neighbouring digits are biased — each number most commonly follows itself. And finally notice that the runs are longer than we would expect from a truly random (UDF) sequence. prob ~ 2.12E-22 isRandom = false
As will be apparent, there are various tests that could be applied: inspecting occurrence of numerals, pairs of neighbouring numerals, runs of numerals (length of runs, number of runs), repeated sequences, etc. Your job is just to return the correct outputs. You are not necessarily required to implement every test.
Note: the sequences in the examples above were for illustrative purposes; most sequences in the Test Suite are considerably longer.
See also:
Solution Stats
Solution Comments
Show commentsProblem Recent Solvers3
Suggested Problems
-
42000 Solvers
-
1901 Solvers
-
350 Solvers
-
Golomb's self-describing sequence (based on Euler 341)
159 Solvers
-
Kryptos - CIA Cypher Sculpture: Vignere Decryption
51 Solvers
More from this Author32
Problem Tags
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!