Is mt_rand() really random?
To obtain a pseudo-random number using PHP's random-number generator, call the mt_rand() function. Called without arguments, it returns an integer between 0 and a system-defined upper limit, which you can inspect by calling the mt_getrandmax() function.
The mt_rand() function uses the Mersenne Twister algorithm. It is about four times faster than PHP's older rand() function, and its statistical properties are better characterized.
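As a minimal sketch of the calls just described (the variable names are only illustrative), the following snippet prints the generator's upper limit and draws two values, one over the full range and one restricted to 0 through 9:

```php
<?php
// Largest value mt_rand() can return when called without arguments.
$max = mt_getrandmax();

// Without arguments: an integer between 0 and mt_getrandmax(), inclusive.
$raw = mt_rand();

// With two arguments: an integer between the given bounds, inclusive.
$digit = mt_rand(0, 9);

printf("upper limit: %d\nraw draw: %d\ndigit: %d\n", $max, $raw, $digit);
```

The two-argument form is the one used in the rest of this section, because it lets you fix the range of possible values in advance.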
Before you use PHP's mt_rand() in your probability models, you might want to convince yourself that the mt_rand() function works correctly. How could you do this?
Most developers are content to write a script, get it to generate a few random values, and then accept that it is working correctly if they don't notice any obvious biases in the numbers that are appearing. This eyeball analysis might convince you, but it won't, as they say, convince the lawyers.
One approach to find more convincing evidence is to precisely define what it means for a sequence of numbers to be random. A random sequence of numbers should have many properties, but one of the most important properties is that each number in the range of possible values should have an equal likelihood of appearing at each point in the sequence.
One way to measure whether this is true is to count the number of times each value occurs and graph the frequency counts. The resulting graph should approximate a uniform distribution of counts across the values in your range. If you limit the allowable values to the integers 0 through 9 and generate a sequence of 1,000 numbers, each value should occur about 100 times, and the graph should approximate the discrete uniform distribution depicted in Figure 5.
Figure 5. Uniform distribution for truly random numbers
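The counting step can be sketched as follows. This is an illustrative snippet rather than the article's own script; the variable names and the text histogram (one '#' per five occurrences) are assumptions made for the example:

```php
<?php
$draws  = 1000; // sequence length used in the text
$values = 10;   // possible values 0 through 9

// One counter per possible value, all starting at zero.
$observed = array_fill(0, $values, 0);

// Draw the sequence and tally how often each value appears.
for ($i = 0; $i < $draws; $i++) {
    $observed[mt_rand(0, $values - 1)]++;
}

// Crude text histogram: one '#' per 5 occurrences, then the raw count.
foreach ($observed as $value => $count) {
    printf("%d | %-30s %d\n", $value, str_repeat('#', intdiv($count, 5)), $count);
}
```

If the generator is behaving uniformly, the bars should all be of similar length, with counts scattered around 100.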
To test whether PHP's mt_rand() function generates a uniform distribution of random values, I've created a script that uses the Chi Square test. The first half of the script is primarily concerned with creating a frequency distribution from the output of mt_rand(). The second half performs the Chi Square test.
The test involves setting the alpha cutoff to use for computing a critical Chi Square value. If the obtained Chi Square value exceeds the critical Chi Square value, then you would reject the null hypothesis that the mt_rand() values come from a uniform distribution. If mt_rand() is working as it should, you would expect not to reject the null hypothesis.
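The two halves described above might look roughly like the sketch below. It is not the original script: the function name chiSquareUniform() is hypothetical, and the critical value is hard-coded as 16.92, the Chi Square cutoff for 9 degrees of freedom at alpha = 0.05, instead of being computed from alpha.

```php
<?php
/**
 * Pearson's Chi Square statistic for observed counts against a
 * uniform expectation (every value equally likely).
 * Hypothetical helper, written for this illustration.
 */
function chiSquareUniform(array $observed): float
{
    $expected  = array_sum($observed) / count($observed);
    $chiSquare = 0.0;
    foreach ($observed as $count) {
        $chiSquare += ($count - $expected) ** 2 / $expected;
    }
    return $chiSquare;
}

// First half: frequency distribution of 1,000 draws from 0 through 9.
$observed = array_fill(0, 10, 0);
for ($i = 0; $i < 1000; $i++) {
    $observed[mt_rand(0, 9)]++;
}

// Second half: compare the obtained statistic with the critical value.
$df       = count($observed) - 1; // 9 degrees of freedom
$critical = 16.92;                // assumed cutoff: df = 9, alpha = 0.05
$obtained = chiSquareUniform($observed);

printf("df = %d, obtained = %.2f, critical = %.2f\n", $df, $obtained, $critical);
echo $obtained > $critical
    ? "Reject the null hypothesis of uniformity\n"
    : "Do not reject the null hypothesis of uniformity\n";
```

Hard-coding the cutoff keeps the sketch short. A fuller script would derive the critical value from the chosen alpha and the degrees of freedom.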
The following table shows sample output from this script. Because the obtained Chi Square value of 7.90 is less than the critical value of 16.92, you cannot reject the null hypothesis that your observed frequencies match the frequencies expected under the assumption that you are sampling from a uniform distribution.
Table 1. Output from PHP Chi Square script
Value | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Totals |
Observed | 91 | 115 | 90 | 104 | 101 | 95 | 105 | 113 | 88 | 98 | 1000 |
Expected | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 1000 |
Variance | 0.81 | 2.25 | 1.00 | 0.16 | 0.01 | 0.25 | 0.25 | 1.69 | 1.44 | 0.04 | 7.90 |
Statistic | DF | Obtained | Prob | Critical |
Chi Square | 9 | 7.90 | 0.54 | 16.92 |
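The figures in Table 1 can be checked by hand. Each entry in the row labelled Variance is the squared difference between the observed and expected counts divided by the expected count, and the obtained Chi Square value is the sum of those entries. A short sketch of that arithmetic, using the observed counts from the table (variable names are illustrative):

```php
<?php
// Observed counts copied from Table 1; expected count is 1000 / 10 = 100.
$observed = [91, 115, 90, 104, 101, 95, 105, 113, 88, 98];
$expected = array_sum($observed) / count($observed);

// Per-value contribution (O - E)^2 / E, the row labelled "Variance".
$contributions = array_map(
    fn (int $o): float => ($o - $expected) ** 2 / $expected,
    $observed
);

foreach ($contributions as $value => $c) {
    printf("%d: %.2f\n", $value, $c);
}

// The obtained Chi Square value is the sum of the contributions.
printf("Chi Square = %.2f\n", array_sum($contributions)); // prints "Chi Square = 7.90"
```

For example, the value 1 was observed 115 times, so its contribution is (115 - 100)^2 / 100 = 2.25, which matches the table.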
It can be instructive to run this script a number of times and observe that on some occasions you do reject the null hypothesis. Why do you think this occurs? How often can this occur before you should doubt that mt_rand() is uniform? And is there a tool to help make these determinations?
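One way to explore these questions is to automate the repetition. The sketch below assumes the same setup as Table 1 (1,000 draws from 0 through 9, critical value 16.92 for alpha = 0.05) and an arbitrary choice of 1,000 repetitions. With that cutoff, even a generator that really is uniform should exceed the critical value in roughly 5% of runs, simply by chance, so a rejection rate near that figure is what you would hope to see.

```php
<?php
$runs       = 1000;  // number of repetitions (arbitrary choice)
$critical   = 16.92; // assumed cutoff: df = 9, alpha = 0.05
$rejections = 0;

for ($run = 0; $run < $runs; $run++) {
    // Fresh frequency distribution for this run.
    $observed = array_fill(0, 10, 0);
    for ($i = 0; $i < 1000; $i++) {
        $observed[mt_rand(0, 9)]++;
    }

    // Chi Square statistic against an expected count of 100 per value.
    $chiSquare = 0.0;
    foreach ($observed as $count) {
        $chiSquare += ($count - 100) ** 2 / 100;
    }

    if ($chiSquare > $critical) {
        $rejections++;
    }
}

printf("Rejected %d of %d runs (%.1f%%)\n", $rejections, $runs, 100 * $rejections / $runs);
```

A rejection rate far above the chosen alpha across many runs would be stronger evidence against uniformity than any single rejection.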