Please e-mail any questions, comments, or bugs to
daniel.cer@cs.colorado.edu

**Description:** This entry illustrates how to generate integer & real valued pseudo-random numbers in various different programming languages using built-in/standard library routines. It also contains a survey of the underlying algorithms used to generate random numbers in popular implementations of these languages.

### C/C++

*GNU C Library*
### Java

### Perl

### Python

### Ruby

### Octave

### Scheme

*Implementation in Guile*
### Bash Shell Script

### Other Sources of Entropy

### Implementation Notes

### Footnotes:

**Languages:** C, C++, Java, Perl, Python, Ruby, Octave(/Matlab like), Scheme, and Bash shell script.

**Source files:**

LibraryRandomNumber.c [raw code], LibraryRandomNumber.cc [raw code], LibraryRandomNumber.java [raw code], LibraryRandomNumber.pl [raw code], LibraryRandomNumber.py [raw code], LibraryRandomNumber.rb [raw code], LibraryRandomNumber.m [raw code], LibraryRandomNumber.scm [raw code], LibraryRandomNumber.sh [raw code],

**Jump to:**

Languages:
C/C++,
Java,
Perl,
Python,
Ruby,
Octave,
Scheme,
Bash

Misc:
Other Sources of Entropy,
Implementation Notes,
Sample Code

In C/C++, pseudo-random integers distributed uniformly (/semi-uniformly) along the range 0 to some value, *RAND_MAX*, can be generated by calling the *rand()* function. While on some platforms there are functions that return better sequences of pseudo-random values, one advantage of rand() is that it's specified in the ANSI/ISO C standard. Thus, code that uses it will be more portable than code that uses alternatives such as *random()* or members of the rand48 family such as *lrand48()* or *drand48()*.

Even though ANSI C requires the existence of a function rand() that generates sequences of pseudo-random integers, it leaves the particular algorithm to be used unspecified. However, it does give the following portable sample implementation using a linear congruential generator:

```
static unsigned long int next = 1;
int rand(void) // RAND_MAX assumed to be 32767
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}
```

Programming languages - C, ANSI/ISO/IEC 9899-1999, pg 311
Note that the above code sample conforms to the requirement that RAND_MAX is at least 32767, and that the default random seed is 1.

In the GNU C Library, both *rand()* and *random()* share the same underlying implementation. This can be seen on line 28 in glibc-2.3.6/stdlib/rand.c (or, alternatively "man 3 rand" on a Linux box should work, too). By default, the implementation uses a *linear feedback shift register* with 128 bytes of state information to generate random integers ranging from 0 to a RAND_MAX of 2^{31}-1. The amount of state information used can be modified by calling *initstate()*, and is allowed to range from 8 to 256 bytes. However, if fewer than 32 bytes are used, the code back-offs to using a linear congruential generator.

The generator operates by having two integer pointers into an array of state information. Each time a random number is requested, the corresponding values in the state array are summed. The result of the summation is then stored in the state array, replacing the value pointed to by one of the pointers. This new value is also subsequently used as the random value generated by the algorithm. However, since the new value is a 32-bit integer and only 31-bits are needed in order to get a random number in the desired range, the least significant bit is dropped in order to get the final random value returned by the routine. After generating and storing the new random value, both of the state pointers are incremented so that they point to subsequent locations in the state array. If either pointer reaches the end of the array, it is reset to point to the beginning of it.

Below is the snippet of the code that gets executed to generate the successive random values:

```
val = *fptr += *rptr;
/* Chucking least random bit. */
*result = (val >> 1) & 0x7fffffff;
++fptr;
if (fptr >= end_ptr)
{
fptr = state;
++rptr;
}
else
{
++rptr;
if (rptr >= end_ptr)
rptr = state;
}
```

From the function __random_r() in glibc-2.3.5/stdlib/random_r.c, lines 391-405
Above, *fptr* and *rptr* are pointers to 32-bit integers inside the *state* array. The pointer *result* is used to store the random value that will be returned by function. The initial state array is populated with values generated using a linear congruential random number generator. The pointer *rptr* is initialized to point to the beginning of the state array, while the initial location pointed to by *fptr* varies according to the size of the state array (for the default state array size of 128 bytes, &state[3] is used).

Recall that the period of a random number generator is the number of random values that it generates before the sequence repeats itself^{1}. For this generator, the size of the period will depend on the amount of state information used. According to the Linux man page for *random()*, when using the default of 128 bytes of state information, the period is approximately 16*(2^{31}-1).

In Java, the class Random from the package java.util can be used to generated pseudo-random values for a variety of data types, specifically, boolean, byte, int, long, float, and double. For doubles, random values can be generated along both a uniform distribution in the range [0,1)^{2}, and a Gaussian distribution with mean 0.0 and standard deviation 1.0. For integers, random values are uniformly distributed across all the values that can be represented using a 32-bit signed integer, [-2^{31},2^{31}-1].

Internally, all random values generated by objects of this class are produced by calling the method *next(int bits)* that returns an integer with the requested number of random bits. This value is then used to generate the various type specific random values. The implementation is convenient as it allows subclasses of Random to simply replace this method with one that implements an alternative pseudo-random number generator and then all of the other type specific methods will generated random values using the new algorithm.

To generate random values, *next()* uses the following linear congruential pseudo-random number generator.

```
synchronized protected int next(int bits) {
seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
return (int)(seed >>> (48 - bits));
}
```

Java 5, J2SE API documentation - Random.next()
In an alternate notion, the recurrence for this generator is then:

r_{i+1} = 25,214,903,917*r_{i} + 11 modulus 2^{48}

Note that, even though seed is stored as a long, Java's signed 64-bit integer type, only 48 bits of state information are stored in it. Further, the constant multiplier, 0x5DEECE66D, and increment, 0xB, are identical to those used for the popular rand48 family of linear congruential random number generators found on many Unix/Unix-like systems. Additionally, a library routine from this family of generators, *drand48()*, is also Perl's preferred generator on platforms that support it.

The given recurrence has a period length of 2^{48} (this is order 10^{14}). As such, since the modulus is also 2^{48}, the underlying generator can be said to have a *full period*.

In Perl, real valued uniformly distributed pseudo-random numbers can be obtained by calling the function *rand()*. This function takes one argument that specifies the upper limit on the range of random values. That is, *rand(n)* will return a real valued random number in the range [0,n). This value can then be passed to *int()* if random integer values are desired. Rather than rounding to the nearest integer, *int()* just removes the decimal portion of it's argument. This truncation is actually desirable in this case, since it results in random integer values that are still uniformly distributed in the range [0,n). If rounding was performed, '0' would be less probably than all of the other integers in the original range, and the range would change to [0,n] (i.e. the ranges now includes the end point n).

Perl generates random numbers using the library routines provided by the platform it has been complied for. Specifically, Perl's first choice is to use *drand48()*. However, if it is not available, it will try to use *random()*. If both of these are unavailable, it will settle for *rand()*. Perl's configuration script has somewhat amusing commentary on its preferences regarding the three routines:

```
if set drand48 val -f; eval $csym; $val; then
dflt="drand48"
echo "Good, found drand48()." >&4
elif set random val -f; eval $csym; $val; then
dflt="random"
echo "OK, found random()." >&4
else
dflt="rand"
echo "Yick, looks like I have to use rand()." >&4
fi
```

From perl-5.8.7/Configure, lines 17555-17564
In python, random numbers can be generated using the module *random*. This module not only provides routines for generating uniformly distributed integer and real valued pseudo-random numbers, but also provides routines for real valued numbers distributed along a sizable number of different distributions (e.g. beta, exponential, gamma, Gaussian, Pareto, and Weibull distributions). Additionally, there are routines for shuffling sequences and sampling from a collection.

The module is implemented using a hidden object instantiated from the class *random.Random*. The functions in the module simply invoke the corresponding methods in this object. The Random class itself is organization in a manner similar to Java's Random class where by, if you recall, all the methods for generating random values depend on the method *next()*. In Python's case the core method's name is *random()*, and it generates real valued numbers in the range [0,1). As with Java, to replace the pseudo-random number generator used by all the methods in the class, one can simply subclass Random and supply a new implementation of the core generator, *random()*.

Starting with CPython 2.3, the method *random()* uses a Mersenne Twister pseudo-random number generator. This is a fairly new technique that has been shown to result in generators with impressive periods and otherwise good sequences of pseudo-random numbers. More concretely, for the set of parameters used in the variant/implementation known as MT199937, the generator has an enormous period of 2^{19937}-1 (this is on the order of 10^{6001}). The approach and the MT19937 variant were introduced in the following paper by Makoko Matsumoto and Takuji Nishimura.

Matsumoto, M. and Nishimura, T. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation, 8 (1998) pg. 3-30. [PDF]

Matsumoto and Nishimura also made a C-code implementation of MT19937 available under a fairly liberal open source license. With a few mirror changes in order to integrate the code into CPython, it is the 2002/1/26 release of this MT19937 implementation that acts as the underlying generator for the *random()* method (see Python-2.4.2/Modules/_randommodule.c). The original unmodified code is available for download from the official Mersenne Twister Home Page

Pseudo-random numbers can be generated in Ruby by the Kernel method *rand()*. This routine takes one argument dubbed max. If the magnitude of max is greater than or equal to 1.0, then *rand()* returns an integer uniformly distributed in the range [0,max.to_i.abs]. In this context, like Perl's *int()* routine, the method *to_i()* returns just the integer portion of a real valued number without any rounding and the method *abs()* simply takes the absolute value of the resulting integer. However, if the magnitude of max is less than 1.0 but not 0.0, *rand()* returns a real valued number uniformly distributed in the range [0,max.abs). If max is zero, the default value if no parameter is explicitly supplied, *rand()* returns a real valued number in the range [0,1).

Like python, Ruby uses a Mersenne Twister pseudo-random number generator. In fact, it also uses the MT19937 C-code released by Matsumoto and Nishimura (see ruby-1.8.3/random.c). However, Ruby uses a variant released on 2002/2/10 that is potentially faster on some platforms, but it has been reported to be slower on Pentium 4 systems. On the Mersenne Twister homepage this release is listed as a "version considering Shawn Cokus's code", and dated 2002/2/11.

Pseudo-random real valued numbers can generated in Octave using the functions *rand()* and *randn()*. Additionally, a row vector with a random permutation of integers can be generated using *randperm()*. If the routines *rand()* and *randn()* are called without any arguments, they both produce a single random value. However, if they are called with a single numeric argument, n, they will generate a square n-by-n matrix of random values. If they are called with two arguments, say n and m, they will generate a n-by-m matrix of random values. The two routines differ in that *randn()* generates values that have a Gaussian distributed with mean 0 & standard deviation 1, while *rand()* generates values uniformly distributed in the range (0,1) (see: octave-2.1.72/libcruft/ranlib/genunf.f). Note that the latter differs from the other similar routines previously presented that generate values in the range [0,1)^{2}.

The values returned by *rand()* and *randn()* are generated using routines from Ranlib, a library of C and Fortran routines for generating pseudo-random numbers. The package is maintained by Barry Brown of the Department of Biostatistics & Applied Mathematics at the University of Texas, MD Anderson Cancer Center, and can be downloaded from the department's software repository.

The package generates random numbers by combining the output of two multiplicative linear congruential generators. Doing so results in an aggregate generator that has a period of 2.3 x 10 ^{18}. The implementation of this generator is based on a Fortran 77 port of the Pascal source code that appears in the appendix of the following paper:

L'Ecuyer, P. and Cote, S. "Implementing a Random Number Package with
Splitting Facilities." ACM Transactions on Mathematical Software,
17:98-111 (1991) [PDF]

Note that, this paper focuses on providing simultaneous access to multiple generators each of which that has its output organized into a series of substreams/blocks. Generators can be maneuvered among and within the blocks through instructions to transition to the next block, restarted within the current block, and restart at it's first block. While this functionality is provided by Ranlib, it does not appear to be used by Octave.

Pseudo-random numbers can be generated in Scheme using the procedure *random*. This procedure takes one argument, named modulus, and returns values uniformly distributed in the range [0,modulus). If the specified modulus is of type integer, the routine will return integer random values. Similarly, if a real valued modulus is used, the routine will return a real valued random number.

In Guile, an implementation of Scheme targeted as being an "extension language" (/embedded scripting language) for other applications, random numbers are generated using a Multiply with Carry generator. This class of generators was first proposed in a Usenet post by George Marsaglia in sci.stat.math titled "Yet another RNG" on August 1, 1994. At the time, Marsaglia was so excited about this class of generators that he informally dubbed them "The Mother of All Random Number Generators" due to the long periods that could be generated using a relatively simple and efficient algorithm.

Multiple-with-Carry (MWC) generators produce values using the following recurrence:

r_{n+1} = a * r_{n} + carry_{n} mod M

Here carry can be defined as:

carry_{n+1} = int((a * r_{n} + carry_{n}) / M)

However, since MWC generators use a modulus, M, that is a power of two, *r* and *carry* are really just the low and high order bits, respectively, of the same underlying recurrence.

Within Guile, it's MWC generator is implemented by the following code snippet. A variant of this generator is also provided for systems that don't support 64-bit integer types. While slightly more complex, the alternate generator still implements the same underlying computation.

```
unsigned long
scm_i_uniform32 (scm_t_i_rstate *state)
{
LONG64 x = (LONG64) A * state->w + state->c;
LONG32 w = x & 0xffffffffUL;
state->w = w;
state->c = x >> 32L;
return w;
}
```

From guile-1.6.7/libguile/random.c, lines 114-122
In the code above, the basic recurrence is computed and then stored in a 64-bit temporary variable *x*. The high order bits are then stored in the carry, *state->c*, and the low order bits are stored in *state->w* and then returned as the random value produced by the generator.

The generator has a period of approximately 4.6 x 10^{18}, or alternatively about 2^{62};

Pseudo-random integer values can be obtained in Bash by retrieving values from the special shell variable *RANDOM*. Each time a value is read from this variable, Bash generates a random integer (semi-)uniformly distributed in the range [0,32767]. If a value is assigned to *RANDOM*, the value is used as a new random seed for the generator. The variable *RANDOM* can also be used to generate integer values in Z shell (zsh) and Korn Shell (ksh). However, it isn't support by the original Bourne shell (sh).

Retrieving values from *RANDOM* in bash invokes the following linear congruential pseudo-random number generator. Here, *rseed* is an unsigned long that is initialized to 1, but then later set to the sum of the current process id and the shell start time.

```
/* Returns a pseudo-random number between 0 and 32767. */
static int
brand ()
{
rseed = rseed * 1103515245 + 12345;
return ((unsigned int)((rseed >> 16) & 32767)); /* was % 32768 */
}
```

From bash-3.1/variables.c lines 1146-1152
If you recall, this is the exact same generator given in the ANSI/ISO C Standard as the example implementation of the *rand()* function. The code snippets only differ in the manner that they remove the 16 low order bits from the state variable (rseed/next) and then ensure that the resulting integer is in the range [0,32767] across all platforms. Specifically, in Bash, this is done using universally fast bit manipulation operators, while, in the Standard C example code, this is done using division and modulus operations; i.e. ((rseed >> 16) & 32767) vs. ((next/65536) % 32768). However, it is interesting to note that the GNU C compiler and presumably other modern C compilers actually compile both variants of these two operations to the same pair of processor instructions:

```
sar eax, 16 ; shift-16/divide-65536 operation
and eax, 32767 ; and-32767/modulus-32768
```

Generated using gcc version 3.4.4, options "-masm=intel -S"
In some domains, such as cryptography, the pseudo-random number generators discussed above may not be entirely appropriate due to their deterministic nature. That is, given a known initial state, such generators will always produce the exact same sequence of random values. On Unix, Linux and other Unix-like platforms an alternative source of entropy is the special file /dev/random. Values read from this file represent random numbers generated by the operating system using various system events to create a pool of entropy. Since such systems events can be based on random events in the external environment, the generated values are much more like true random numbers. Additionally, on Microsoft Windows the routine *CryptGenRandom* can be used for generating random values targeted for use in cryptography. In contrast to /dev/random, this routine doesn't appear to automatically use random system events to generate random values. However, it does allow the calling routine to supply a buffer containing an "auxiliary random seed" that can be filled with random environmental noise that has been collected by the calling application.

The sample code given below for generating random numbers takes two parameters, one specifying the type of random value to be generated ("int" or "real") and the other indicating how many random values should be produced. The random seed is left to whatever the default is for the routines used in the respective languages. This means that for the C, C++, and (Guile interpreted) Scheme implementations the sequence of random values will be identical across separate invocations, while for the other implementations the sequence will differ.

- see the description of period included with the Randu post from last week.
- [0,1) Indicates that the range is includes '0' but excludes '1'. In contrast, (0,1) indicates that the range excludes both '0' and '1'

LibraryRandomNumber.c [raw code], LibraryRandomNumber.cc [raw code], LibraryRandomNumber.java [raw code], LibraryRandomNumber.pl [raw code], LibraryRandomNumber.py [raw code], LibraryRandomNumber.rb [raw code], LibraryRandomNumber.m [raw code], LibraryRandomNumber.scm [raw code], LibraryRandomNumber.sh [raw code],