Dan's coding practice area, simple programs in various different programming languages

Please e-mail any questions, comments, or bugs to daniel.cer@cs.colorado.edu

Description: This entry illustrates how to generate integer & real valued pseudo-random numbers in various different programming languages using built-in/standard library routines. It also contains a survey of the underlying algorithms used to generate random numbers in popular implementations of these languages.

Jump to:
Languages: C/C++, Java, Perl, Python, Ruby, Octave, Scheme, Bash
Misc: Other Sources of Entropy, Implementation Notes, Sample Code

C/C++

In C/C++, pseudo-random integers distributed uniformly (/semi-uniformly) along the range 0 to some value, RAND_MAX, can be generated by calling the rand() function. While on some platforms there are functions that return better sequences of pseudo-random values, one advantage of rand() is that it's specified in the ANSI/ISO C standard. Thus, code that uses it will be more portable than code that uses alternatives such as random() or members of the rand48 family such as lrand48() or drand48().

Even though ANSI C requires the existence of a function rand() that generates sequences of pseudo-random integers, it leaves the particular algorithm to be used unspecified. However, it does give the following portable sample implementation using a linear congruential generator:

static unsigned long int next = 1;

int rand(void) // RAND_MAX assumed to be 32767
{
      next = next * 1103515245 + 12345;
      return (unsigned int)(next/65536) % 32768;
}

Programming languages - C, ANSI/ISO/IEC 9899-1999, pg 311

Note that the above code sample conforms to the requirement that RAND_MAX is at least 32767, and that the default random seed is 1.

GNU C Library

In the GNU C Library, both rand() and random() share the same underlying implementation. This can be seen on line 28 in glibc-2.3.6/stdlib/rand.c (or, alternatively "man 3 rand" on a Linux box should work, too). By default, the implementation uses a linear feedback shift register with 128 bytes of state information to generate random integers ranging from 0 to a RAND_MAX of 2³¹-1. The amount of state information used can be modified by calling initstate(), and is allowed to range from 8 to 256 bytes. However, if fewer than 32 bytes are used, the code back-offs to using a linear congruential generator.

The generator operates by having two integer pointers into an array of state information. Each time a random number is requested, the corresponding values in the state array are summed. The result of the summation is then stored in the state array, replacing the value pointed to by one of the pointers. This new value is also subsequently used as the random value generated by the algorithm. However, since the new value is a 32-bit integer and only 31-bits are needed in order to get a random number in the desired range, the least significant bit is dropped in order to get the final random value returned by the routine. After generating and storing the new random value, both of the state pointers are incremented so that they point to subsequent locations in the state array. If either pointer reaches the end of the array, it is reset to point to the beginning of it.

Below is the snippet of the code that gets executed to generate the successive random values:

val = *fptr += *rptr;
/* Chucking least random bit.  */
*result = (val >> 1) & 0x7fffffff;
++fptr;
if (fptr >= end_ptr)
{
   fptr = state;
   ++rptr;
}
else
{
   ++rptr;
   if (rptr >= end_ptr)
   rptr = state;
}

From the function __random_r() in glibc-2.3.5/stdlib/random_r.c, lines 391-405

Above, fptr and rptr are pointers to 32-bit integers inside the state array. The pointer result is used to store the random value that will be returned by function. The initial state array is populated with values generated using a linear congruential random number generator. The pointer rptr is initialized to point to the beginning of the state array, while the initial location pointed to by fptr varies according to the size of the state array (for the default state array size of 128 bytes, &state[3] is used).

Recall that the period of a random number generator is the number of random values that it generates before the sequence repeats itself¹. For this generator, the size of the period will depend on the amount of state information used. According to the Linux man page for random(), when using the default of 128 bytes of state information, the period is approximately 16*(2³¹-1).

Java

In Java, the class Random from the package java.util can be used to generated pseudo-random values for a variety of data types, specifically, boolean, byte, int, long, float, and double. For doubles, random values can be generated along both a uniform distribution in the range [0,1)², and a Gaussian distribution with mean 0.0 and standard deviation 1.0. For integers, random values are uniformly distributed across all the values that can be represented using a 32-bit signed integer, [-2³¹,2³¹-1].

Internally, all random values generated by objects of this class are produced by calling the method next(int bits) that returns an integer with the requested number of random bits. This value is then used to generate the various type specific random values. The implementation is convenient as it allows subclasses of Random to simply replace this method with one that implements an alternative pseudo-random number generator and then all of the other type specific methods will generated random values using the new algorithm.

To generate random values, next() uses the following linear congruential pseudo-random number generator.

synchronized protected int next(int bits) {
       seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
       return (int)(seed >>> (48 - bits));
}

Java 5, J2SE API documentation - Random.next()

In an alternate notion, the recurrence for this generator is then:

r_i+1 = 25,214,903,917*r_i + 11 modulus 2⁴⁸

Note that, even though seed is stored as a long, Java's signed 64-bit integer type, only 48 bits of state information are stored in it. Further, the constant multiplier, 0x5DEECE66D, and increment, 0xB, are identical to those used for the popular rand48 family of linear congruential random number generators found on many Unix/Unix-like systems. Additionally, a library routine from this family of generators, drand48(), is also Perl's preferred generator on platforms that support it.

The given recurrence has a period length of 2⁴⁸ (this is order 10¹⁴). As such, since the modulus is also 2⁴⁸, the underlying generator can be said to have a full period.

Perl

In Perl, real valued uniformly distributed pseudo-random numbers can be obtained by calling the function rand(). This function takes one argument that specifies the upper limit on the range of random values. That is, rand(n) will return a real valued random number in the range [0,n). This value can then be passed to int() if random integer values are desired. Rather than rounding to the nearest integer, int() just removes the decimal portion of it's argument. This truncation is actually desirable in this case, since it results in random integer values that are still uniformly distributed in the range [0,n). If rounding was performed, '0' would be less probably than all of the other integers in the original range, and the range would change to [0,n] (i.e. the ranges now includes the end point n).

Perl generates random numbers using the library routines provided by the platform it has been complied for. Specifically, Perl's first choice is to use drand48(). However, if it is not available, it will try to use random(). If both of these are unavailable, it will settle for rand(). Perl's configuration script has somewhat amusing commentary on its preferences regarding the three routines:

  if set drand48 val -f; eval $csym; $val; then
    dflt="drand48"
    echo "Good, found drand48()." >&4
  elif set random val -f; eval $csym; $val; then
    dflt="random"
    echo "OK, found random()." >&4
  else
    dflt="rand"
    echo "Yick, looks like I have to use rand()." >&4
  fi

From perl-5.8.7/Configure, lines 17555-17564

Python

In python, random numbers can be generated using the module random. This module not only provides routines for generating uniformly distributed integer and real valued pseudo-random numbers, but also provides routines for real valued numbers distributed along a sizable number of different distributions (e.g. beta, exponential, gamma, Gaussian, Pareto, and Weibull distributions). Additionally, there are routines for shuffling sequences and sampling from a collection.

The module is implemented using a hidden object instantiated from the class random.Random. The functions in the module simply invoke the corresponding methods in this object. The Random class itself is organization in a manner similar to Java's Random class where by, if you recall, all the methods for generating random values depend on the method next(). In Python's case the core method's name is random(), and it generates real valued numbers in the range [0,1). As with Java, to replace the pseudo-random number generator used by all the methods in the class, one can simply subclass Random and supply a new implementation of the core generator, random().

Starting with CPython 2.3, the method random() uses a Mersenne Twister pseudo-random number generator. This is a fairly new technique that has been shown to result in generators with impressive periods and otherwise good sequences of pseudo-random numbers. More concretely, for the set of parameters used in the variant/implementation known as MT199937, the generator has an enormous period of 2¹⁹⁹³⁷-1 (this is on the order of 10⁶⁰⁰¹). The approach and the MT19937 variant were introduced in the following paper by Makoko Matsumoto and Takuji Nishimura.

Matsumoto, M. and Nishimura, T. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Transactions on Modeling and Computer Simulation, 8 (1998) pg. 3-30. [PDF]

Matsumoto and Nishimura also made a C-code implementation of MT19937 available under a fairly liberal open source license. With a few mirror changes in order to integrate the code into CPython, it is the 2002/1/26 release of this MT19937 implementation that acts as the underlying generator for the random() method (see Python-2.4.2/Modules/_randommodule.c). The original unmodified code is available for download from the official Mersenne Twister Home Page

Ruby

Pseudo-random numbers can be generated in Ruby by the Kernel method rand(). This routine takes one argument dubbed max. If the magnitude of max is greater than or equal to 1.0, then rand() returns an integer uniformly distributed in the range [0,max.to_i.abs]. In this context, like Perl's int() routine, the method to_i() returns just the integer portion of a real valued number without any rounding and the method abs() simply takes the absolute value of the resulting integer. However, if the magnitude of max is less than 1.0 but not 0.0, rand() returns a real valued number uniformly distributed in the range [0,max.abs). If max is zero, the default value if no parameter is explicitly supplied, rand() returns a real valued number in the range [0,1).

Like python, Ruby uses a Mersenne Twister pseudo-random number generator. In fact, it also uses the MT19937 C-code released by Matsumoto and Nishimura (see ruby-1.8.3/random.c). However, Ruby uses a variant released on 2002/2/10 that is potentially faster on some platforms, but it has been reported to be slower on Pentium 4 systems. On the Mersenne Twister homepage this release is listed as a "version considering Shawn Cokus's code", and dated 2002/2/11.

Octave

Pseudo-random real valued numbers can generated in Octave using the functions rand() and randn(). Additionally, a row vector with a random permutation of integers can be generated using randperm(). If the routines rand() and randn() are called without any arguments, they both produce a single random value. However, if they are called with a single numeric argument, n, they will generate a square n-by-n matrix of random values. If they are called with two arguments, say n and m, they will generate a n-by-m matrix of random values. The two routines differ in that randn() generates values that have a Gaussian distributed with mean 0 & standard deviation 1, while rand() generates values uniformly distributed in the range (0,1) (see: octave-2.1.72/libcruft/ranlib/genunf.f). Note that the latter differs from the other similar routines previously presented that generate values in the range [0,1)².

The values returned by rand() and randn() are generated using routines from Ranlib, a library of C and Fortran routines for generating pseudo-random numbers. The package is maintained by Barry Brown of the Department of Biostatistics & Applied Mathematics at the University of Texas, MD Anderson Cancer Center, and can be downloaded from the department's software repository.

The package generates random numbers by combining the output of two multiplicative linear congruential generators. Doing so results in an aggregate generator that has a period of 2.3 x 10 ¹⁸. The implementation of this generator is based on a Fortran 77 port of the Pascal source code that appears in the appendix of the following paper:

L'Ecuyer, P. and Cote, S. "Implementing a Random Number Package with Splitting Facilities." ACM Transactions on Mathematical Software, 17:98-111 (1991) [PDF]

Note that, this paper focuses on providing simultaneous access to multiple generators each of which that has its output organized into a series of substreams/blocks. Generators can be maneuvered among and within the blocks through instructions to transition to the next block, restarted within the current block, and restart at it's first block. While this functionality is provided by Ranlib, it does not appear to be used by Octave.

Scheme

Pseudo-random numbers can be generated in Scheme using the procedure random. This procedure takes one argument, named modulus, and returns values uniformly distributed in the range [0,modulus). If the specified modulus is of type integer, the routine will return integer random values. Similarly, if a real valued modulus is used, the routine will return a real valued random number.

Implementation in Guile

In Guile, an implementation of Scheme targeted as being an "extension language" (/embedded scripting language) for other applications, random numbers are generated using a Multiply with Carry generator. This class of generators was first proposed in a Usenet post by George Marsaglia in sci.stat.math titled "Yet another RNG" on August 1, 1994. At the time, Marsaglia was so excited about this class of generators that he informally dubbed them "The Mother of All Random Number Generators" due to the long periods that could be generated using a relatively simple and efficient algorithm.

Multiple-with-Carry (MWC) generators produce values using the following recurrence:

r_n+1 = a * r_n + carry_n mod M

Here carry can be defined as:

carry_n+1 = int((a * r_n + carry_n) / M)

However, since MWC generators use a modulus, M, that is a power of two, r and carry are really just the low and high order bits, respectively, of the same underlying recurrence.

Within Guile, it's MWC generator is implemented by the following code snippet. A variant of this generator is also provided for systems that don't support 64-bit integer types. While slightly more complex, the alternate generator still implements the same underlying computation.

unsigned long
scm_i_uniform32 (scm_t_i_rstate *state)
{
  LONG64 x = (LONG64) A * state->w + state->c;
  LONG32 w = x & 0xffffffffUL;
  state->w = w;
  state->c = x >> 32L;
  return w;
}

From guile-1.6.7/libguile/random.c, lines 114-122

In the code above, the basic recurrence is computed and then stored in a 64-bit temporary variable x. The high order bits are then stored in the carry, state->c, and the low order bits are stored in state->w and then returned as the random value produced by the generator.

The generator has a period of approximately 4.6 x 10¹⁸, or alternatively about 2⁶²;

Bash Shell Script

Pseudo-random integer values can be obtained in Bash by retrieving values from the special shell variable RANDOM. Each time a value is read from this variable, Bash generates a random integer (semi-)uniformly distributed in the range [0,32767]. If a value is assigned to RANDOM, the value is used as a new random seed for the generator. The variable RANDOM can also be used to generate integer values in Z shell (zsh) and Korn Shell (ksh). However, it isn't support by the original Bourne shell (sh).

Retrieving values from RANDOM in bash invokes the following linear congruential pseudo-random number generator. Here, rseed is an unsigned long that is initialized to 1, but then later set to the sum of the current process id and the shell start time.

/* Returns a pseudo-random number between 0 and 32767. */
static int
brand ()
{
  rseed = rseed * 1103515245 + 12345;
  return ((unsigned int)((rseed >> 16) & 32767)); /* was % 32768 */
}

From bash-3.1/variables.c lines 1146-1152

If you recall, this is the exact same generator given in the ANSI/ISO C Standard as the example implementation of the rand() function. The code snippets only differ in the manner that they remove the 16 low order bits from the state variable (rseed/next) and then ensure that the resulting integer is in the range [0,32767] across all platforms. Specifically, in Bash, this is done using universally fast bit manipulation operators, while, in the Standard C example code, this is done using division and modulus operations; i.e. ((rseed >> 16) & 32767) vs. ((next/65536) % 32768). However, it is interesting to note that the GNU C compiler and presumably other modern C compilers actually compile both variants of these two operations to the same pair of processor instructions:

  sar eax, 16       ; shift-16/divide-65536 operation
  and eax, 32767    ; and-32767/modulus-32768

Generated using gcc version 3.4.4, options "-masm=intel -S"

Other Sources of Entropy

In some domains, such as cryptography, the pseudo-random number generators discussed above may not be entirely appropriate due to their deterministic nature. That is, given a known initial state, such generators will always produce the exact same sequence of random values. On Unix, Linux and other Unix-like platforms an alternative source of entropy is the special file /dev/random. Values read from this file represent random numbers generated by the operating system using various system events to create a pool of entropy. Since such systems events can be based on random events in the external environment, the generated values are much more like true random numbers. Additionally, on Microsoft Windows the routine CryptGenRandom can be used for generating random values targeted for use in cryptography. In contrast to /dev/random, this routine doesn't appear to automatically use random system events to generate random values. However, it does allow the calling routine to supply a buffer containing an "auxiliary random seed" that can be filled with random environmental noise that has been collected by the calling application.

Implementation Notes

The sample code given below for generating random numbers takes two parameters, one specifying the type of random value to be generated ("int" or "real") and the other indicating how many random values should be produced. The random seed is left to whatever the default is for the routines used in the respective languages. This means that for the C, C++, and (Guile interpreted) Scheme implementations the sequence of random values will be identical across separate invocations, while for the other implementations the sequence will differ.

Footnotes:

see the description of period included with the Randu post from last week.
[0,1) Indicates that the range is includes '0' but excludes '1'. In contrast, (0,1) indicates that the range excludes both '0' and '1'

Languages: C, C++, Java, Perl, Python, Ruby, Octave(/Matlab like), Scheme, and Bash shell script.