Difference between revisions of "Random number generator"

From RogueBasin
Jump to navigation Jump to search
 
(One intermediate revision by the same user not shown)
Line 8: Line 8:
Typical PRNG algorithms start with an initial ''seed''. Each random value is generated by running the last generated value through a special algorithm, starting with the seed. For example, if the seed is 5 and the algorithm is ''f''(''n'') = 3''n'' + 1 mod 10, we generate the sequence 5 6 9 8 5 6 9 8... (This is an extremely poor PRNG, since it repeats every four values and does not generate all the numbers mod 10.)
Typical PRNG algorithms start with an initial ''seed''. Each random value is generated by running the last generated value through a special algorithm, starting with the seed. For example, if the seed is 5 and the algorithm is ''f''(''n'') = 3''n'' + 1 mod 10, we generate the sequence 5 6 9 8 5 6 9 8... (This is an extremely poor PRNG, since it repeats every four values and does not generate all the numbers mod 10.)


Some PRNGs produce integers, while others produce floating-points. Many roguelikes that use D&D-style systems use integers for die rolls; conversion floats to integers is often done by multiplying and applying the floor function.
Some PRNGs produce integers, while others produce floating-points. Many roguelikes that use D&D-style systems use integers for die rolls; converting floats to integers is often done by multiplying and applying the floor function.


The above PRNG example forms a repeating sequence, as do all PRNGs. Since it repeats every four values, we say that it has period 4. A good PRNG has a very large period, so the values will not repeat for a long time. One CMWC (complementary multiply with carry) generator has a period of approximately 10<sup>13101</sup>!
The above PRNG example forms a repeating sequence, as do all PRNGs. Since it repeats every four values, we say that it has period 4. A good PRNG has a very large period, so the values will not repeat for a long time. One CMWC (complementary multiply with carry) generator has a period of approximately 10<sup>13101</sup>!


If an eavesdropper, given a sequence of outputs from a certain PRNG, cannot determine the next value, then the RNG is called ''cryptographically secure''. Cryptographically secure PRNGs do exist, but most are slow and suitable only for cryptography. Fortunately, random number security is not a major concern in nearly all roguelikes.
If an eavesdropper, given a sequence of outputs from a certain PRNG, cannot determine the next value, then the RNG is called ''cryptographically secure''. Cryptographically secure PRNGs do exist, but most are slow and suitable only for cryptography. Fortunately, random number security is irrelevant for the vast majority of roguelikes.


== Seeding ==
== Seeding ==

Latest revision as of 03:25, 6 August 2018

A random number generator, or RNG for short, is a method of generating numerical values that are unpredictable and lacking in any sort of pattern. In game development, accessing "true" randomness is inconvenient at best, so programmers resort to using pseudo-random number generators, or PRNGs, which use specially crafted mathematical algorithms to allow computers to simulate randomness. PRNGs are never truly random, but they are unpredictable enough for practical purposes.

Roguelike games often use PRNGs to compute dice rolls and other situations that require random generation. Some RNG algorithms evaluate simple polynomials, while others use techniques derived from fractals or chaos theory.

Many players jokingly refer to the RNG as the Random Number God, the entity inside the game that provides random functionality. This deity seems to leave great items and easy monsters for some players, while granting seemingly unfair deaths to others. Some players superstitiously avoid offending the Random Number God.

Algorithms

Typical PRNG algorithms start with an initial seed. Each random value is generated by running the last generated value through a special algorithm, starting with the seed. For example, if the seed is 5 and the algorithm is f(n) = 3n + 1 mod 10, we generate the sequence 5 6 9 8 5 6 9 8... (This is an extremely poor PRNG, since it repeats every four values and does not generate all the numbers mod 10.)

Some PRNGs produce integers, while others produce floating-points. Many roguelikes that use D&D-style systems use integers for die rolls; converting floats to integers is often done by multiplying and applying the floor function.

The above PRNG example forms a repeating sequence, as do all PRNGs. Since it repeats every four values, we say that it has period 4. A good PRNG has a very large period, so the values will not repeat for a long time. One CMWC (complementary multiply with carry) generator has a period of approximately 1013101!

If an eavesdropper, given a sequence of outputs from a certain PRNG, cannot determine the next value, then the RNG is called cryptographically secure. Cryptographically secure PRNGs do exist, but most are slow and suitable only for cryptography. Fortunately, random number security is irrelevant for the vast majority of roguelikes.

Seeding

When using a PRNG, one must be careful with seeding. Since the same seeds will produce the same string of values, if every game uses the same seed, then it could roll the same character and start with the same map every time! The simplest and most common way to ensure proper seeding is to use the current time. One can also use player behavior as a source of randomness, but the programmer must be careful to keep the player from directly controlling the game's randomness.

Desired features

Pseudorandom number generators, in order to be considered "good", must offer the following:

Speed

PRNG algorithms have different calculation speeds. Linear congruential generators (generators of the form f(n) = an + b mod m) are currently the fastest generators that exhibit decent randomness. In general, however, speed is usually at a loss of quality.

Uniformity

Unfortunately, not all algorithms offer a uniform distribution. For instance, if we had a random number generator that returned values from 0 to 3, we cannot get random numbers from 0 to 2 by computing the latter's output modulo 3, or the number 0 would appear twice as often as 1 or 2!

Periodicity

Generally, the period of a PRNG should be maximal.

PRNGs in programming languages

Most programming environments provide their own PRNG. In many of these cases, the programmer cannot control the type of generator used, which may be undesirable in certain situations. Fortunately, there are a handful of PRNGs out there that are relatively easy to hand-code, although they are not provided here.

C and C++

In C and C++, the random number generator is the rand() function. One sets the seed with srand(). Some call these the "ANSI C" (or equivalently "ISO C") functions because they are part of the C standard and to distinguish them from other RNGs that some systems provide.

The rand() function returns an int in the range from 0 to RAND_MAX, a platform-dependent value. Here is a simple program to show one random value:

#include <stdlib.h> /* rand, srand */
#include <stdio.h>  /* printf (for this example) */
#include <time.h>   /* time */

int main() {
  srand( time(NULL) );
  printf("%d\n", rand());
  return 0;
}

Many programmers like to seed srand() with the return value of time(), as we do above. In fact, if we run this program multiple times within one second, it may print the same number again. (Some will write "srand( time(0) )", which is considered bad by some because the time function takes a pointer. Using NULL reminds you that it is a pointer. However, using 0 does the same thing, because NULL is almost always #defined to be 0. (See http://www.lysator.liu.se/c/c-faq/c-1.html#1-3.) If you want, you can add a cast to unsigned int. Adding it is necessary to satisfy lint.)

Now we come to an important note: Many platforms have poor-quality versions of the rand() function. Above GNU platforms, rand() and srand() work relatively well, and The GNU C Library Reference Manual recommends their use. If your favorite platform is GNU or Linux, then you could program with rand() and srand() and have your game at least working above other C platforms. But the most popular roguelike games support many platforms well, and their developers avoid rand() when they can.

BSD platforms implement rand() and srand() in terms of rand_r(), which limits the state of the RNG to 32 bits (implying a period of 2**31-1 or less). BSD calls this a "bad random number generator", while GNU states that 32 bits is "far too few to provide a good RNG."

Meanwhile, though RAND_MAX is platform dependent, BSD and GNU use 2,147,483,647, the largest possible signed int. One should be warned that on MSVC, RAND_MAX is 32,767, which may be a lot smaller than you expect. (All of the C and C++ standards, through C1X and C++0X, merely require that RAND_MAX be at least 32,767.)

If in doubt, you should implement a new RNG. However, you should not try to create your own algorithm, as it would almost certainly be even worse than the one you're trying to replace! Many people think the Mersenne twister is best, and there are existing implementations of it.

A very good random number generator for C/C++ is RandomLib (I am not the author, and I have tested this library and found it excellent) at http://sourceforge.net/projects/randomlib/ . It is LGPL and supports a variety of algorithms, including Mersenne Twister.

The newest C++11 standard features standard random number facilities.

Java

The Random class in the java.util package is used to generate random numbers. The constructor, new Random(), will seed that instance with a value based on the current time. The method random.nextInt(n) will generate a random number from 0 to n - 1 inclusive. This is preferable to using nextDouble() as it will returns a double value between 0 and 1, in which rounding error may occur.

Example:

import java.util.Random;
...
private Random random = new Random();
...
//Tabletop style randomness
//2d6 - dice(2,6)
public int dice(int number, int sides) {
   int total = 0;
   for(int i = 0; i < number; i++)
      total += random.nextInt(sides) + 1;
   return total;
}

Note that the Random class is a very poor RNG, especially if it is needed to generate large amounts of numbers. An alternative is SquidLib, which provides several RNG classes tailored to different settings. Some of these classes include one with low memory usage, one that can periodically switch its seed (useful in preventing a series of actions from being predictable), and more.

Python

Python includes a very handy module for generating random numbers, namely random:

#!/usr/bin/env python2.7

import random

# Float in the range (0, 1]
print random.random()

# Float in the range [a, b]
print random.randint(1, 10)

# Random choice from a list
print random.choice(["Foo", "Bar", "Baz"])

# Shuffling a list
a = [1, 2, 3, 4, 5]
random.shuffle(a)
print a

# Standard normal
random.gauss(0, 1)

For more information, see the documentation for the module.

Criticisms

Pseudorandom number generators, or PRNGs, struggle against difficulties in generating numbers in as few CPU ticks as possible, while retaining a decent periodicity and quality. George Marsaglia created an application used for testing the quality of PRNG algorithms, called Diehard. He then brought up criticisms against many commonly used algorithms, mainly the Multiplicative Congruential Generator (used in most rand() function implementations). His comments can be found, for instance, on comp.lang.c Usenet group.

As Marsaglia has proven, MCGs create non-uniform pseudorandom numbers, falling into parallel hyperplanes. One of the examples of this undesired behaviour is the infamous RANDU. Additionally, most PRNGs of this type are predictable, as it is enough to observe 2 or 3 subsequently generated numbers to discover all the others. Also, this algorithm group has an undesired tendency to offer extremely short periods for lower order bits of the generated numbers (for instance, the last bit often has a period of 20: it simply alternates between 0 and 1).

The Mersenne twister (MT) was also criticised for being difficult to implement, even though it generates fast and high quality numbers. It also stores the last 624 generated numbers. Knowing this sequence can reveal all future iterates. Dungeon Crawl Stone Soup abandoned MT after a player demonstrated the ability to divine the generator state and cheat.

A PRNG's periodicity is often subject to criticisms as well. While congruential generators have a periodicity at most equal to 232 on most 32 bit platforms (although there are examples of better periodicity: Java implements such a generator with a period of 248, and Donald Knuth's MMIX implementation has a period of 264), they often offer a ridiculously low RAND_MAX value, shortening the period to 215 (for instance, MSVC). Such a low periodicity is unacceptable in many cases.

PRNG algorithms

Several PRNG algorithms have been devised. Here is a brief list:

Mersenne twister

A recently introduced family of PRNGs is known as the Mersenne twister. The most popular Mersenne twister is known as MT19937, which produces fast and high-quality random numbers. It has a long period of exactly 219937 - 1, a Mersenne prime from which the algorithm derives its name. See the official site for more info about it.

MT19937 is currently the default in PRNG Python's random module.

A new and improved variant of the Mersenne twister from Hiroshima University can be found at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/

The Mersenne twister algorithm is not provided here due to its complexity.

Linear congruential generator

Linear congruential generators, or LCGs, are extremely fast and simple to implement, but they have received much criticism for producing poor-quality and easily predictable values. Still, they can be useful when resources are limited and security is not an issue.

An LCG has three constant values: the modulo m, the multiplicative term a, and the additive term b. The iteration function is f(x) = ax + b mod m.

LCGs are very picky about the values m, a, and b; the programmer must make careful choices to obtain a maximal period (that is, m.) The case b = 0 is called a Lehmer generator or a Park-Miller generator.

Below is an example C99/C++11 implementation.

#include <stdint.h>

struct linear_congruential_data
{
  uint32_t seed;
  uint32_t m; /* modulo */
  uint32_t a; /* multiplicative term */
  uint32_t b; /* additive term */
}

/* implements seed' = a*seed+b mod m without incurring unsigned wraparound */
inline struct linear_congruential_data iterLCG(struct linear_congruential_data src)
{
  uint64_t newseed = src.seed;
  newseed *= src.a;
  newseed += src.b;
  src.seed = newseed % src.m;
  return src;
}

The most well-known LCGs are shown below. All three are Lehmer generators.

Name m a b
MINSTD 2147483647 16807 0
MINSTD2 2147483647 48271 0
MINSTD3 2147483647 69621 0

Multiply-with-carry

Multiply-with-carry, or MWC, is a family of PRNGs devised by George Marsaglia. MWCs are known for being very fast and simple, along with producing high-quality values with very long periods (a lag-256 MWC would have a period of 28222.)

An MWC has the following constants (all nonnegative integers):

  • lag r
  • base b
  • multiplier a
  • seeds x0, x1, x2, ..., xr - 1 < b
  • initial carry cr - 1 < a

The iteration function is xn = (axn - r + cn - 1) mod b, where cn = ⌊(axn - r + cn - 1)/b⌋.

When using an MWC, it is only necessary to remember the last r values. Any old values can be erased.

Complementary multiply-with-carry

If we take the MWC algorithm and modify the iteration function to xn = (b - 1) - ((axn - r + cn - 1) mod b), we get an MWC variant known as the complementary multiply-with-carry (CMWC.) They solve a slight bias in he original MWC.

CMWCs require slightly more calculation than MWCs, but they have many more advantages.

The CMWC4096, described at http://school.anhb.uwa.edu.au/personalpages/kwessen/shared/Marsaglia03.html, has a near-record period of 2131104. It is informally known as the Mother of All PRNGs, and has the following parameters:

  • r = 4096
  • b = 232
  • a = 18782

In libtcod, CMWC4096 replaced MT19937 as the default PRNG.

Generalised Feedback Shift Register

Also known as GFSR, it is a very fast generator with good randomness and relatively high periods. The basic concept is the following:

rnn = rnn-A XOR rnn-B XOR rnn-C XOR rnn-D

In this particular case (K=4), the period is ~21000.