This post looks at some of the drawbacks of using the random() function to
generate all your random values, and discusses the circumstances in which the
Beta distribution might prove a compelling alternative. It is written with
procedural games in mind. There is no expectation that the reader knows what a
‘distribution’ is, and all of the code examples are written in Python.
The Shortcomings of the Uniform Distribution
In most tutorials I see on generating procedural content, whenever a random
value is needed a call is made to the default generator (in Python, this is
probably the one from the random module). So if we want to have a random
number of chests in a room, you might see something like
import random as r
r.seed(101)
r.randint(1, 3)
## 2
Which gives a number of chests in {1, 2, 3}, with an equal chance of each. Or, we might trigger an event with 90% probability with something like
r.seed(101)
if r.random() >= 0.1:
    print 'Event happened.'
else:
    print 'Event did not happen.'
## Event happened.
In a setting where we might want to have an enemy with a variable amount of health, you could see something like
r.seed(102)
monster_health = 3000 + r.randint(1000, 2000)
## monster_health = 4148
But inevitably, these examples are constructed to use random values from
uniform distributions. That is, every integer in the range a to b has an equal
chance of being output by the randint(a, b) function, and the mean value is
exactly (a + b) / 2. So, in the example above, we can expect the average
monster to have about 4500 health, and expect to see monsters with 4000-4500
health exactly as often as those with 4500-5000. But what if you wanted most
monsters to have a health value close to 4500, and only a few with low values
near 4000 or high values near 5000? This quickly becomes very difficult to do
with only the uniformly distributed r.random() and r.randint() functions.
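To see that flat shape for yourself, here is a small sketch that draws many health values with the same expression as above and counts how many land in each half of the range. The function name count_halves, the sample size, and the seed are my own choices for illustration, and the code sticks to syntax that should run on both Python 2 and Python 3.

```python
import random

def count_halves(n=100000, seed=1):
    """Draw n uniform health values in 4000..5000 and count each half."""
    rng = random.Random(seed)  # private generator; the seed is arbitrary
    low = 0
    high = 0
    for _ in range(n):
        health = 3000 + rng.randint(1000, 2000)
        if health < 4500:
            low += 1
        elif health > 4500:
            high += 1
        # A value of exactly 4500 belongs to neither half and is skipped.
    return low, high

low, high = count_halves()
print("%d below 4500, %d above 4500" % (low, high))
```

The two counts should come out nearly equal, which is exactly the behaviour we would like to get away from.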
Maybe the Normal Distribution is the Answer?
If you’ve ever been exposed to any introductory statistics, you’ve probably heard of the normal (or Gaussian) distribution, which might get us exactly what we want. A random function that gets its values from a normal distribution has most values clustered close to a specified mean, with a specified spread (or variance). It is quite easy to ensure that, for example, 99.7% of the values will fall within the range 4000-5000, with 68% of them falling within the range of about 4333-4667. Sounds great, right?
r.seed(103)
monster_health_list = list()
for i in xrange(0, 10):
    monster_health_list.append(int(r.gauss(mu = 4500, sigma = 500 / 3)))
print monster_health_list
## [4681, 4475, 4447, 4234, 4514, 4565, 4432, 4449, 4242, 4735]
The sigma parameter is 500 / 3 in this case because, for normal distributions,
99.7% of values fall within three ‘sigmas’ of the mean (which is called mu).
So far, so good. However, normal distributions have the inconvenient property of returning any value on occasion. It would be quite possible (although improbable) for a player to run into a monster with negative health, or a health value of 50,000. So if you’re going to use the normal distribution, make sure to have some checks for these unlikely cases.
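One possible form for such a check is sketched below: redraw until the value lands inside the allowed range, and fall back to a clamped value if that somehow never happens. The helper name bounded_gauss and the max_tries parameter are hypothetical, not part of the random module. Clamping every out-of-range draw straight to the boundary would also work, but it piles the rejected values up at exactly 4000 and 5000, which is why I redraw here.

```python
import random

def bounded_gauss(rng, mu, sigma, lo, hi, max_tries=100):
    """Draw from a normal distribution, redrawing values outside [lo, hi]."""
    for _ in range(max_tries):
        value = rng.gauss(mu, sigma)
        if lo <= value <= hi:
            return value
    # Fallback (assumption): if every draw was rejected, return the mean
    # clamped into the allowed range rather than looping forever.
    return min(max(mu, lo), hi)

rng = random.Random(103)  # private generator; the seed is arbitrary
healths = [int(bounded_gauss(rng, 4500, 500 / 3.0, 4000, 5000))
           for _ in range(10)]
print(healths)
```

Note that discarding the tails changes the distribution slightly: it is no longer exactly normal, just normal-looking inside the bounds.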
Yet a normal distribution may leave some things to be desired. Say we wanted monsters to have health values in the range 4000-4500 more often than 4500-5000? Or we wanted most monsters to have health values in the low 4000s or high 4000s, but very few in between? If the latter scenario doesn’t seem that desirable, consider an analogous problem: we want to generate a random number of monsters for each room in our dungeon, but want most rooms to be either empty or relatively full – with few ‘easy’ rooms in between.
All of these problems can be solved using the remarkable Beta distribution.
The Beta Distribution and Why You Should Use It
The Beta distribution takes two parameters, alpha and beta, and outputs values
between 0 and 1. This means we don’t need to worry about occasionally
producing extremely large or small values. Also, the mean is always
alpha / (alpha + beta), so it is easy to figure out what the average value
will be.
What makes the Beta distribution interesting, and so useful, is that the two parameters give it a wide variety of ‘shapes’:
- When alpha = beta = 1, the Beta distribution is identical to the uniform distribution. That is, it will produce values distributed in the same way as those from the random() function.
- When alpha = beta, the mean is exactly 0.5, and there is an equal chance that values will fall above and below the mean. As alpha and beta grow larger, values move closer and closer to the mean.
- When alpha > beta, the mean rises above 0.5 and more values are generated in the upper part of the range; when alpha < beta, the mean falls below 0.5 and more values are generated in the lower part.
- When alpha < 1 and/or beta < 1, more values will be generated near 0 and/or 1 than near the mean. As alpha or beta gets closer and closer to zero, it becomes more and more unlikely that values away from 0 and 1 will be generated.
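The shapes in the list above are easier to believe once you have seen some numbers. The following is a rough sketch that samples a few parameter pairs and reports the sample mean along with the fraction of values falling near each end of the range. The function name shape_summary, the 0.1 and 0.9 cut-offs, and the parameter pairs are all my own choices for illustration.

```python
import random

def shape_summary(alpha, beta, n=20000, seed=7):
    """Return (sample mean, fraction below 0.1, fraction above 0.9)."""
    rng = random.Random(seed)  # private generator; the seed is arbitrary
    draws = [rng.betavariate(alpha, beta) for _ in range(n)]
    mean = sum(draws) / float(n)
    near_zero = sum(1 for x in draws if x < 0.1) / float(n)
    near_one = sum(1 for x in draws if x > 0.9) / float(n)
    return mean, near_zero, near_one

for alpha, beta in [(1, 1), (5, 5), (50, 50), (6, 2), (2, 6), (0.1, 0.1)]:
    mean, near_zero, near_one = shape_summary(alpha, beta)
    print("alpha=%-4s beta=%-4s mean=%.2f  below 0.1: %.2f  above 0.9: %.2f"
          % (alpha, beta, mean, near_zero, near_one))
```

The sample means should sit close to alpha / (alpha + beta) in every row, while the two end fractions shrink toward zero for large equal parameters and grow large when both parameters are well below 1.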
The Beta distribution is extremely popular in some fields of (social) science,
and so it is no surprise that the standard library has an implementation in the
random module: the function betavariate(alpha, beta).
The problem I proposed with filling rooms in a dungeon with monsters could be solved with code as simple as the following:
# A dungeon has ten rooms, with up to 50 monsters in each.
# We want most rooms to have a large number of monsters in
# them, or be almost empty.
r.seed(107)
monsters_in_rooms = list()
for i in xrange(0, 10):
    monsters_in_rooms.append(int(r.betavariate(0.1, 0.1) * 50))
print monsters_in_rooms
## [0, 49, 45, 49, 5, 49, 0, 48, 0, 25]
Notice that since betavariate() returns values between 0 and 1, I simply
multiplied by the number I actually wanted. The parameters are
alpha = beta = 0.1.
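One detail worth knowing: multiplying by 50 and truncating with int() gives values from 0 to 49, and 50 itself only appears if betavariate() returns exactly 1. If you want both ends of an integer range to be reachable, a small wrapper like the one below is one way to do it. The name beta_int and its signature are hypothetical, a sketch rather than anything from the standard library.

```python
import random

def beta_int(rng, alpha, beta, lo, hi):
    """Integer in lo..hi (inclusive) shaped by a Beta(alpha, beta) draw."""
    value = rng.betavariate(alpha, beta)
    buckets = hi - lo + 1
    # Split [0, 1] into one equal-width bucket per integer; min() guards
    # the rare case where the draw is exactly 1.0.
    return lo + min(int(value * buckets), buckets - 1)

rng = random.Random(107)  # private generator; the seed is arbitrary
monsters_in_rooms = [beta_int(rng, 0.1, 0.1, 0, 50) for _ in range(10)]
print(monsters_in_rooms)
```

The same helper works for any inclusive integer range, for example beta_int(rng, 3, 3, 4000, 5000) for monster health.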
To get more monsters with 4000-4250 health than 4250-5000 health, try:
# With alpha = 1 and beta = 3, betavariate() returns values
# with a mean of 0.25. We multiply by 1000 and add 4000, so
# monsters have an average of 4250 health.
r.seed(105)
health_list = list()
for i in xrange(0, 15):
    health_list.append(int(r.betavariate(1, 3) * 1000) + 4000)
print health_list
## [4054, 4023, 4430, 4172, 4415, 4253, 4045, 4135, 4056, 4035, 4005, 4172, 4072, 4296, 4395]
print sum(health_list) / len(health_list)
## 4170
If you wanted values closer to 4250 itself, simply increase alpha and beta
without changing their ratio.
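As a quick check of that claim, the sketch below keeps the 1:3 ratio, scales both parameters up, and measures the mean and standard deviation of the resulting health values. The function name health_stats, the scale factors, and the sample size are my own choices for illustration.

```python
import random

def health_stats(alpha, beta, n=20000, seed=11):
    """Return (mean, standard deviation) of n Beta-shaped health values."""
    rng = random.Random(seed)  # private generator; the seed is arbitrary
    values = [int(rng.betavariate(alpha, beta) * 1000) + 4000
              for _ in range(n)]
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n)
    return mean, variance ** 0.5

for scale in (1, 4, 16):
    mean, spread = health_stats(1 * scale, 3 * scale)
    print("alpha=%d beta=%d mean=%.0f std=%.0f"
          % (1 * scale, 3 * scale, mean, spread))
```

The mean should stay near 4250 in every row while the standard deviation shrinks as the parameters grow.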
And finally, to get something like the normal distribution solution we had earlier, try:
# Monsters have an average of 4500 health, with a range of
# 4000-5000. So we multiply by 1000, and add 4000.
r.seed(115)
health_list = list()
for i in xrange(0, 10):
    health_list.append(int(r.betavariate(3, 3) * 1000) + 4000)
print health_list
## [4194, 4571, 4162, 4334, 4564, 4551, 4324, 4363, 4450, 4193]
print sum(health_list) / len(health_list)
## 4370
In a future post I will show how to use the Dirichlet distribution to solve another kind of problem that shows up in procedural content generation.