Using More Interesting Random Values in Procedural Content

This post looks at some of the drawbacks of using the random() function to generate all your random values, and discusses the circumstances in which the Beta distribution might prove a compelling alternative. It is written with procedural games in mind. There is no expectation that the reader knows what a ‘distribution’ is, and all of the code examples are written in Python (Python 2 syntax, hence the print statements and xrange).

The Shortcomings of the Uniform Distribution

In most tutorials I see on generating procedural content, whenever a random value is needed a call is made to the default generator (in Python, this is probably the one from the random module). So if we want a random number of chests in a room, we might see something like

import random as r
r.seed(101)
r.randint(1, 3)
## 2

This gives a number of chests in {1, 2, 3}, with an equal chance of each. Or we might trigger an event with 90% probability with something like

r.seed(101)
if r.random() >= 0.1:
	print 'Event happened.'
else:
	print 'Event did not happen.'
## Event happened.

If we want an enemy with a variable amount of health, we might see something like

r.seed(102)
monster_health = 3000 + r.randint(1000, 2000)
## monster_health = 4148

But inevitably, these examples are constructed to use random values from uniform distributions. That is, every integer in the range a to b has an equal chance of being output by the randint(a, b) function, and the mean value is exactly (a + b) / 2. So, in the example above, we can expect the average monster to have about 4500 health, and expect to see monsters with 4000-4500 health exactly as often as those with 4500-5000. But what if you wanted most monsters to have a health value close to 4500, and only a few with low values near 4000 or high values near 5000? This quickly becomes very difficult to do with only the uniformly distributed r.random() and r.randint() functions.
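
To see that flatness in practice, here is a minimal sketch (Python 3 syntax) that draws many health values with the same formula and compares the two halves of the range. The seed and the sample size are arbitrary choices of mine, not part of the example above.

```python
import random

rng = random.Random(2024)  # arbitrary seed, used only so the run is repeatable

# Same formula as the monster example: 3000 + randint(1000, 2000).
samples = [3000 + rng.randint(1000, 2000) for _ in range(100000)]

mean_health = sum(samples) / float(len(samples))
low_half = sum(1 for h in samples if h < 4500)   # values in 4000-4499
high_half = sum(1 for h in samples if h > 4500)  # values in 4501-5000

print(mean_health)          # close to (4000 + 5000) / 2 = 4500
print(low_half, high_half)  # the two counts should be roughly equal
```

No amount of reseeding changes this picture: the low and high halves stay about equally likely, and values near 4500 are no more common than values near 4000.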

Maybe the Normal Distribution is the Answer?

If you’ve ever been exposed to any introductory statistics, you’ve probably heard of the normal (or Gaussian) distribution, which might get us exactly what we want. A random function that gets its values from a normal distribution has most values clustered close to a specified mean, with a specified spread (or variance). It is quite easy to ensure that, for example, 99.7% of the values will fall within the range 4000-5000, with 68% of them falling within the range of about 4333-4667. Sounds great, right?

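r.seed(103)
monster_health_list = list()
for i in xrange(0, 10):
	# Mean of 4500; sigma is a third of the 500-point half-range.
	# (In Python 2, 500 / 3 is integer division, so sigma is 166.)
	monster_health_list.append(int(r.gauss(mu = 4500, sigma = 500 / 3)))
print monster_health_list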
## [4681, 4475, 4447, 4234, 4514, 4565, 4432, 4449, 4242, 4735]

The sigma parameter is 500 / 3 in this case because, for normal distributions, 99.7% of values fall within three ‘sigmas’ of the mean (which is called mu).
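
The three-sigma rule can be checked empirically. The sketch below (Python 3 syntax, with an arbitrary seed and sample size) counts how many draws fall within one and three sigmas of the mean. It uses 500.0 / 3.0 so that the division is not truncated to 166 under Python 2.

```python
import random

rng = random.Random(7)  # arbitrary seed, used only for repeatability
mu = 4500.0
sigma = 500.0 / 3.0     # three sigmas span the 500 points on either side of mu

draws = [rng.gauss(mu, sigma) for _ in range(100000)]
total = float(len(draws))

within_one = sum(1 for x in draws if abs(x - mu) <= sigma) / total
within_three = sum(1 for x in draws if abs(x - mu) <= 3 * sigma) / total

print(within_one)    # roughly 0.68
print(within_three)  # roughly 0.997, so a few draws still land outside 4000-5000
```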

So far, so good. However, normal distributions have the inconvenient property of returning any value on occasion. It would be quite possible (although improbable) for a player to run into a monster with negative health, or a health value of 50,000. So if you’re going to use the normal distribution, make sure to have some checks for these unlikely cases.
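
One possible form for such a check is to redraw until the value lands in range, and clamp as a last resort. This is only a sketch: bounded_gauss and its max_tries parameter are names I made up for illustration, not functions from the random module.

```python
import random

def bounded_gauss(rng, mu, sigma, low, high, max_tries=100):
    """Draw from a normal distribution, redrawing values outside [low, high].

    Hypothetical helper: if every attempt misses the range, the last
    draw is clamped to the nearest bound instead.
    """
    for _ in range(max_tries):
        value = rng.gauss(mu, sigma)
        if low <= value <= high:
            return value
    return min(max(value, low), high)

rng = random.Random(103)  # arbitrary seed
healths = [int(bounded_gauss(rng, 4500, 500 / 3.0, 4000, 5000)) for _ in range(10)]
print(healths)  # every value lies between 4000 and 5000
```

Redrawing keeps the bell shape inside the range, whereas clamping alone would pile the rare outliers up at exactly 4000 and 5000.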

Yet a normal distribution may leave some things to be desired. Say we wanted monsters to have health values in the range 4000-4500 more often than 4500-5000? Or we wanted most monsters to have health values in the low 4000s or high 4000s, but very few in between? If the latter scenario doesn’t seem that desirable, consider an analogous problem: we want to generate a random number of monsters for each room in our dungeon, but want most rooms to be either empty or relatively full – with few ‘easy’ rooms in between.

All of these problems can be solved using the remarkable Beta distribution.

The Beta Distribution and Why You Should Use It

The Beta distribution takes two parameters – alpha and beta – and outputs values between 0 and 1. This means we don’t need to worry about occasionally producing extremely large or small values. Also, the mean is always alpha / (alpha + beta), so it is easy to figure out what the average value will be.
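
Both properties are easy to check with the standard library's random.betavariate(). The sketch below (Python 3 syntax) compares the sample mean against alpha / (alpha + beta) for a few parameter pairs. The pairs, the seed, and the helper name sample_mean are arbitrary choices of mine.

```python
import random

rng = random.Random(3)  # arbitrary seed, used only for repeatability

def sample_mean(alpha, beta, n=50000):
    """Average of n draws from Beta(alpha, beta), checking each is in [0, 1]."""
    draws = [rng.betavariate(alpha, beta) for _ in range(n)]
    assert all(0.0 <= x <= 1.0 for x in draws)
    return sum(draws) / float(n)

for alpha, beta in [(1, 3), (2, 2), (5, 1)]:
    expected = alpha / float(alpha + beta)
    # The two numbers on each line should agree to about two decimal places.
    print(alpha, beta, expected, sample_mean(alpha, beta))
```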

What makes the Beta distribution interesting, and so useful, is that the two parameters give it a wide variety of ‘shapes’:

When alpha = beta = 1, the Beta distribution is identical to the uniform distribution on 0 to 1. That is, its values are distributed in the same way as those of the random() function.
When alpha = beta, the mean is exactly 0.5, and there is an equal chance that values will fall above and below the mean. As alpha and beta grow larger, values cluster closer and closer to the mean.
When alpha > beta, the mean is above 0.5 and values lean toward 1. When alpha < beta, the mean is below 0.5 and values lean toward 0.
When alpha < 1 and beta < 1, more values will be generated near 0 and 1 than near the mean. As alpha and beta get closer and closer to zero, it becomes more and more unlikely that values away from 0 and 1 will be generated. If only one of the two parameters is below 1, values pile up at just one end: near 0 when alpha < 1, near 1 when beta < 1.
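
These shapes are easier to believe once you see them. The following sketch (Python 3 syntax) sorts draws into five equal-width buckets between 0 and 1 for one parameter pair per case. The function bucket_counts, the parameter pairs, and the seed are my own choices for illustration.

```python
import random

def bucket_counts(rng, alpha, beta, n=20000, buckets=5):
    """Count how many of n Beta(alpha, beta) draws land in each bucket of [0, 1]."""
    counts = [0] * buckets
    for _ in range(n):
        x = rng.betavariate(alpha, beta)
        index = min(int(x * buckets), buckets - 1)  # keep x == 1.0 in the last bucket
        counts[index] += 1
    return counts

rng = random.Random(11)  # arbitrary seed, used only for repeatability
cases = [
    ("alpha = beta = 1 (flat)", 1, 1),
    ("alpha = beta = 5 (peaked at 0.5)", 5, 5),
    ("alpha = 5, beta = 2 (leans toward 1)", 5, 2),
    ("alpha = beta = 0.1 (piled up at 0 and 1)", 0.1, 0.1),
]
for label, alpha, beta in cases:
    print(label, bucket_counts(rng, alpha, beta))
```

Reading each printed list from left to right gives a crude histogram: roughly level for the first case, a hump in the middle for the second, weight on the right for the third, and two tall ends with a nearly empty middle for the last.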

The Beta distribution is extremely popular in some fields of (social) science, and so it is no surprise that the standard library has an implementation in the random module, the function betavariate(alpha, beta).

The problem I proposed with filling rooms in a dungeon with monsters could be solved with code as simple as the following:

# A dungeon has ten rooms, with up to 50 monsters in each.
# We want most rooms to have a large number of monsters in
# them, or be almost empty.
r.seed(107)
monsters_in_rooms = list()
for i in xrange(0, 10):
	monsters_in_rooms.append(int(r.betavariate(0.1, 0.1) * 50))
print monsters_in_rooms
## [0, 49, 45, 49, 5, 49, 0, 48, 0, 25]

Notice that since betavariate() returns values between 0 and 1, I simply multiplied by the number I actually wanted. The parameters are alpha = beta = 0.1.
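
One detail of this scaling is that int() truncates, so a room with exactly 50 monsters only appears if betavariate() returns exactly 1.0. If the top value should be as reachable as the bottom one, a helper along these lines might work. This is a sketch: beta_int is a hypothetical name, not part of the random module.

```python
import random

def beta_int(rng, alpha, beta, low, high):
    """Integer in [low, high] inclusive, shaped by Beta(alpha, beta).

    Hypothetical helper: [0, 1] is split into (high - low + 1) equal
    slices, one per integer, so both end values get a full slice.
    """
    x = rng.betavariate(alpha, beta)
    span = high - low + 1
    return low + min(int(x * span), span - 1)  # guard against x == 1.0

rng = random.Random(107)  # arbitrary seed
monsters_in_rooms = [beta_int(rng, 0.1, 0.1, 0, 50) for _ in range(10)]
print(monsters_in_rooms)  # mostly values near 0 or near 50
```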

To get more monsters with 4000-4250 health than 4250-5000 health, try:

# Monsters have an average of 4250 health. With alpha = 1
# and beta = 3, betavariate() will return values with
# a mean of 0.25. So we multiply by 1000, and add 4000.
r.seed(105)
health_list = list()
for i in xrange(0, 15):
	health_list.append(int(r.betavariate(1, 3) * 1000) + 4000)
print health_list
## [4054, 4023, 4430, 4172, 4415, 4253, 4045, 4135, 4056, 4035, 4005, 4172, 4072, 4296, 4395]
print sum(health_list) / len(health_list)
## 4170

If you wanted values closer to 4250 itself, simply increase alpha and beta without changing their ratio.
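
As a rough check of that claim, the sketch below (Python 3 syntax) keeps the 1:3 ratio fixed, scales both parameters up, and measures the mean and standard deviation of the resulting health values. The scale factors, the seed, and the helper name health_spread are arbitrary choices of mine.

```python
import random

rng = random.Random(5)  # arbitrary seed, used only for repeatability

def health_spread(alpha, beta, n=20000):
    """Return (mean, standard deviation) of n health values in 4000-5000."""
    values = [4000 + 1000 * rng.betavariate(alpha, beta) for _ in range(n)]
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n)
    return mean, variance ** 0.5

for scale in (1, 4, 16):
    mean, spread = health_spread(1 * scale, 3 * scale)
    # The mean stays near 4250 while the spread shrinks as the scale grows.
    print(scale, round(mean), round(spread))
```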

And finally, to get something like the normal distribution solution we had earlier, try:

# Monsters have an average of 4500 health, with a range of
# 4000-5000. So we multiply by 1000, and add 4000.
r.seed(115)
health_list = list()
for i in xrange(0, 10):
	health_list.append(int(r.betavariate(3, 3) * 1000) + 4000)
print health_list
## [4194, 4571, 4162, 4334, 4564, 4551, 4324, 4363, 4450, 4193]
print sum(health_list) / len(health_list)
## 4370

In a future post I will show how to use the Dirichlet distribution to solve another kind of problem that shows up in procedural content generation.