The drunkard’s dartboard: an intuitive explanation of Monte Carlo methods
(Note to the reader: An Excel sheet showing sample calculations and plots discussed in this post can be downloaded here.)
Monte Carlo simulation techniques have been applied to areas ranging from physics to project management. In earlier posts, I discussed how these methods can be used to simulate project task durations (see this post and this one for example). In those articles, I described simulation procedures in enough detail for readers to be able to reproduce the calculations for themselves. However, as my friend Paul Culmsee mentioned, the mathematics tends to obscure the rationale behind the technique. Indeed, at first sight it seems somewhat paradoxical that one can get accurate answers via random numbers. In this post, I illustrate the basic idea behind Monte Carlo methods through an example that involves nothing more complicated than squares and circles. To begin with, however, I’ll start with something even simpler – a drunken darts player.
Consider a sozzled soul who is throwing darts at a board situated some distance from him. To keep things simple, we’ll assume the following:
- The board is modeled by the circle shown in Figure 1, and our souse scores a point if the dart falls within the circle.
- The dart board is inscribed in a square with sides 1 unit long as shown in the figure, and we’ll assume for simplicity that the dart always falls somewhere within the square (our protagonist is not that smashed).
- Given his state, our hero’s aim is not quite what it should be - his darts fall anywhere within the square with equal probability. (Note added on 01 March 2011: See the comment by George Gkotsis below for a critique of this assumption)
We can simulate the results of our protagonist’s unsteady attempts by generating two sets of uniformly distributed random numbers lying between 0 and 1 (This is easily done in Excel using the rand() function). The pairs of random numbers thus generated – one from each set – can be treated as the (x,y) coordinates of the dart for a particular throw. The result of 1000 pairs of random numbers thus generated (representing the drunkard’s dart throwing attempts) is shown in Figure 2 (For those interested in seeing the details, an Excel sheet showing the calculations for 100o trials can be downloaded here).
A trial results in a “hit” if it lies within the circle. That is, if it satisfies the following equation:
(Note: if we replace “<” by “=” in the above expression, we get the equation for a circle of radius 0.5 units, centered at x=0.5 and y=0.5.)
Now, according to the frequency interpretation of probability, the probability of the plastered player scoring a point is approximated by the ratio of the number of hits in the circle to the total number of attempts. In this case, I get an average of 790/1000 which is 0.79 (generated from 10 sets of 1000 trials each). Your result will be different from because you will generate different sets of random numbers from the ones I did. However, it should be reasonably close to my result.
Further, the frequency interpretation of probability tells us that the approximation becomes more accurate as the number of trials increases. To see why this is so, let’s increase the number of trials and plot the results. I carried out simulations for 2000, 4000, 8000 and 16000 trials. The results of these simulations are shown in Figures 3 through 6.
Since a dart is equally likely to end up anywhere within the square, the exact probability of a hit is simply the area of the dartboard (i.e. the circle) divided by the entire area over which the dart can land. In this case, since the area of the enclosure (where the dart must fall) is 1 square unit, the area of the dartboard is actually equal to the probability. This is easily seen by calculating the area of the circle using the standard formula where is the radius of the circle (0.5 units in this case). This yields 0.785398 sq units, which is reasonably close to the number that we got for the 1000 trial case. In the 16000 trial case, I get a number that’s closer to the exact result: an average of 0.7860 from 10 sets of 16000 trials.
As we see from Figure 6, in the 16000 trial case, the entire square is peppered with closely-spaced “dart marks” – so much so, that it looks as though the square is a uniform blue. Hence, it seems intuitively clear that as we increase, the number of throws, we should get a better approximation of the area and, hence, the probability.
There are a couple of points worth mentioning here. First, in principle this technique can be used to calculate areas of figures of any shape. However, the more irregular the figure, the worse the approximation – simply because it becomes less likely that the entire figure will be sampled correctly by “dart throws.” Second, the reader may have noted that although the 16000 trial case gives a good enough result for the area of the circle, it isn’t particularly accurate considering the large number of trials. Indeed, it is known that the “dart approximation” is not a very good way of calculating areas – see this note for more on this point.
Finally, let’s look at connection between the general approach used in Monte Carlo techniques and the example discussed above (I use the steps described in the Wikipedia article on Monte Carlo methods as representative of the general approach):
- Define a domain of possible inputs – in our case the domain of inputs is defined by the enclosing square of side 1 unit.
- Generate inputs randomly from the domain using a certain specified probability distribution – in our example the probability distribution is a pair of independent, uniformly distributed random numbers lying between 0 and 1.
- Perform a computation using the inputs – this is the calculation that determines whether or not a particular trial is a hit or not (i.e. if the x,y coordinates obey inequality (1) it is a hit, else it’s a miss)
- Aggregate the results of the individual computations into the final result – This corresponds to the calculation of the probability (or equivalently, the area of the circle) by aggregating the number of hits for each set of trials.
To summarise: Monte Carlo algorithms generate random variables (such as probability) according to pre-specified distributions. In most practical applications one will use more efficient techniques to sample the distribution (rather than the naïve method I’ve used here.) However, the basic idea is as simple as playing drunkard’s darts.
Thanks go out to Vlado Bokan for helpful conversations while this post was being written and to Paul Culmsee for getting me thinking about a simple way to explain Monte Carlo methods.