Sherlock Holmes is as good as 48% dead when his train pulls out from Victoria Station
Disclaimer: This post is not about ML or computer security, but I hope you’ll enjoy it anyway!
In my spare moments (OK mostly whilst riding the tube to work with no cellular connectivity…) I’ve been reading Prisoner’s Dilemma, a biography of von Neumann by William Poundstone [1].
Early in the book I was surprised by the application of Game Theory to a classic Sherlock Holmes story: The Final Problem. Indeed von Neumann (and Oskar Morgenstern) were evidently fans of Doyle and provided this example themselves in Theory of Games and Economic Behavior, 1944 [2]. Neumann and Morgenstern write:
Sherlock Holmes is as good as 48% dead when his train pulls out from Victoria Station.
I was very intrigued by this number and wanted to understand how it was derived. Read on if you would like to know more!
In The Final Problem, Holmes’ arch nemesis Professor Morriarty is out to kill the detective who has been on his case for several months. After three attempts on his life are made in a single day, Holmes decides he must flee London and arranges to meet Watson at Victoria Station. Just as the train begins its departure, Morriarty is spotted furiously pushing his way through the crowded platform and waving his hand clearly desiring to stop the train. Holmes’ predicament in this moment, on which his life now depends, is that the train may only be alighted in one of two places:
- The original destination Dover.
- The only intervening stop Canterbury.
If Holmes chooses wrongly he will surely perish. How should he proceed?
Homes' Strategy | |||
---|---|---|---|
Dover | Canterbury | ||
Morriarty's Strategy |
Dover | 100 | 0 |
Canterbury | -50 | 100 |
Using game theory, Holmes’ choice can be modelled as two-player non-zero-sum game (a Prisoner’s Dilemma) with the payoffs shown in Table 1. Perhaps you are wondering where these numbers came from? Neumann and Morgenstern reason that the greatest payoff (for Moriarty) should occur if he successfully pre-empts Holmes and meets him at his detrainment where he will surely be killed. If Holmes should reach Canterbury whilst Moriarty guesses Dover then there is a “tie” of sorts, Holmes has not reached the continent but neither has Moriarty succeeded in finding the detective. Finally, if Holmes disembarks at Dover and Moriarty guesses incorrectly then Holmes will certainly reach the continent. In this final scenario Moriarty has clearly lost but the pay-off (loss) is less than a victory since he is unlikely to be killed by his opponent.
An important point here is that whilst these pay-off values might appear reasonable, they are also entirely subjective. Game theory only talks about utility, and in cases where the pay-off can’t be counted (e.g., in years or £), the assessment of utility is inherently subjective. Poundstone writes:
Game theory is a kaleidoscope that can only reflect the value systems of those who apply it.
So back to our question, which station should holmes choose to maximise his chances of survival and why is there a 48% chance this is his final train journey?
The first question is answered by our pay-off table. Firstly you must know that game theory assumes players are “rational actors” that will always seek to maximise their expected utility in any given game. In zero-sum two-player games this gives rise to one or more “saddle points” which provide a “pure strategy” yielding the maximum pay-off for both players. A quick example is given in Table 2 below, the Cutter should always cut the cake as evenly as possible and the Chooser should always choose the bigger piece. The pay-off, half of the cake minus a crumb for the Cutter, is both the minimum in its row and the maximum in its column. Any alternative choice will always result in an outsized loss.
Chooser's Strategy | |||
---|---|---|---|
Choose biggest piece | Choose smallest piece | ||
Cutter's Strategy |
Cut evenly | Half the cake minus a crumb | Half the cake plus a crumb |
Cut unevenly | Small piece | Big piece |
Returning to Table 1 we can see that there are no saddle points. The minimums of both rows (Morriarty’s minimum pay-offs), -50 and 0, are diagonally separated from the maximums of both columns (Morriarty’s best pay-offs), 100. Neumann and Morgenstern show that “separation of the diagonals is necessary and sufficient” for a game being non-strictly determined. In other words, a mixed strategy is required to maximise the expected utility of each player when the game is played multiple times. This is demonstrated in Table 3 where, for the random strategy (not optimal here), the average pay-off is better than any pure strategy. Morriarty really should travel to Dover as this maximises his minimum pay-off (Holmes escapes but doesn’t reach the continent). Holmes cannot minimise the maximum advantage of his opponent as either choice could result in his death, yet he knows that Morriarty is most likely to go to Dover and guarantee he has not left England. If either player entirely eliminates chance from their strategy then it is sure to be exploited by their opponent.
Homes' Strategy | ||||
---|---|---|---|---|
Dover | Canterbury | Random | ||
Morriarty's Strategy |
Dover | 100 | 0 | 50 |
Canterbury | -50 | 100 | 25 | |
Random | 25 | 50 | 37.5 |
How should an optimal mixed strategy be calculated? In short, Neumann and Morgenstern give the required formulae on page 172 of Theory of Games and Economic Behavior.
In other words, the optimal Nash equilibrium strategy of this pay-off table, and the game theoretic solution to Holmes’s problem, is that he should travel to Canterbury with a 60% probability meanwhile Morriarty should travel to Dover with a 60% probability. Holmes’ final odds are hence calculated:
\[\begin{aligned} P[\text{Holmes Dies}] = P[\text{Dover}|\text{Morriarty}]\times P[\text{Dover}|\text{Holmes}]&\\ + P[\text{Canterbury}|\text{Morriarty}]\times P[\text{Canterbury}|\text{Holmes}]&\\ = \frac{3}{5}\times\frac{2}{5} + \frac{2}{5}\times\frac{3}{5} = 0.48&\\ \end{aligned}\]Our final pay-off table including the optimal mixed strategy is shown in Table 4. As hoped, the expected pay-off is greater than the random strategy.
Homes' Strategy | ||||
---|---|---|---|---|
Dover | Canterbury | 40/60 | ||
Morriarty's Strategy |
Dover | 100 | 0 | 40 |
Canterbury | -50 | 100 | 40 | |
60/40 | 40 | 40 | 40 |
To wrap up, we’ve taken a look at some game theory basics including Nash equilibria for both pure and mixed strategy games. I was basically new to game theory when I started reading Von Neumann’s biography and I learned a lot by trying to determine how my favorite detective’s survival odds were calculated. The derivation of the formulae used to calculate the strategy is something I haven’t quite gotten to the bottom of yet, but I’m sure the answer is in Theory of Games and Economic Behavior.
Whilst the fate of Sherlock Holmes drew me in initially, I also began to wonder whether game theory could benefit autonomous cyber operations. For example, I’m troubled by the lack of theoretically motivated reward bounds in the CAGE challenges. In all three of the challenges published to date, zero is the best possible score but it is rarely achievable. Our winning submission to the first challenge averaged, over 1000 episodes, -5.85 and -11.01 in 100 steps of CybORG 1.2 versus the \(\texttt{B-line}\) and \(\texttt{Meander}\) adversaries respectively. I’d love to understand how close that is to the expected reward of an optimal strategy. Similarly it would be really helpful to understand, and perhaps even quantify, how much pay-off is sacrificed by the partial observability limitations of the CybORG environment.
Bibliography
- Prisoner’s Dilemma: John Von Neumann, Game Theory and the Puzzle of the Bomb. Poundstone, W., 1992. Random House.
- Theory of Games and Economic Behavior. Neumann, J.V. and Morgenstern, O., 1944. Princeton University Press.