German Tanks, Wood Sticks, and Loaded Dice

Nadim Kawwa

Published in

DataDrivenInvestor

6 min read · Jun 28, 2019

War, what is it good for? Absolutely nothing.

Statistics, what are they good for? Absolutely everything.

In this post, we will explore how data scientists use a method called bootstrapping to hack their way into statistics. The method is popular since it can deliver quick results that are close to an analytical solution.

We will go over three examples. For each one, we will:

State the problem
Set up the experiment
Implement in Python code
Validate result analytically

Let’s begin with our first problem: Tanks!

German Tanks

The German Tank Problem is a classic estimation problem that stems from a real challenge faced by the Allied Forces during World War II. This variation of the problem is from a post on fivethirtyeight.com.

Problem Statement

You are a British spy trying to record how many tanks the Germans have. You know that every new tank produced is given a serial number, with the smallest number being 1. So the first tank built has a serial number of 1, the second 2, and so on…

You jot down the serial numbers that you spotted. On the way back you get ambushed and lose the information. All you remember is that the smallest number is 22 and the largest is 114.

How many tanks do the Germans have?

Experiment Setup

Here we know little: we do not know for sure how far the smallest serial number we saw is from the true minimum, which is 1. What would happen if our spy saw the tanks many times and got ambushed on each occasion?

Solution in Python Code

Implementing it in code is as follows:
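A minimal sketch of such an experiment, assuming NumPy is available. The function name `simulate_min_max` and the fleet size of 200 are hypothetical, chosen only for illustration; the spy in each trial spots 10 distinct serial numbers and we record the smallest and largest.

```python
import numpy as np

def simulate_min_max(n_tanks, sample_size, n_trials=10_000, seed=0):
    """Return the smallest and largest serial number seen in each trial."""
    rng = np.random.default_rng(seed)
    mins = np.empty(n_trials, dtype=int)
    maxs = np.empty(n_trials, dtype=int)
    serial_numbers = np.arange(1, n_tanks + 1)  # serials run from 1 to n_tanks
    for i in range(n_trials):
        # Each tank can be spotted at most once, so sample without replacement
        spotted = rng.choice(serial_numbers, size=sample_size, replace=False)
        mins[i] = spotted.min()
        maxs[i] = spotted.max()
    return mins, maxs

# 200 is a hypothetical "true" number of tanks, used only for illustration
mins, maxs = simulate_min_max(n_tanks=200, sample_size=10)

gap_below = (mins - 1).mean()    # average distance from the true minimum, 1
gap_above = (200 - maxs).mean()  # average distance from the true maximum
print(gap_below, gap_above)
```

The `mins` and `maxs` arrays can be passed to any plotting library to draw the histograms; the two printed gaps should come out close to each other.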

Here you saw that we assumed our spy snapped 10 serial numbers. The histogram distribution is, therefore:

What happens if we snapped 20 serials?

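We can see that with a bigger sample size, the smallest and largest serial numbers we observe get closer and closer to the edges of the true range. This is the key takeaway from the experiment. In both cases, the two distributions look roughly like mirror images of each other.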

This leads us to say that the smallest serial number we saw is about as far from the true minimum as the largest one is from the true maximum. The number of tanks that the Germans have is:
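Under that symmetry assumption, the estimate can be worked out from the two numbers the spy remembers. This is a sketch of the arithmetic; the symbols $\hat{N}$, $x_{\min}$, and $x_{\max}$ are notation introduced here for the estimated total, the smallest serial, and the largest serial:

```latex
\hat{N} \approx x_{\max} + (x_{\min} - 1) = 114 + (22 - 1) = 135
```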

Analytical Solution

The Wikipedia entry lists more than one way to solve this problem. For example, a frequentist approach would yield a result in the same ballpark:
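For reference, the frequentist estimator in the Wikipedia entry uses the largest observed serial number $m$ and the sample size $k$. The problem statement does not say how many serial numbers the spy recorded, so $k = 10$ below is an assumption borrowed from the simulation:

```latex
\hat{N} = m\left(1 + \frac{1}{k}\right) - 1 = 114\left(1 + \frac{1}{10}\right) - 1 = 124.4
```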

Wood Sticks

Problem Statement

We have a stick made of wood of some random length L. We pick two random points along the length and cut the stick at those points. We now have 3 smaller sticks.

What’s the probability we can form a triangle from those 3 sticks?

Experiment Setup

What makes a triangle? If we remember geometry class, we can draw upon the necessary conditions for a triangle to exist. Among those, we can make use of the triangle inequality.

Given a triangle of sides a, b, and c the following inequalities must hold:

a + b > c
b + c > a
a + c > b

So if we pick two points on our stick, those three conditions are sufficient and necessary to see if a triangle can take shape.

Solution in Python Code

In reality, the length of the stick does not matter. When speaking of individual stick lengths, we refer to each length as a fraction of the total length. We also assume that every point along the stick is equally likely to be picked.

Run the experiment many times, with each run as follows:

Pick two random points from a uniform distribution
Get the length of each portion
Check for triangle inequality
Record successes
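The steps above can be sketched as follows, assuming NumPy and a stick of length 1. The function name `triangle_probability` and the number of trials are illustrative choices, not the author's.

```python
import numpy as np

def triangle_probability(n_trials=100_000, seed=0):
    """Estimate the chance that three pieces of a broken stick form a triangle."""
    rng = np.random.default_rng(seed)
    # Two cut points per trial, uniform on [0, 1), sorted left to right
    cuts = np.sort(rng.uniform(0.0, 1.0, size=(n_trials, 2)), axis=1)
    # Lengths of the three pieces as fractions of the whole stick
    a = cuts[:, 0]
    b = cuts[:, 1] - cuts[:, 0]
    c = 1.0 - cuts[:, 1]
    # All three triangle inequalities must hold
    forms_triangle = (a + b > c) & (b + c > a) & (a + c > b)
    # The fraction of successful trials is the probability estimate
    return forms_triangle.mean()

estimate = triangle_probability()
print(estimate)
```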

Running the code above, we get about 0.25.

Analytical Solution

There are several methods of solving this analytically and we won’t go into the details here, although we can solve it using a lot of integrals as seen here. We note that all solutions return a probability of 0.25, matching our experimental results.
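One short route to 0.25 that avoids the integrals is sketched below; it is not necessarily the derivation the author had in mind. Take a stick of length 1 with pieces $a$, $b$, and $c$. Since $a + b + c = 1$, the triangle inequalities hold exactly when every piece is shorter than $1/2$. At most one piece can be that long, and by symmetry each piece is equally likely to be the long one. With cut points $U_1$ and $U_2$ (notation assumed here) and $a$ the leftmost piece:

```latex
P\left(a \ge \tfrac{1}{2}\right)
  = P\left(U_1 \ge \tfrac{1}{2},\ U_2 \ge \tfrac{1}{2}\right)
  = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}
\qquad
P(\text{triangle}) = 1 - 3 \cdot \tfrac{1}{4} = \tfrac{1}{4}
```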

Loaded Dice

Problem Statement

A bag contains 9 dice: 8 are fair and one is loaded. The loaded die always returns a 6, while a fair die returns values from 1 to 6 with equal probability.

We draw a die at random from the bag, roll it twice, and get a 6 on both rolls.

What’s the chance we picked the loaded die?

Experiment Setup

Let’s assume our dice are numbered from 1 to 9, and the ninth one is the loaded die.

Pick a random number between 1 and 9
If it’s a nine, record it as a cheat success
If it’s anything else, pick two integers between 1 and 6, and record a fair success if it’s two sixes