This is only the toy version of the actual problems solved by the Allies, which were more nuanced, and involved reasoning about the tank manufacturing pipeline. The write-up [0] doesn't go into the math but makes an interesting read.
Yeah, I can't imagine the assumption that tanks captured were "randomly uniformly distributed" is a good one. I can imagine all sorts of reasons that wouldn't be the case.
There were the infamous raids on the ball bearing factories in Schweinfurt. (At the time the Allies didn't have escort fighters with sufficient range, and the bombers suffered heavily.)
But AFAIK those targets were selected based on pre-war "traditional" intelligence about what the likely bottleneck resources would be, not statistical analysis of captured equipment.
Except that the data on the number of tanks, their increased weight and number of wheels, pointed to a likely increase in the need for bearings. (And all the other wheeled non-tank things too.)
According to conventional Allied intelligence estimates, the Germans
were producing around 1,400 tanks a month between June 1940 and September 1942.
Applying the formula below to the serial numbers of captured tanks, the number
was calculated to be 246 a month. After the war, captured German production
figures from the ministry of Albert Speer showed the actual number to be 245.
I was actually surprised that MVUEs and their fellow point estimators are called frequentist (though it makes sense). In school we always referred to them as non-Bayesian; at the same time, frequentist always seemed like a dirty word to us students, so maybe that's why.
How ironic that the nation that led the world in the frontiers of maths in the 19th century completely missed the boat in the applied math of signals intelligence in WWII. I'm referring to the tank serial numbers and the lack of care in Enigma codes, except by the Kriegsmarine, but even they eventually lost a code book to the Allies, which they apparently considered an impossibility.
They had a lot of opsec problems. There's a great story about how using "cool" codenames instead of random ones bit them in the ass.
It's nearly impossible for a bomber to navigate long distances in the dark over a blacked-out country, so the Germans came up with a radio navigation system involving beams transmitted from the mainland to intersect over the target, which the British figured out how to jam; the Germans came up with another nav system, and the Brits eventually jammed that one too.
The British knew the Germans would be trying to find yet another way. They'd learned from Enigma decrypts about a new device called Wotan. One researcher looked up the word, learned that it was the name of a one-eyed god, and concluded that the new system would use a single transmitter with a rangefinding transponder aboard the bomber, instead of multiple beams like the previous ones. Starting from there, they had a countermeasure online and ready to go before the Germans even deployed Wotan. When the Nazis realized they'd been outmaneuvered from the start, they gave up on radio-guided bombing completely, at least against Britain.
To be fair, some of the German spies were pretty bad at their jobs. Josef Jakobs stands out as a man who was just not a good spy: https://en.wikipedia.org/wiki/Josef_Jakobs
Dr. R.V. Jones had significant involvement in the War of the Beams, and after the war wrote a book about British Scientific Intelligence efforts during the war.
Slightly off topic, but that mindset, ignoring expertise in one field that could help in another, is still quite common in Germany if you ask me.
And yes, German military intelligence sucked in WW2. It didn't help either that the culture, military and political, was highly ideological. When truth cannot be spoken and power won't listen, facts are ignored. What is not allowed to be cannot be. And then reality ultimately bites you in the ass.
True. Still, from what I know, he saw himself as a patriot. Which was the reason why he opposed the Nazis but the reason why he didn't defect or betray the Germans.
If by "the Germans" you mean the German government, he did betray them, at least when it came to war crimes and the Holocaust. I don't know if he also sabotaged the war effort, but that would also be consistent with patriotism. A patriot loves his country; that doesn't mean he doesn't care about others at all. That would be some sort of combination of patriotism and psychopathy.
A patriot could also have decided to sabotage the war effort to end the war faster, to get rid of Hitler, or to somewhat save the reputation of Germany.
He didn't openly defect but that doesn't mean much for a spy.
From what I remember of the documentation I saw about him back in the day, he never actively sabotaged the war effort or did things that put German troops in danger. He opposed the Nazi regime.
Yes, I agree it is quite a feat of mental acrobatics. My impression was that he somehow separated the Nazis and the German nation, and saw the war as a German and not really a Nazi thing. Maybe he just didn't want to see that Germany and the Nazis were the same thing at the time, maybe he also wanted a round two after WW1, or maybe he wasn't able to shake decades of upbringing and training.
Either way, he was one of the few "good" Germans, even if not on Schindler levels, and definitely a very interesting person. Just look up his WW1 adventures.
Notable, though, is that even in WW1 and after he was not necessarily a trained spy or intel guy, AFAIK.
I didn't read much about Canaris but I don't think his stance required much mental acrobatics. Considering a nation and its government somewhat separate entities is quite normal.[1] Of course many Germans used this as an excuse after the war but this does not apply to Canaris.
Considering wars of aggression acceptable wasn't all that unusual either.
Sabotaging the war effort would have meant helping the Allies fight Germany. Sabotaging war crimes and the Holocaust meant trying to stop Germany from something evil and stupid (at least if he considered German Jews German). While there were reasons for a patriot to sabotage the war effort, only sabotaging the crimes was also a consistent position.
[1] Especially when it isn't democratically elected. The last multi-party elections in 1933 weren't free. The communist Reichstag members were jailed, many others were intimidated to make them support the enabling act.
Summing it up pretty well. And yes, that is how I understood Canaris. And yes, from his perspective it seems a logical stance to take. Hindsight makes a lot of things easier, doesn't it? Also true that he saw what the Nazis really were and did something about it. A rare feat in those days.
If you were a Nazi codebreaker whose successes in the war were classified, would you publish detailed memoirs? Or would you destroy the evidence, which was probably what your orders said to do anyway?
In 1933, the Nazi regime passed a law[1] banning anyone they considered Jewish from holding any civil service job, including positions at universities. A large proportion of German academia was considered Jewish.
Interesting article, though I think it incorrectly leaves
the reader thinking that there is some interesting
information hidden in the average spacing of the numbers.
In fact, all you need to know is the maximum observation
and the number of observations. Once you simplify, the average
spacing goes away.
If M is the maximum serial number and N is the total number of
observations, using the formula in the post:
M + (avg. spacing) = M + (M / N - 1) = (N + 1) / N * M - 1
To me that gives a clearer picture of what the unbiased
estimator is doing: inflate the maximum value by a factor that
tends towards one as the sample size grows (then subtract one).
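For anyone who wants to try the simplification above, here is a minimal sketch. The function name and the sample serials are made up for illustration; the formula is the M + M / N - 1 form from the post, and only the maximum and the count enter into it.

```python
def estimate_total(serials):
    """Estimate the population size from serial numbers sampled without replacement.

    Uses max + max/n - 1, i.e. the maximum plus the average gap.
    """
    n = len(serials)
    if n == 0:
        raise ValueError("need at least one serial number")
    m = max(serials)
    return m + m / n - 1


# Hypothetical sample; only the maximum (60) and the count (4) matter.
print(estimate_total([19, 40, 42, 60]))  # 74.0
```

Swapping the smaller serials for any other values below 60 leaves the estimate unchanged, which is the point being made above.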
If you just assume that the sample mean = the population mean, then you get the right answer, at least for this example. I don't see why the article fools around with the maximum at all - isn't the maximum a much more noisy statistic than the mean?
The range matters – had they found 10 serial numbers between 100000 and 101000, would the mean still be a meaningful estimate of the production rate? In this case, the author just tacitly assumes the minimum to be zero.
To be the devil's advocate: what you say is true if you know the distribution. If the spacing looks weird (e.g. clustered), it might indicate that the number is, for example, a pairing of model and serial numbers, etc.
For anyone else interested in WW2 reverse engineering and design etc., https://www.youtube.com/watch?v=GJCF-Ufapu8
"The Secret War" is a huge documentary covering British efforts to counter German electronic warfare and V-weapons.
Why didn't they use randomized and scrambled serial numbers? Sort of like what Amazon does to their order numbers. I know it can still be cracked but serially numbering military equipment is not very smart. I was setting up a Shopify store the other day and it doesn't allow for a lookup table to be used for order numbers. I don't want competitors to know that I've sold so many X items. Same thing with Squarespace and Square e-commerce stores. It blows my mind that a multi-billion dollar ecom giant has not implemented this despite forum posts and requests from users.
World War II was (at least one of) the first industrialized wars. So the whole situation was genuinely novel to most participants.
Additionally, the German army command didn't think that way. Where the US relied on overpowering by materiel dominance, and the Soviets fought and won through unimaginable human sacrifice, the considerable initial success of the German army was based on better, smarter tactics, individual leadership, bravery, ruthlessness, etc. The leadership assumed they'd be able to win the war that way, even when the war had turned into a much more industrial operation.
You can see that in operations such as the Battle of the Bulge, the war in Normandy, and most importantly in the Russian campaign.
This is of course over-generalizing, but I believe the general mode of thinking was there, and that'd explain the lack of attention on such details.
Don't underestimate Soviet industrial capabilities during the war. The Soviets produced over 58K T-34 type tanks compared to Germany producing 37K (PzIII through Pz6).
You're correct, the Soviets outproduced Germany as well as being willing to run much higher losses. E.g. in the Battle of Kursk, the Soviets outnumbered the Germans by x2 in just about everything (tanks, planes, men). They won, but lost x2-x4 in tanks, planes, men.
In either case, terrible times that we should be thankful not to have been born into.
At the point of German-Soviet conflict, the US had about six times the steel production of the USSR, six times the iron production, eight times the oil production, and three to four times the coal production. It definitely wouldn't be surprising if US steel was a large share of Soviet figures.
The scale of resources delivered to prop up the Soviets was extraordinary, including what the British sent them.
In just 3 1/2 years the British sent them[1]:
3,000+ Hurricanes aircraft, 4,000+ other aircraft, 27 naval vessels, 5,218 tanks, 5,000+ anti-tank guns, 4,020 ambulances and trucks, 323 machinery trucks, 1,212 Universal Carriers and Loyd Carriers, 1,721 motorcycles, £1.15bn worth of aircraft engines, 1,474 radar sets, 4,338 radio sets, 600 naval radar and sonar sets
And the US sent them:
427,284 trucks, 13,303 combat vehicles, 35,170 motorcycles, 2,328 ordnance service vehicles, 2,670,371 tons of petroleum products (gasoline and oil) or 57.8 percent of the high-octane aviation fuel, 4,478,116 tons of foodstuffs (canned meats, sugar, flour, salt, etc.), 1,911 steam locomotives, 66 Diesel locomotives, 9,920 flat cars, 1,000 dump cars, 120 tank cars, and 35 heavy machinery cars. Provided ordnance goods (ammunition, artillery shells, mines, assorted explosives) amounted to 53 percent of total domestic production
Beyond Russia also notes:
"The USSR received a total of 44,000 American jeeps, 375,883 cargo trucks, 8,071 tractors and 12,700 tanks. Additionally, 1,541,590 blankets, 331,066 liters of alcohol, 15,417,000 pairs of army boots, 106,893 tons of cotton, 2,670,000 tons of petroleum products and 4,478,000 tons of food supplies"
The notion of sending a country 375,000 trucks and 1,900 locomotives in just three years is incredible to think of today.
While the Soviet Union lost a huge number of people, the 25-30 million dead is an inflated estimate because it counts people living in areas that were conquered by the Soviet Union. Polish people by and large do not appreciate being lumped in with the people who had invaded them the year before.
First, they probably didn't consider that serial numbers might be an information leak.
Second, all calculations were done by hand in those days (and documents that weren't printed in bulk had to be retyped by hand), so sequential numbers were not only easier to issue but also easier to track. For example, if you have a production problem you can say "let's check all tanks with S/Ns between A and B" rather than having to maintain a list mapping production dates to serial numbers that might be in a file cabinet somewhere distant from where you are.
> Why didn't they use randomized and scrambled serial numbers?
Because there weren't well-known examples of the risk of not doing that, and not doing it is the easy and obvious thing if you have no clear reason to do it, and makes lots of things you might use those numbers for yourself easier (and if it wasn't for your own use, you wouldn't issue the numbers at all.)
Because supply chain logistics. The Germans in WWII were world leaders in manufacturing (perhaps bested only by the US), and one of the elements of that manufacturing quality is the ability to trace individual parts back to the exact manufacturing batch to figure out why particular batches go wrong.
The Germans did (eventually) make some effort to obscure details of their supply chain--they forced manufacturers to use three-letter codes instead of their normal trademarks--but that still suffered from poor operational security which allowed the codes to be quickly matched up to manufacturers. It didn't help that the British analysts meticulously kept track of everything, allowing them to identify the manufacturer of one unlabelled part by the inspector's number.
They might have had good equipment sometimes, but they had nowhere near enough of it. They were outproduced by Britain “alone” (counting the colonies) in most areas, most of the time, and often by considerable margins.
The German army was not particularly mechanised or well equipped as a whole, relying on a lot of horse-drawn vehicles for the entire war.
When you look at the war from a manufacturing perspective, the question is more about how Germany survived for so long against such huge manufacturing nations.
For a seemingly dry subject, David Edgerton’s book on this is very readable. https://www.theguardian.com/books/2011/mar/27/britains-war-m...
I should say that Germany was a leader in manufacturing quality, not so much quantity, although I believe they did comparatively well there considering that they were facing down the gargantuan industrialized economies of the US, UK, and USSR.
It's also worth pointing out that Germany suffered from a severe lack of resources, particularly oil and rubber (although everyone in WWII was short on rubber). While they did have synthetic fuel and rubber plants that they made excellent use of (part of the reason for German superiority in the chemical industry was their need for it), these synthetic routes are not really sustainable for a massive war effort, and Germany ran out of their stockpiled reserves by 1942. Case Blue, the second offensive in the USSR, had obtaining the Baku oil fields as its main objective.
Even on quality I'm not so sure, because quality largely depends on the intended use. And if that use is to shoot at things and be shot at in return in abysmal operating conditions, the traditional German quality standard is just over the top and unsuited. And while quality beats quantity on a per-unit basis, globally there is only so much quantity difference quality can make up. That is something Germans seem culturally incapable of understanding even today: the concept of "good enough" is beyond a lot of my fellow countrymen.
German quality was largely a myth. Examine tanks; they need to be survivable, reliable, and potent. Without all three, they're useless. German tanks were rarely the best on the battlefield when measured this way.
Aircraft (especially fighters) have the same three requirements: until the ME-262 was deployed, Germany was only on par with the Allies.
Artillery? Other than the feared 88mm, its artillery was clearly second fiddle to the Allies.
What enabled Germany to have any success was the initial training of its NCOs and officer corps. This allowed them to exploit opportunities faster than their opponents (think of Boyd's OODA cycle).
But all the oft-touted German "super-weapons" were usually over-engineered stuff that didn't work reliably. Note that the ME-109 flew until the end of the war since it was reliable, and effective against bombers until they were escorted by Mustangs and Thunderbolts.
It wasn't about lack of industrial know how, it was the policy of Autarky. The Nazi regime literally starved itself of essential manufacturing and war time resources.
They knew for example that they only had enough oil, with the limited mechanized forces they had, for operational effectiveness until autumn 1941. After that Germany would never again have the resources for grand operations on the strategic level of Operation Barbarossa. They needed to get to the oil fields of the Caucasus region, which they did not even get close to due to some screwed-up leadership decisions.
Fall Blau paled in comparison to the earlier operations, and Germany's logistics system and resources were beyond the tipping point.
And then when it came to Kursk all they could really manage was a single limited scope battle.
They did, sort of, starting in the 1940s. The tank make/model were replaced by arbitrary codes, but it was done sloppily and many of the tanks could be re-identified. (See page 80 of the paper @dooglius posted above).
Still, the idea that this could leak valuable information is probably more obvious in hindsight, and sequential serial numbers do have some upsides. If there's a design flaw in one version of the gearboxes, you can just pull everything with a serial number between XXXX and YYYY. With randomized numbers, you'd have to maintain some master database, which is a lot harder when most of the logging is done with pen-and-ink ledgers, carbon copies, and maybe punchcards.
Naively, it would inflate the absolute values, but you might be able to correct for it in various ways. Still, knowing that tank production fell 50% is probably useful, even if you don't know what exactly it fell from....
It's interesting to see other commenters saying that manipulating IDs wouldn't occur to the Germans, it reminded me of an interesting anecdote from Hitler's rise to power: "It was in January 1920 when a numeration was issued for the first time and listed in alphabetical order Hitler received the number 555. In reality, he had been the 55th member, but the counting started at the number 501 in order to make the party appear larger." [0]
I think it's more likely that many were aware of the security issues, but it wasn't worth the coordination of coming up with a scheme, giving it to all spare parts suppliers in a secure way, etc. potentially slowing down the war effort. I bet the Allies used a lot of serial numbers too, despite this work.
>Why didn't they use randomized and scrambled serial numbers?
Because it happened 80 years ago, when the German army (or any other) did not understand statistics as well as they do today.
It was a groundbreaking achievement by the Allies.
They did know about encryption and developed the Enigma machine.
I don't think you need deep statistics knowledge to know that if the enemy captured Serial # 0020, 0120, 0439, 1293 and 1356; they would at least have some hint that the lower bound is 1356 tanks.
Sure, but as the article points out, that's not a great bound—and you really care about the expected number of tanks (or the upper bound), rather than the lower. The Allies had drastically overestimated this, by about fivefold.
Furthermore, the interesting part isn't just the number of tanks or planes—though that has obvious strategic uses too— but the insight it gives you into their industrial production. What's the limiting factor in getting a tank to the front--machining the parts? assembly? fuel? Which of our raids affected that?
The job interview version: If you are being interviewed for a position by engineers who have their (serially allocated) employee IDs on their badges, estimate the number of employees from those IDs, assuming all engineers are equally likely to be on the panel of 8.
I have looked at my payroll check numbers from contracting firms to see how they're doing as a business. If the interval between check numbers drops in a month, I have a good idea that there aren't as many people working there anymore.
I worked for a big company and we used to put bug numbers in our change releases. We were told to stop doing this, as some customers would see that their bug would appear to have been given a lower priority when they saw lower numbers coming in first.
when i first started a grey-market research chemical company some years ago, i added like 31 or something to each invoice number to make it seem like i did more business.
The excellent book "Paradoxes in Probability Theory" by William Eckhardt has some good arguments against this, the Simulation Hypothesis, and similar things. One simple way to summarize the counterargument is that in a lot of cases, the choice of seemingly-reasonable priors actually hides an unreasonable assumption that it is possible for the future to change the past.
Take a silly premise, get a silly argument and a really silly conclusion.
Confounding question: 1000 years ago, would this argument look any different? Answer: mathematically speaking, it would not. In fact, far more humans have been born than you could have predicted using this method. Conclusion: the argument is flawed.
The people living 1000 years ago could indeed have used the same argument to show that they should be 95% certain to be in the last 95% of all humans ever to be born, and history indeed showed that this was not the case; the dice fell on the other 5% possibility for them and the population increased more than 20x.
However, the argument will still give the correct prediction for most humans that try to use it. Just not for the few that were in the special position to be born early in the sequence of all humans. The argument essentially tells you that you have no reason to believe that you are also in that special position.
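A quick way to check the "correct for most humans that try to use it" claim is to count, for a fixed final total, how many birth ranks would make the prediction "total < 20 * my rank" true. This is only a sketch under the argument's own assumption that every birth rank is equally likely to be yours; the function name and the totals are made up.

```python
def fraction_correct(total_humans, factor=20):
    """Fraction of people for whom 'total < factor * my birth rank' holds."""
    correct = sum(
        1 for rank in range(1, total_humans + 1)
        if total_humans < factor * rank
    )
    return correct / total_humans


for total in (1_000, 100_000, 1_000_000):
    print(total, fraction_correct(total))
```

Whatever the total, the prediction fails only for the earliest 5% of ranks. It says nothing about whether the equal-likelihood assumption is reasonable, which is what the rest of this thread argues about.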
You are trying to predict the length of the sequence of all humans, so you have no way to know whether or not you are ‘early in the sequence’. You can’t use the number you are trying to predict as an input to your prediction.
The tank problem doesn’t tell you how many tanks Germany will go on to build - it just tells you how many they have already built.
‘Good news! The war is almost over! The chances are these tank serial numbers all fall among the last 95% of the tanks Germany will ever produce!’
The difference is that the German tank sample can only be drawn from the pool of existing tanks so far. But when you consider your own birth, that's a sample out of the pool of all the humans, including future ones. Because future humans can also ponder their own place in the sequence, and where they find themselves is distributed uniformly over the sequence of all humans.
To apply the doomsday argument to tanks, we need to fix the sampling. It won't work for Allies grabbing German tank samples. We have to think from the point of view of the tank; say a copy of your consciousness is being uploaded to an army of tanks. If you wake up as tank #734, could there eventually be millions of tanks? Maybe so, but there's only a 5% chance that you are one of the first 5% of tanks, so there's a very good chance that there will be fewer than 734*20 tanks in total.
This all seems to assume the tank serial numbers would be captured at one moment in time ("captured 15 of these tanks uniformly at random.") But in fact the tank shells dribble in over time which biases the gap, the gaps at the highest numbers are going to be greater. Earlier tanks have had many more chances to be destroyed or captured. So using average gap is clearly not going to give the best estimate. If you restrict yourself to tanks from the latest large battle, that will cancel out the dribble effect though.
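To illustrate the dribble effect described above, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: one tank built per day, a fixed small chance per day that any fielded tank is captured, and the function and parameter names.

```python
import random


def simulate_captures(num_tanks=1000, daily_capture_prob=0.002, seed=0):
    """One tank is built per day; each fielded tank risks capture every day.

    Returns the serial numbers captured by the day the last tank is built.
    """
    rng = random.Random(seed)
    captured = []
    for serial in range(1, num_tanks + 1):
        days_in_field = num_tanks - serial + 1
        # Probability of being captured at least once while in the field.
        p_captured = 1 - (1 - daily_capture_prob) ** days_in_field
        if rng.random() < p_captured:
            captured.append(serial)
    return captured


captured = simulate_captures()
mean_serial = sum(captured) / len(captured)
# Under uniform sampling the mean serial would be near 500.5.
print(len(captured), round(mean_serial, 1))
```

Under these assumptions the captured serials skew toward older tanks, so the sample is not uniform over 1..N, which is the concern raised above.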
Similarly, Google+ userIDs were assigned as 21-character numeric strings, beginning with '10' or '11', but otherwise appearing to be randomly assigned.
A full listing was available through the site's robots.txt sitemaps file, or rather, a listing to the listing of 50,000 user profile sitemap files, with about 44k profiles per file. This worked out to 25 GB of profile listings alone.
Rather than download the full set (though I eventually did), I picked an arbitrary file from near the middle of the listing, and ran some spot checks on the profiles, which seemed to be reasonably randomly distributed by age, location, and other characteristics. With as few as 100 profile page downloads, it was clearly evident that active posting to G+ was limited to about 8-11% of all profiles. The full 50k profile sample, and a third party's independent (and more robustly randomised) 500k profile sample eventually showed this to be 9.7%.
(And yes, if I was being more rigorous I could have done much more testing or work, but I was mostly addressing personal curiosity and an online disagreement with someone.)
An interesting proof of the power of random sampling.
Larger samples do allow for clearer views of rare phenomena -- such as dialing in on the fraction of 1% of G+ users highly active on the site. Or when I later looked at Communities characteristics, the properties of the very largest (about 50 > 1 million members) of the 8 million total. In that case, I eventually got access (also via a third-party) to a comprehensive summary dataset.
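As a rough sketch of how the sample size drives the precision here: the usual normal-approximation margin of error for a proportion, evaluated at the 9.7% figure and the three sample sizes mentioned above. The function name is made up, and the approximation assumes a simple random sample, which the profile listing may only approximate.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (100, 50_000, 500_000):
    print(n, round(margin_of_error(0.097, n), 4))
```

By this estimate, 100 profiles pin the rate down to within roughly six percentage points either way, and the 50k sample tightens that to a fraction of a point.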
The userID hashing also made approaches such as exhaustively searching the ID space for user pages nonviable. The search space was trillions of times larger than the target space.
There's a zillion things you can estimate this way. A lot of sites use sequential cookies, user IDs, etc. Until about a decade ago UPS tracking numbers were sequential for each shipper which made it trivial to estimate output for online shops. Apple invoice numbers used to be dense and sequential and you only needed the number to retrieve the invoice. The IMEI is actually just about the worst way to have estimated iPhone sales in 2008; at that time you could literally have crawled Apple's website for every invoice whether sold online or in stores.
This is also a good (applied, with simple code) example of the use of probabilistic programming. I can't get myself to read full books, but somehow this simple example gave me some intuition and additional pointers to follow.
It depends on how you intend to 'score' the estimate.
Are you looking for the answer that is the 'most likely', or one that has the 'lowest least squared error', or maybe one that is 'unbiased' (mean error)?
I remember studying this problem in the context of anonymity a few years back, defining immeasurability as the property whereby an adversary cannot distinguish between different node counts, for example. The tank problem is related to mark recapture techniques for animal population size estimation. Shameless plug,
Little bit of a funny though: Note how num_tanks ~ Unif(max(captured),2000) was defined, so you already have p[ parameter | data ]. Isn't this already a posterior?
I get however how if you had the r.v.s num_tanks ~ Unif(M,2000), observed | num_tanks ~ Unif(1,num_tanks), M some constant, that you could find a posterior distribution num_tanks | vector<observed> by first finding the joint via
E[ 1[num_tanks < t]P[observed | num_tanks] ]
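For what it's worth, here is a small grid-based sketch of that posterior, without any probabilistic programming library. It is an illustration under stated assumptions, not the article's code: tanks are drawn without replacement, the prior is uniform up to the 2000 cap mentioned above, and the captured serials are made up. Starting the prior at max(captured) gives the same result as starting it at 1, because the likelihood is zero below the maximum anyway.

```python
from math import comb


def posterior(captured, upper=2000):
    """Posterior over the tank count N under a uniform prior on [max(captured), upper].

    The likelihood of any particular k-subset drawn without replacement from
    1..N is 1 / C(N, k) when N >= max(captured), and 0 otherwise.
    """
    k, m = len(captured), max(captured)
    weights = {n: 1 / comb(n, k) for n in range(m, upper + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical captured serials, for illustration only.
post = posterior([10, 256, 202, 97])
mode = max(post, key=post.get)
mean = sum(n * p for n, p in post.items())
print(mode, round(mean, 1))
```

The posterior mode sits at the largest captured serial, while the posterior mean is pulled upward by the long tail, so the "best" single number depends on how the estimate is scored.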
I’m not sure how much higher math you’d really need to answer the question “how many tanks has the other side produced?” if these were the serial numbers of the captured tanks:
it’s tough to see an argument for anything other than:
a. about 1,000, or
b. 1,000ish but there may be a confounding fact pattern we are unaware of....
Mechanical computers that are designed to solve a single problem aren’t necessarily at a disadvantage, especially when the math isn’t that complex; figuring out the variables and all the factors was the problem.
Seeing that it's a uniform distribution, let's start out by assuming our sample mean (the average serial number we find) is close to the true mean (about half the number of tanks in existence). If this is true, then:
2 x mean
should be a roughly unbiased estimator of the total number of tanks. But because we are probably under sampling the extremes, we could use the Bessel correction:
1/(n-1) x summation_{i=1}^n(sample_i)
I would guess this comes out to a better estimation than what the article says.
Bessel's correction might be a bit of overkill, since it's intended to work with normal distributions. But I still suspect it comes out to a better estimation than what the blog post says.
I thought the same, and tried it. The mean sample mean after 10 million runs is 500.47, which is very close to the true mean, 500.5. Bessel's correction is not correct here - it's effectively multiplying by n/(n-1), so your estimate would be 536. Bessel's correction is made for estimating true variance using a sample, not for estimating true mean.
Yes of course, the mean is the first moment and a Bessel correction would be inappropriate. Now I feel stupid. The mean calculated by sum(p_t x_i) or 1/n sum(x_i) is already the best linear unbiased estimate. Maybe we can't get better than twice the mean?
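One way to probe that last question is to simulate both estimators side by side. This is a sketch with assumed settings (1000 tanks and samples of 15, matching the numbers quoted above; the run count and names are arbitrary). It compares 2 * mean - 1 against the max-based formula from the post.

```python
import random
import statistics


def compare_estimators(true_n=1000, sample_size=15, runs=20_000, seed=0):
    """Simulate both estimators on samples drawn without replacement from 1..true_n."""
    rng = random.Random(seed)
    population = range(1, true_n + 1)
    from_mean, from_max = [], []
    for _ in range(runs):
        sample = rng.sample(population, sample_size)
        from_mean.append(2 * statistics.fmean(sample) - 1)
        m = max(sample)
        from_max.append(m + m / sample_size - 1)
    return from_mean, from_max


from_mean, from_max = compare_estimators()
for name, est in (("2*mean - 1", from_mean), ("max-based", from_max)):
    # Print the average estimate and its spread across runs.
    print(name, round(statistics.fmean(est), 1), round(statistics.pstdev(est), 1))
```

Both estimators average out close to the true count, but the max-based one should show a noticeably smaller spread, so twice the mean is unbiased without being the tightest choice.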
What if you get into the war at a later point? Most tanks with a small serial number will have been destroyed (tanks tend to have a short life) and, using the mean instead of the maximum, you will get a seriously biased result.
You could adjust for such problems but it seems much easier to use the maximum.
[0] https://sci-hub.tw/10.2307/2280189