That's an awesome diagram, and quite a few servers (but latency kills the map-overlay experience; I think a static PNG or SVG might have been better...). Interesting that servers are either paired, autoscaling over two AZs, or statically provisioned in three AZs - or am I misreading the diagram? Two entire mirrors for testing and staging must have become a pretty big cost.
Looks like the AutoScaling groups are applications or stores that do not need to coordinate their actions. The 3-AZ deployments seem to be APIs, which I am guessing scale with each region (to reduce data cost?) and are probably brought up/down automatically, with Puppet to handle post-launch configuration. (You are reading it correctly.)
I would guess testing is a skeleton version of the entire deployment, so the cost is minimal; it just needs to exercise new deploys and verify that the tests pass.
Staging probably wasn't a full mirror. At best, I would venture to guess they had hot swaps coming up in staging and then being switched with production via ELBs.
They mention costs a few times in articles, so I would venture to guess they did optimize around many of those corners.
For others outside the US confused about what the application actually does, and what the Narwhal in the testing/staging pictures refers to, this helped: http://arstechnica.com/information-technology/2012/11/built-...