It's totally different for different types of sites. Facebook is different than ...

gaius · on Dec 27, 2008

On FB I've often seen "account unavailable" while they do maintenance, but I've never been unable to log into Amazon or place an order. Sharding is very much overrated. However it's physically implemented, Amazon have one logical database and a customer's data is always available.

bjclark · on Dec 27, 2008

How is this contrary to what I said? I didn't say Amazon sharded. What/why would they shard? Clustering != sharding.

They most definitely have multiple instances of the app running around the world, which synchronize data with each other, which is why you've never been unable to buy something. It's highly unlikely that all their instances would be down or overloaded at the same time.

Also, they most certainly don't have a single logical database. They use all kinds of things, including SimpleDB. You think product information is stored in the same database, or even in the same way as customer information?

gaius · on Dec 27, 2008

The product data will be in multiple physical, replicated, shared-nothing databases, each of which has the entire dataset: a single logical database. The principle of sharding is that each database has a subset of the data and you place some logic in front of it to direct the query to the right place. Now if I'd been able to buy kitchenware but not garden tools one day, then I might say their product database was sharded. But Amazon is smarter than that.
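
To make the distinction above concrete, here is a minimal in-memory sketch of the two layouts. It says nothing about how Amazon actually stores data; `ReplicatedStore`, `ShardedStore`, the `down` parameter, and the CRC32 routing rule are all hypothetical names and choices made up for illustration.

```python
import zlib


class ReplicatedStore:
    """Every replica holds the entire dataset (one logical database)."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]

    def put(self, key, value):
        # Writes go to every replica, so each one has the full dataset.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key, down=()):
        # Any replica that is up can answer any query.
        for index, replica in enumerate(self.replicas):
            if index not in down:
                return replica.get(key)
        raise RuntimeError("all replicas are down")


class ShardedStore:
    """Each shard holds a subset of the data; routing logic sits in front."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def shard_for(self, key):
        # Assumed routing rule: a stable hash of the key, modulo shard count.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, down=()):
        index = self.shard_for(key)
        if index in down:
            # Only the keys that live on this shard become unavailable.
            raise RuntimeError(f"shard {index} is down; {key!r} is unavailable")
        return self.shards[index].get(key)
```

In this toy model, taking one node down leaves the replicated store fully readable, while the sharded store loses exactly the keys routed to that node, which is the "kitchenware but not garden tools" symptom described above.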

bjclark · on Dec 27, 2008

Are you still arguing that I said Amazon sharded their database? Because I've re-read what I said, and what you said, like 4 times, and I can't see where I said they shard.

soult · on Dec 28, 2008

Actually, Amazon is kind of sharding their database: They encapsulate every kind of data into a program that manages it:

>What I mean by that is that within Amazon, all small pieces of business functionality are run as a separate service. For example, this availability of a particular item is a service that is a piece of software running somewhere that encapsulates data, that manages that data.

(see http://queue.acm.org/detail.cfm?id=1388773)
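
As a rough illustration of "a piece of software that encapsulates data", here is a toy sketch. It is not Amazon's code or API; `AvailabilityService` and its methods are hypothetical names. The point is only that callers go through the service's interface and never touch its storage directly.

```python
class AvailabilityService:
    """Hypothetical service that owns item-availability data."""

    def __init__(self):
        # Private storage: how this is kept (one database, replicas,
        # shards) is invisible to callers of the service.
        self._stock = {}

    def restock(self, item_id, quantity):
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        self._stock[item_id] = self._stock.get(item_id, 0) + quantity

    def is_available(self, item_id):
        return self._stock.get(item_id, 0) > 0

    def reserve(self, item_id):
        # Returns True and decrements stock if the item is available.
        if not self.is_available(item_id):
            return False
        self._stock[item_id] -= 1
        return True
```

Because each kind of data sits behind its own service, the data ends up partitioned by function rather than by key range, which is why this is only "kind of" sharding.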