
It's totally different for different types of sites. Facebook is different than Amazon is different than Yahoo is different than Flickr.

In general, I'd say most sites' data is sharded or clustered. Amazon, I'm guessing, basically has many, many different instances of their app running on different clusters all over the world (multiple clusters per datacenter). So they upgrade one cluster at a time, and their databases all synchronize with others of the same version.

Facebook's data is, obviously, sharded by network. So any given cluster runs x number of networks. Upgrades are then, again I'm guessing here, rolled out network by network. The data layer can be different from network to network.
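
To make the cluster-at-a-time idea concrete, here is a rough Python sketch. It is a guess at the general shape, not how Facebook or Amazon actually deploy. `Cluster`, `rolling_upgrade`, and `health_check` are all made-up names, and "deploying" is reduced to changing a version string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    # Hypothetical model: a cluster serves some networks and runs one app version.
    name: str
    networks: List[str]
    version: str = "v1"
    in_rotation: bool = True


def rolling_upgrade(
    clusters: List[Cluster],
    new_version: str,
    health_check: Callable[[Cluster], bool],
) -> List[str]:
    """Upgrade one cluster at a time; stop at the first failed health check."""
    upgraded = []
    for cluster in clusters:
        cluster.in_rotation = False          # drain traffic from this cluster only
        old_version = cluster.version
        cluster.version = new_version        # stand-in for deploying code and restarting app servers
        healthy = health_check(cluster)
        if not healthy:
            cluster.version = old_version    # roll back just this cluster
        cluster.in_rotation = True           # put it back in rotation either way
        if not healthy:
            break                            # leave the remaining clusters on the old version
        upgraded.append(cluster.name)
    return upgraded
```

Only one cluster is ever out of rotation, so users on the other networks never notice. A bad release stops after the first cluster instead of taking everything down.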

Most of the time though, updates to large scale services aren't changing db schemas or making huge changes to the data layer, so it's as simple as updating the code and rebooting some app servers.

(Obviously I don't know exactly how they do it, because I don't work at any of these places, but deploying isn't that hard of a problem to solve.)




On FB I've often seen account unavailable while they do maintenance, but I've never been unable to log into Amazon or place an order. Sharding is very much overrated. However it's physically implemented, Amazon have one logical database and a customer's data is always available.


How is this contrary to what I said? I didn't say Amazon sharded. What/why would they shard? Clustering != sharding.

They most definitely have multiple instances of the app running around the world, which synchronize data with each other, which is why you've never been unable to buy something. It's highly unlikely that all their instances would be down or overloaded at the same time.

Also, they most certainly don't have a single logical database. They use all kinds of things, including SimpleDB. You think product information is stored in the same database, or even in the same way, as customer information?


The product data will be in multiple physical replicated shared-nothing databases each of which has the entire dataset - a single logical database. The principle of sharding is that each database has a subset of the data and you place some logic in front of it to direct the query to the right place. Now if I'd been able to buy kitchenware but not garden tools one day, then I might say their product database was sharded. But Amazon is smarter than that.
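
A toy Python sketch of the distinction, under heavy assumptions: in-memory dicts stand in for databases, `ShardedStore` and `ReplicatedStore` are hypothetical names, and the CRC32 routing is just one possible way to pick a shard. None of this is Amazon's actual design.

```python
import zlib


class ShardedStore:
    """Each shard holds a subset of the keys; routing logic sits in front."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]

    def shard_for(self, key):
        # Deterministic routing: the same key always maps to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, down=()):
        idx = self.shard_for(key)
        if idx in down:
            # Only this one shard has the key, so the data is unavailable.
            raise LookupError(f"shard {idx} is down")
        return self.shards[idx][key]


class ReplicatedStore:
    """Every replica holds the entire dataset: one logical database."""

    def __init__(self, n_replicas):
        self.replicas = [{} for _ in range(n_replicas)]

    def put(self, key, value):
        for replica in self.replicas:   # write to every copy
            replica[key] = value

    def get(self, key, down=()):
        for idx, replica in enumerate(self.replicas):
            if idx not in down:         # any live replica can answer
                return replica[key]
        raise LookupError("all replicas are down")
```

With the sharded store, losing the shard that holds "garden tools" makes that key unreadable while keys on other shards still work. With the replicated store, every key stays readable as long as one replica is up.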


Are you still arguing that I said Amazon sharded their database? Because I've re-read what I said, and what you said like 4 times and I can't see where I said they shard.


Actually, Amazon is kind of sharding their database: They encapsulate every kind of data into a program that manages it:

>What I mean by that is that within Amazon, all small pieces of business functionality are run as a separate service. For example, this availability of a particular item is a service that is a piece of software running somewhere that encapsulates data, that manages that data.

(see http://queue.acm.org/detail.cfm?id=1388773)



