This is a very good summary of best-practices. I found the following part especially interesting:
> Make the development team responsible. Amazon is perhaps the most aggressive down this path with their slogan "you built it, you manage it." That position is perhaps slightly stronger than the one we would take, but it's clearly the right general direction. If development is frequently called in the middle of the night, automation is the likely outcome. If operations is frequently called, the usual reaction is to grow the operations team.
Also, it was quite fun to read about the many things you can't rely on.
If you are interested in datacenter design, distributed systems or just large services, I would recommend following James Hamilton's blog: http://perspectives.mvdirona.com/.
James covers a very wide range of topics, and is great about linking to source material and related articles.
This is a great paper and James Hamilton is really an incredibly talented and intelligent guy. These kinds of resources help me improve as a systems administrator. Does anyone have other papers or blogs I should be reading for this kind of thing?
Many of the details discussed are hidden from most developers nowadays by various abstractions. That said, many of the lessons are gems that still apply very well to modern development practices and tools.