I used to be a scientist, and I went to work at Google to apply their technology to science problems. My first team was SRE, and I have to say, Google's SRE approach to computing completely changed how I thought about things and, more importantly, how I programmed systems that went to production. I've read the SRE book and can highly recommend learning from the principles it lays out.
I've skimmed it, and even the appendices alone are worth a look. It's not always immediately practical in every situation (as with anything from Google), but it's definitely a "handbook" I'll be studying more closely over the next few weeks. :)
Oh, and while Google Play may be cheaper, it's also on Safari Books if folks have a subscription, e.g. through libraries/ProQuest.
> Advance preparation, combined with extensive testing and contingency plans, meant that we were ready when things went [slightly wrong] and were able to minimize the impact on customers.
TIL that even Google have datacenter fluctuations they can't figure out. It's nice that they quietly make this info publicly available, and also nice that I've now discovered where to find it :)
> I've been seeing a lot of references to SRE recently. Is Google trying to market this position and acquire more engineers?
There is a bit of a gap (in terms of attitude and skill set) between what Google calls an SRE and what most other companies call an SRE.
I think Google is trying to steer the public usage of the word so that their term doesn't get diluted. One possible reason might be so that SREs at Google don't feel like they're making a bad career move by having the term "SRE" on their resume.
If you spent 5 years working for the state of California designing safer, future-proof, state-of-the-art treatment plans and plants for gray and black wastewater in metropolitan and rural areas, improving life expectancy by 2% for people who live in California, but your title was "Sanitation Engineer" the whole time, you're going to be a bit put out to learn that during that time every high school in the state gave its custodians the same title.
Speaking from experience, SREs are very hard to hire. At Google, SRE directors and VPs will often cherry-pick promising candidates from the mainline SWE hiring pipeline and give them a "hero call" to convert them to SREs. SREs at Google are also paid more, controlling for level and performance, as a way to hire and retain them.
In all seriousness, they make it out to be more than it is. From my experience going through their hiring pipeline, there seem to be two tracks in SRE: software and sysadmin. If you score higher in algorithms and data structures, you'll presumably end up working more on tools and libraries, whereas on the other track you'll work more on infrastructure and automation. Either way, both tracks work together on the same team towards the same goals.
If you want in, be prepared to solve simple-to-tough algorithm problems and to be quizzed on TCP retransmission, Linux system calls, and memory pressure. It's a bit challenging because you not only have to know Big-O well enough to estimate the asymptotic complexity of an arbitrary algorithm, but you might also be asked what a sequence of TCP packets would look like if you sent some data and pulled the plug, or what the parameters to a given Linux system call are. You quite literally have to know everything from how virtual memory works to how to implement a fast k-means and how the network stack works from top to bottom, etc.
If you've done any cloud development and supported a moderately large system, it's that, but bigger. Make one a hero, it does not.
It's just a guy on the phone telling you how you won't be like those other chumps, you'll be a hero. The few and the proud. Standard recruitment techniques.
Anecdata here, but Google have been constantly hiring in SRE for as long as I can remember (in London).
As someone who's moving in that direction career-wise, I don't think they're any harder to hire than a good software engineer. The skill sets differ somewhat, but the culture/thought process, which is what you hire for, is very similar.
I'm actually really, really glad that Google released this book, because I think they are one of the few companies that are actually doing this SRE thing right. I think the hardest part of the SRE paradigm (as with DevOps) is getting companies to adopt it wholly, and I think having this book out will help change that.
This got me wondering what the daily workload of AWS services was. The best numbers I could find were from a 2013 article about S3 alone serving ≈95 billion requests per day. The size and scope of cloud providers is truly cool and fascinating engineering.
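For a sense of scale, here is a quick back-of-the-envelope conversion of that daily figure into an average per-second rate. It assumes, for simplicity, that load is spread evenly over 24 hours; real traffic has daily peaks well above the average, so this is a lower bound on what the system must handle at peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(requests_per_day: float) -> float:
    """Average requests/second, assuming load is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


s3_requests_per_day = 95e9  # the ~95 billion/day figure from the 2013 article
rate = average_rate_per_second(s3_requests_per_day)
print(f"{rate:,.0f} requests/second on average")  # roughly 1.1 million
```

So ≈95 billion requests a day works out to roughly 1.1 million requests per second, sustained around the clock, for S3 alone.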
"If you put a human on a process that’s boring and repetitive, you’ll notice errors creeping up. Computers’ response times to failures are also much faster than ours. In the time it takes us to notice the error the computer has already moved the traffic to another data center, keeping the service up and running. It’s better to have people do things people are good at and computers do things computers are good at."
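As a toy illustration of the kind of automated failover the quote describes, here is a minimal Python sketch. The `Datacenter` type, the `healthy` flags, and the policy "stay put while healthy, otherwise move to the healthy site with the most capacity" are all hypothetical assumptions for illustration, not how Google actually routes traffic. A real system would feed health from live probes and shift traffic gradually rather than all at once.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    healthy: bool        # hypothetical: assumed to be set by health probes elsewhere
    capacity_qps: float  # hypothetical: how much traffic this site can absorb


def choose_datacenter(datacenters: list[Datacenter], current: str) -> str:
    """Return the name of the datacenter that should serve traffic.

    Keep the current datacenter while it is healthy; otherwise fail over to
    the healthy datacenter with the most capacity. Raise if none are healthy.
    """
    by_name = {dc.name: dc for dc in datacenters}
    if current in by_name and by_name[current].healthy:
        return current
    candidates = [dc for dc in datacenters if dc.healthy]
    if not candidates:
        raise RuntimeError("no healthy datacenter available")
    return max(candidates, key=lambda dc: dc.capacity_qps).name


# Example: the current site has failed, so traffic moves without waiting on a human.
datacenters = [
    Datacenter("us-east", healthy=False, capacity_qps=500.0),
    Datacenter("eu-west", healthy=True, capacity_qps=300.0),
    Datacenter("asia", healthy=True, capacity_qps=400.0),
]
print(choose_datacenter(datacenters, current="us-east"))
```

The point of encoding the decision like this is the one the quote makes: the check runs in milliseconds on every evaluation and never gets bored, while humans stay in the loop for the judgment calls (tuning the policy, investigating why a site went unhealthy).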