
SRE is not a byproduct of a bubble economy. I believe Google has had SREs since the very beginning. But I think the rest of the point still stands. These days with devops the skill set needed for devs has indeed expanded to have significant overlap with SREs. I expect companies to downsize their SRE teams and distribute responsibilities to devs.

A second major reason is automation. If you read the linked site long enough you'll find that in the early days of Google, SREs did plenty of manual work, like running deployments while manually watching graphs. They were indispensable then simply because even Google didn't have enough automation in its systems! You can read the story of Sisyphus https://www.usenix.org/sites/default/files/conference/protec... to kind of understand how Google's initial failure to adopt standardized automation ensured job security for SREs.




Pedantically, Google didn't have SREs at the beginning. I asked a very early SRE, Lucas (https://www.nytimes.com/2002/11/28/technology/postcards-from... and https://hackernoon.com/this-is-going-to-be-huge-google-found...), and he said that in the early days, outages would be really distracting to "the devs like Jeff and Sanjay". He and a few others ended up forming SRE to handle site reliability more formally during the early days of growth, when Google got a reputation for being fast and scalable and nearly always up.

Lucas helped make one of my favorite Google Historical Artefacts, a crayon chart of search volume. They had to continuously rescale the graph in powers of ten due to exponential growth.

I miss pre-IPO Google and the Internet of that time.


> “These days with devops the skill set needed for devs has indeed expanded to have significant overlap with SREs”

Respectfully disagree on this. SRE is a huge complex realm unto itself. Just understanding how all the cloud components and environments and role systems work together is multiple training courses, let alone how to reliably deploy and run in them.


But modern approaches to dev require the SWEs to understand and model the operation of their software, and in fact program in terms of it — “writing infrastructure” rather than just code.

Lambda functions, for example: you have to understand their performance and scalability characteristics — in turn requiring knowledge of things like the latency added by crossing the boundary between a managed shared service cluster and a VPC — in order to understand how and where to factor things into individual deployable functions.


That is barely tip-toeing across the very edges of SRE land.


Alright, how about expecting devs to repackage their entire until-that-point-SaaS stack into an "appliance" (Kubernetes Helm chart), containing SWE-written resource manifests that define the application's scaling characteristics across arbitrarily-shaped k8s clusters they won't get to see in advance, using only node taints; memory limits for layers of their stack they've never even seen run full-bore before; health checks that multiplex back up to a central monitoring platform; safely-revertible multiphase upgrade rollout behavior that never decreases availability; and so forth;

...and then those same devs being expected to directly debug the behavior of this "appliance" in a client environment (think: someone consuming the "appliance" through the Amazon Marketplace, where this launches the workload into an EKS cluster in the customer's own VPC, with the customer in control of defining that cluster's node pools);

...where this can involve, for example, figuring out that a seemingly-innocent bounded-size Redis cache deployment needs 10x its steady-state memory when booting from a persisted AOF file... for some godforsaken reason.


Yeah, this is buying and using toys. You need to go down a few layers of abstraction.


The idea of ops people who wrote code for deployment and monitoring and had responsibility for incident management and change control existed before Google gave it a name.

Source: I was one at WebTV in 1996, and I worked with people who did it at Xerox PARC and General Magic long before then.



