A Review of Accelerate: The Science of Lean Software and DevOps

synu · on June 15, 2022

DevOps unfortunately became a mega-industry designed to sell software to the point that I feel it's become a sort of meaningless term, which is a shame because there are actual important concepts underlying the topic. Some of the incentives in play are all wrong.

SideburnsOfDoom · on June 15, 2022

A decade ago, "Agile" not "DevOps" and everything else about that statement was also true.

It's an industry cycle.

kqr · on June 15, 2022

> In my experience, the most abject software development failures stem from building the wrong thing, not from building the thing wrong, and the former failure occurs outside anything that the authors even attempt to measure.

> ...

> ...the Accelerate view of high performance amounts to “the team can push a button and deploy changes to prod quickly”. If you care about any software quality outcome other than that, then Accelerate does not even claim to measure it.

This is correct, and I don't really think the authors try to claim otherwise.

The underlying reasoning is that if you cannot deploy quickly, then it will be very expensive for you to notice that you've built the wrong thing (because testing in the field with real users is an amazing way to get design feedback), much less course-correct quickly.

I've read the analogy of a big ship. If it takes you a day to turn, you will end up in the wrong place more severely than if you have a small ship and can turn on a dime -- even assuming that you're equally observant about heading errors in both cases.

So it's not so much that being able to deploy quickly guarantees quality, but it's sort of a prerequisite for being able to achieve quality. (As borne out by their further investigation into business metrics.)

Most of that reasoning is pure logic and doesn't need substantiation. The one blind assertion is that "deploying to prod is a cheaper way to validate design ideas than the alternatives."

Niwbthat might sound like a tautology too -- if you can deploy quickly to prod it's cheap to deploy quickly to prod! But what claiming is that going from expensive deploys to cheap deploys (a potentially significant investment!) is still cheaper than other ways of validating design ideas.

What do I base that on? Nothing solid. I feel like it's a recurring theme when reading about high-performance design/engineering teams throughout history: examples that immediately come to mind are China Lake, Skunk Works, Toyota.

Additionally, I trust that the DORA researchers have indeed verified that this is the case, as they say. I have yet to see evidence that slow course-corrections are an improvement. (Which is reasonable argument to make: if you're able to course correct too quickly, you might end up with "pilot-induced oscillations" where you overcompensate in a naturally damped system. Maybe there's an optimal deploy frequency that's impedance matched to the underlying system? But that's getting very speculative.)

itsdrewmiller · on June 15, 2022

Most of the article is casting doubt on the idea that the "DORA researchers have indeed verified that this is the case, as they say." Pretty convincingly too, IMHO.

kqr · on June 15, 2022

At first I was going to disagree with you, but the closer I think about it, you're right.

My basis for disagreement was going to be that if "easy deployment" and "good economic performance" are correlated, the only sensical way that could happen is if easy deployment causes good economic performance.

But then I realised there's a trivial way for the correlation to go the other way: if we assume easy deployments are ineffective, then only organisations with good economic performance can afford to spend time on easy deployments.

(Of course there's always the confounding possibility. This can be analysed statistically (see Judea Perl) and it's not obvious that the authors have done this.)

So yeah, you're right. It's not clear that the authors have done that.

synu · on June 15, 2022

But then, doesn't that just sort of reduce down to:

1. Deploy quickly

2. ???

3. Profit

jljljl · on June 15, 2022

I think the assertion is "If you have everything else in the business right, then Deploying quickly makes it cheaper to fix issues and ship revenue generating features to customers, which will lead to more profit"

I think it's hard to say that there's a causal relationship though -- a lot of companies that can afford to invest in automation are those with already successful products + money to invest. This may change as it because easier to automate provisioning and deploying to environments (something I'm working on at my new company, jetpack.io)

SideburnsOfDoom · on June 15, 2022

But in this case (unlike in the classic "x, ???, profit" memes), there are connections that you can draw, that can plausibly lead to profit. If you've worked both with and without quick, easy deployments, you'll know that there's a difference.

synu · on June 15, 2022

Plausibly yes, but not as scientifically proven as you might think just from reading the book.

jljljl · on June 15, 2022

Good article breaking down the limitations of the science driven approach to productivity that Accelerate presents, and some of it’s harder to prove claims.

I did find the capabilities and KPIs that Accelerate emphasizes useful, however. I agree with the author that there are a lot of other issues (such as building the wrong thing) that impact a businesses overall success. However if you’re on a team that’s focused on developer productivity, the four metrics offered (Deployment frequency, Lead time for Changes, Time to Restore Service, and Change Failure rate) are pretty good ones to focus on.

I also found surveys useful at identifying gaps in monitoring, detecting sources of toil among an engineering team, and gauging overall employee satisfaction, though they are not great at getting granular data. That said, they can provide good areas to dig in with more ethnographic research methods.

btschaegg · on June 15, 2022

> Good article breaking down the limitations of the science driven approach to productivity that Accelerate presents, and some of it’s harder to prove claims.

I find this sentence really interesting, because it highlights that we seemingly don't read the conclusions of the text the same way.

Do you mean "science driven" to mean "inspired by science"?

Otherwise, my understanding of the post was essentially that neither "Accelerate" nor the "State of DevOps report" deserve to be called "science driven", as they lack basic control mechanisms one would expect from scientific works. And, I have to say, with some pretty convincing arguments.

…or did you mean that really scientific works on the matter are inherently limited because of the necessary rigor (which was also a take-away of this text for me)?

jljljl · on June 15, 2022

I suppose I should have put “science driven” in quotes.

But I also think it’s going to be difficult, using the methods available feasible for this kind of study, to draw strong causative conclusions about how dev ops practices impact business performance. There’s a lot of confounding variables that are hard to control for.

So I think I share both your takeaways

synu · on June 15, 2022

The halo effect point seems especially important. If you ask a bunch of people who are personally invested in DevOps, who are leading teams whose job it is to make have a successful DevOps org/transformation, then you're going to tend to get results back from surveys that show the method is working even if it's an unconscious bias on the part of the respondents. And they'll all be biased in the same direction towards each other, no coordination required.

Apart from that, the part about "inferential prediction" seems particularly damning.

Similar to the author I think that a lot of DevOps is valuable, but they raise some good points that some of the reasoning presented in the book is quite circular and prone to bias. We can't really say that DevOps is scientifically proven.

kqr · on June 15, 2022

> I am curious why a p value threshold (alpha) of 0.1 was used, rather than the more typical 0.05.

I believe p thresholds higher than 0.05 are frequently used in social sciences simply because the phenomena are so complex that you would never get anything significant with a lower p threshold.

Of course, that's not a real argument: you should pick the p threshold not based on what you think you can achieve, but based on the consequences of the decision you'll drive with the result.

In this case, the authors might not think the consequences of getting software development practises wrong is that impactful, because the humans in the loop will prevent real disaster no matter what methodology you use.

disgruntledphd2 · on June 15, 2022

Like, the 5% confidence level is a stupid cliche, but it's standard. Generally the only reason someone uses the 10% level is because they don't get 'significant' results at the 5% level. So that's a statistics smell (if you'll pardon my inventing a term).

beckingz · on June 15, 2022

I'm amazed that the authors haven't released any of their data or questions...

itsdrewmiller · on June 15, 2022

I wonder if anyone has actually requested it?

thcipriani · on June 15, 2022

I have requested some specific questions via email and got nothing back.

Below is the email I sent to one of the authors 2021-06-07:

    I tried to email this question to dora-data@google.com
    and got a bounce back, so I figured I'd try this address.

    tl;dr: I'm currently re-reading Accelerate in our book club
    at work. Is there a copy of the survey available somewhere?
   
    I'm curious about the questions asked. The book (and report)
    say things like, "[Elite performers have] 208 times more
    frequent code deployments" (state of devops 2019 pg 21)
    and "[High performers have] 46 times more frequent code
    deployments" (Accelerate pg 16).
   
    The book also says, "We asked survey respondents how often
    their organization deploys code [...] offering the following
    options" with a list of 6 options like "between once per hour
    and once per day".
   
    How do you get from multiple-choice to assertions of
    deploying 46 or 208 times more frequently? Is there
    a copy of the survey available to get an idea
    of the questions you asked?