Hacker News
Why limiting work-in-progress works (lethain.com)
74 points by ahyattdev on Feb 17, 2019 | 16 comments



Two of the most under-appreciated insights in software engineering: concurrent work in progress has a productivity cost, and team size has a productivity cost (The Mythical Man-Month).

If you ever want to see cognitive dissonance in action, try talking about this with engineering managers at a hypergrowth company. They will smile and nod and agree, then go right back to pushing for ever-increasing scope + headcount and acting surprised when they don't get linear-plus returns. There's a really persistent belief that you can just grow the team faster than the backlog and everything will be fine.


On the wall of my dad's office in the 1970s was the universal problem-solving flow chart, along with a poster that said putting more people on a project in trouble is like putting out a fire with gasoline.

I follow the former when maintaining software/hardware, and I've totally seen the latter in action: add more people to the project and watch the deadline slip another couple of months, automagically.

I also remember the first company I worked at. The shop eventually adopted a policy at my suggestion: if it isn't ready to be boxed up and shipped when people come back from lunch, it's not shipping today. That put a stop to the constant frantic scramble to get stuff out the door before UPS came. Perversely, we started shipping more stuff on time.


I saw this change while I was working at Google. The key problem was that the PM-to-engineer ratio increased in the company, and we were getting a lot of extra work from the PMs that looked nice on paper and was easy to understand, but didn't have a real positive impact on the customer. This extra work completely blocked us from improving the performance of the product.


As a PM, I'm curious what type of work falls into this category. I'm also curious whether this was coming specifically from the PMs or from above.


It usually came from sales and other teams. The big difference I saw: in the past, giving an engineering team work from another team meant the other team's TL had to convince that team's TL to take it on, and the TL had enough context to push back (which I believe is the best thing to do in most cases).

In the old Google, PMs worked on promoting the work of engineering teams and communicating with sales/ops, but didn't have much power over the direction of the engineering teams' work. Their time was a scarce resource to be used by the team; they weren't extra bosses. In the new Google, PMs have a lot of time, which they spend fighting with each other to get the features they're pushing implemented.

Also, the best PMs in the old Google had some engineering background and could do log analysis without asking the engineers for help with every question.


So, having read the article, why does limiting work-in-progress work? Because it lets you finish more of the work that you start? I struggled a little initially with the generic concepts used for modeling. But I suppose the real insight is that, focusing on only three basic parameters (work started, work finished, and number of developers), you can get wildly different results. Is that a fair reading?

This is a topic to which I've given a fair deal of attention in leading software development teams using scrum. Especially after reading Donald Reinertsen's Principles of Product Development Flow, where he approaches this topic in more depth:

https://www.goodreads.com/book/show/22586058-the-principles-...

Limiting work-in-progress was a major principle in Reinertsen's book. Another related one which I've applied successfully in practice: limit batch size. In my case using scrum, this meant keeping user stories within a certain size range (2-5 points).

This has an important benefit: it controls scope creep, in a couple of ways: 1. During planning, it helps you avoid overlooking complications or other costly delays before work gets started. 2. When a story starts to creep during development, you tend to catch it sooner and control it more effectively.

Which helps get stuff done, which helps limit WIP, which as the article suggests helps get stuff done.


> So having read the article, why does limiting work-in-progress work?

I imagine his models are those code blocks just above the images.

A reference to the language those are written in would be great, and would let us check whether the code is realistic. But it's missing, so I can't explain why the article claims this.

Limiting WIP happens to work in practice, and it's the basis of the kanban process (both the one we use in software development, which doesn't even have a kanban, and the industrial one). Scrum does give you some tools you can repurpose into managing WIP, but they're not as robust.


Looking through past posts, it looks like the author used this: https://lethain.com/systems-jupyter-notebook/


Oh, thanks.

So the article just postulates that each in-progress project slows developers down linearly. So when he takes away that slowdown, things get faster.
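For the curious, that "linear slowdown" assumption is easy to play with. This is my own toy model, not the article's actual notebook code: every open item taxes the team's effective capacity, and capacity is spread evenly over open items, so a lower WIP limit finishes more of the backlog in the same time.

```python
def simulate(wip_limit, backlog=50, developers=5, ticks=100, tax=0.05, item_size=10.0):
    """Toy model: return items finished after `ticks` under a linear WIP tax."""
    in_progress = []   # remaining effort for each open item
    finished = 0
    for _ in range(ticks):
        # pull new items from the backlog up to the WIP limit
        while len(in_progress) < wip_limit and backlog > 0:
            in_progress.append(item_size)
            backlog -= 1
        # linear tax: every concurrent item reduces effective capacity
        capacity = developers * max(0.0, 1 - tax * len(in_progress))
        if in_progress:
            # spread capacity evenly over the open items
            share = capacity / len(in_progress)
            in_progress = [effort - share for effort in in_progress]
            finished += sum(1 for effort in in_progress if effort <= 0)
            in_progress = [effort for effort in in_progress if effort > 0]
    return finished

for limit in (2, 5, 10):
    print("WIP limit", limit, "->", simulate(limit), "items finished")
```

With these (made-up) numbers the ordering comes out the way the article suggests: the tighter the limit, the more items actually finish, even though fewer are "in flight" at any moment.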


Limiting work in progress is similar to thread pooling, which gives you graceful degradation.
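A rough sketch of the analogy (my own example, not from the article): a bounded thread pool plus a semaphore acts as a WIP limit. Submitters block (backpressure) instead of piling up unbounded concurrent work, so the system degrades gracefully under load rather than thrashing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

WIP_LIMIT = 4
slots = threading.BoundedSemaphore(WIP_LIMIT)
pool = ThreadPoolExecutor(max_workers=WIP_LIMIT)

def submit_with_limit(fn, *args):
    # blocks the caller once WIP_LIMIT tasks are already in flight
    slots.acquire()
    future = pool.submit(fn, *args)
    # free the slot as soon as the task completes
    future.add_done_callback(lambda _: slots.release())
    return future

# usage: 20 tasks submitted, but never more than 4 in progress at once
results = [submit_with_limit(pow, 2, n) for n in range(20)]
print([f.result() for f in results])
```

The point of the analogy: the queue in front of the pool is where the "waiting" becomes visible and bounded, instead of every task being half-started.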


Slow CI and/or Code Review makes concurrent work a bit of a necessity. Context switching certainly has a cost, but it's cheaper for my employer than paying me to wait for tests to run.


Actually...

I've had this exact problem (~30 min CI and slow CR) on my current project. For a while I sucked it up and did concurrent work, as you describe.

After a while it became apparent that this concurrent work was making us lose a lot of time:

- PRs still aren't merged several days after being opened (2 on average, regularly more). The longer they stay open, the more expensive they become: regular rebasing, context switching, reviews getting more and more demanding and enlarging the scope of the PR.
- Bugs appear. Doing several things in parallel leads to a loss of quality.
- We're not improving; this non-ideal situation becomes the norm.

So I'm trying something else: we don't start a new ticket before the last one is done. That means waiting for CR, waiting for CI. That means there are times when my employer pays me to wait for tests to run. Well, the result:

- We're actually faster. The context-switching + rebasing cost is far higher than waiting 30 minutes for tests to pass.
- The slowness of the CI and the CRs is now visible (I'm visibly not doing anything), which means we can work on it and improve! That's a basis of lean manufacturing: make problems visible.
- We chase people to get reviews, so the lead time of each ticket decreases significantly. I'm also automatically available to help people review my code (if it's a complex bit, or a junior is reviewing).
- We take on tickets to improve our CI and our test performance.

All in all, doing one thing at a time was already more efficient, and in addition we can make our process more and more efficient.

The main pushback on this is generally "but we need to context-switch to review PRs", but in the end you do way more context switching when working on several things at once.


The trouble is, it can have a snowball effect.

Say I'm multiplexing my available hours among three things, but I get blocked sometimes. So in an effort to avoid ever having time wasted because I'm idle, I switch to having five active things instead.

Now it takes longer for me to cycle around among those things. So the people who are waiting on me for code reviews have to wait longer. They also decide to switch from having three things on their plate to five things on their plate.

Now I'm waiting even longer for their code reviews because it also takes them longer to cycle back around. I start thinking about switching from five things on my plate to seven things on my plate.

It may not really spiral out of control this badly, but it does have a bit of a tendency to self-reinforce.
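The "takes longer to cycle around" part is just round-robin arithmetic. A back-of-the-envelope version (my numbers, not the poster's): if you revisit each open task in turn and spend a fixed chunk of time per visit, the gap before you get back to any one task, such as someone's pending review, grows linearly with how many tasks you keep open.

```python
def revisit_gap(open_tasks, hours_per_visit=2):
    """Worst-case hours before you get back to a given task, round-robin."""
    return open_tasks * hours_per_visit

for plates in (3, 5, 7):
    print(plates, "open tasks ->", revisit_gap(plates), "h between visits")
```

So going from three plates to five doesn't just add two tasks; it stretches everyone's wait on you by the same ratio, which is exactly what tempts *them* to pick up more plates too.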

Sometimes it really is a good idea, but the point is there is a trade-off. In some cases, a little idle time can cost less than the problems caused by everybody juggling more stuff.

In real life, things are complicated. Not everything I'm blocked on is another person. Some tasks I need to get done don't block someone else. If I submit my travel expense report a week later, I'll just get reimbursed in a different pay period, so turnaround time doesn't matter there like it does when someone else is depending on me to unblock them.


This is widely believed but I think in most cases not true. Goldratt’s ‘The Goal’ https://www.amazon.com/Goal-Process-Ongoing-Improvement-eboo... illustrates how this works in a factory setting, but you can apply the principles to most work processes.


I'm unconvinced. Making sciencey charts doesn't mean there's well-grounded research.

The only place I worked that had a WIP limit... I never got the point. Like, agile seems to be largely about visibility for management or clients, so the idea of artificially restricting people from reporting what they're doing... like, why?


By the way, here's what actually happens when you put WIP limits in place. I think most of us adult professionals realize that a person has a few things on their plate, and a thing can be "in progress" but waiting on something, and they probably don't have the patience to move it around on your JIRA board every couple of hours. So your task management essentially becomes a narrative: you get people to lie to you and pretend they're super focused on this one priority. It's a ridiculous game. Just don't play it.



