It's good, sound advice. Big teams are inefficient teams. They are harder to manage. They waste more time in meetings. And they lack focus. The easiest way to fix broken teams is to remove people from them. You don't have to fire them. Just put them in other teams or create some new teams. This does wonders for cutting down on the stress levels, heated debates, and other wasted energy that plague overstretched teams with too many captains on the ship.
I would not even wait for problems to emerge. Just split big teams on principle. Minimum size of 3. Maximum size of 7. One eight-person team then becomes two teams of 3, 4, or 5 people. Make sure each team has a tech lead that knows what they are doing.
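As a rough illustration of that split, here is a minimal Python sketch. The helper name `split_team` and the policy of using the fewest teams with sizes as even as possible are assumptions made for illustration; the advice above only gives the bounds of 3 and 7.

```python
import math


def split_team(headcount: int, min_size: int = 3, max_size: int = 7) -> list[int]:
    """Split a headcount into the fewest teams with sizes in [min_size, max_size].

    Hypothetical helper: the even-split policy is an assumption, not part of
    the original advice.
    """
    if headcount < min_size:
        raise ValueError("not enough people for even one team")
    # Fewest teams that keep every team at or below max_size.
    teams = math.ceil(headcount / max_size)
    # Spread people as evenly as possible; the first `extra` teams get one more.
    base, extra = divmod(headcount, teams)
    sizes = [base + 1] * extra + [base] * (teams - extra)
    if min(sizes) < min_size:
        raise ValueError("cannot satisfy the size bounds")
    return sizes


print(split_team(8))   # [4, 4]
print(split_team(7))   # [7]
print(split_team(16))  # [6, 5, 5]
```

An uneven split such as 3 and 5 is just as valid under the stated bounds; the even split is only one way to pick.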
Don't over-staff teams with managers/minders/scrum masters/whatever label you slap on junior management. This causes a big problem: managers like to hoard people to inflate their importance. You deflate their team and you deflate their ego and they'll be forever whining to "fix it" with more "resources".
Simple solution: give them more than 1 team to manage and direct them to keep their teams small. Now they count teams instead of people. Or teams and people. And you can measure their success as a function of how well their teams are performing.
Then if those teams get overloaded, you have a conversation about which new teams are going to be needed and who is going to manage them. Any new team should be bootstrapped with mostly existing people: you move people around and promote them. New people start out in existing teams. This accomplishes two important things: people have a prospect of getting promoted sideways, and new people learn the ropes in a small, well-functioning team rather than being dumped in an overstretched team.
3 is risky. One person has a baby and goes on leave, or gets long Covid, or is poached by a FAANG. Until a substitute can be found, the team of 3 turns into a rather overwhelmed team of 2, each spending 50% time on call. And if either of them goes on vacation to recharge, the remaining one is left with an unbearable workload.
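The arithmetic behind that worry can be sketched in a few lines of Python. This assumes exactly one person is on call at any time and that shifts are shared evenly among whoever is available; both are simplifying assumptions, and `on_call_share` is a hypothetical helper name.

```python
from fractions import Fraction


def on_call_share(roster_size: int, absent: int = 0) -> Fraction:
    """Fraction of time each available person spends on call.

    Assumes one person on call at a time and an even split of shifts.
    """
    available = roster_size - absent
    if available <= 0:
        raise ValueError("nobody is left to take the pager")
    return Fraction(1, available)


print(on_call_share(3))            # 1/3
print(on_call_share(3, absent=1))  # 1/2
print(on_call_share(3, absent=2))  # 1
```

With a full roster of 3, each person carries a third of the on-call time; lose one and it becomes half, lose two and the last person is on call all the time.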
The biggest source of relief for oncall for me is that, even though my "team" is 8 people, we work on larger systems. We have ~5 teams working on a single code base, so around 30 people. It's not really a microservice, but it's not really a monolith (since we have many other, and larger, codebases). On call is much more bearable when it's once every 3-4 months than once a month.
Being on call requires being pageable and able to respond. This seriously curtails your ability to live your life. For instance, you can't decide to go hiking over the weekend on a whim, because you couldn't handle an incident, were one to occur.
I've worked in a team of 4 (and later, of 5), and it was amazing: everything could still be communicated quickly, but you didn't feel overwhelmed if a team member was inaccessible.
I imagine those restrictions on team size only apply to pure development tasks.
You certainly want more than one of those teams to be able to support each piece of your software. You also want to be able to move people quickly from a team to a close one, so from the training point of view you also want them to know both codebases.
Metcalfe's law succinctly explains why communication becomes more fraught as teams and organizations grow. Good managers minimize this quadratically rising cost.
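One common way to make that quadratic growth concrete is to count the distinct pairs of people in a group, n(n-1)/2. Treating every pair as a communication channel is a simplifying assumption, and `communication_channels` is a hypothetical name used only for this sketch.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct person-to-person pairs in a team of the given size."""
    if team_size < 0:
        raise ValueError("team size cannot be negative")
    return team_size * (team_size - 1) // 2


for size in (3, 4, 5, 7, 8):
    print(size, communication_channels(size))
# 3 3
# 4 6
# 5 10
# 7 21
# 8 28
```

Under this model, one team of 8 has 28 pairs, while two teams of 4 have 6 pairs each, 12 in total, before counting any coordination between the two teams.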
When during a job interview they ask me what was the size of my team, I invariably say something between 2 and 5, and comment that a larger team cannot be efficient. At least, in my practice a larger team was never efficient as a team, and I helped split teams as they grew past this threshold.
> team decided to have a shared on-call rotation. They would cross-train. Each team makes a procedure book that covers any first-on-the-scene tasks for most alerts and issues.
Shared on-call is a section I strongly disagree with. Shared on-call erodes the very same boundaries that provided value in the first place. You could say it's a lesser and necessary evil, but you should at least be open about it.
A procedure book is a naive "sounds good, doesn't work" solution. Why don't you "solve" the on-call problem completely by outsourcing it to a $15/hour worker if you have these amazing playbooks?
This approach naturally leads to bloated playbooks. It's very similar to trying to solve all of the architectural issues by "writing good documentation". Never works.
On-call is typically a company-wide policy, and those are at odds with smaller independent teams.
The right answer is giving those teams more independence: they figure out how to solve their on-call themselves. Maybe playbooks are the answer for them. Or they have people who are still online after work hours and they don't mind being semi-oncall. Or maybe all their issues are not critical and 24/7 on-call is not necessary.
Smaller teams benefit from independence; don't ruin it via company-wide policies.
I think Team Topologies, which the article mentions, is a pretty foundational book. Not just that it's full of wisdom, but it's so useful to have the vocabulary to wrangle with these problems and their potential solutions. And as a bonus they even wrote a small followup about remote work. Both are highly recommended reading if you work in an organisation with any number of teams greater than 1.
Very timely post for me - as this is exactly what I'm trying to achieve at my own place :P I've forwarded it to several people to help bolster my argument, so thanks!
Glad you found it useful! I highly recommend the Team Topologies book cited in the article. It's a bit dry at first, but the solution leverages everything you learn in the earlier chapters in a jujitsu-like "use the enemy's power against itself" way. (Hope that makes sense)
Passing along considerations as in practice I’ve found there is no free lunch with splitting teams:
- Quantify and measure the overwhelmedness and productivity of the teams.
- Time spent on where you’d like to get to within the next 1-3-5 years with the teams and the services they vend out is time well spent.
- Evangelize the re-organizational plan to stakeholders and leadership and hopefully they are onboard and can provide air cover while teams lose velocity during the change.
- There will be bumps along the way but coming back to being able to quantify and re-measure overwhelmedness and productivity of the teams will help define success or not.
Isn't the ideal team size just one? I do not mean that the one-person "teams" should not meet from time to time to agree on some common issues. But the ideal situation is that the work should be split in such a way that a single person has complete responsibility over an area and does not need to coordinate with someone else most of the time.
A team of one also cannot have multiple experts gathering to solve a particular problem, where their combined set of expert knowledge matches the problem best. The one person must be a jack of all trades, and necessarily a master of only a few, due to humans' limited lifespan.
(I'd say it's a bit like a multi-threaded program: very often one thread is not enough, but only a few threads can do varied but coordinated tasks. Massive multiprocessing only works when coordination between peer threads is not required, and usually they do the same embarrassingly parallelizable thing anyway.)
Additionally, a team of one has no redundancy. You know, how Bob is responsible for maintaining the business critical databases, and now the databases are on fire and Bob is on a canoe tour through Canada without a phone. Oops.
For business critical things, you generally want 3 guys who can replace each other competently. Three, because one is none if the one guy gets sick, and two is also one if the other is on vacation and thus none.
You jest, but we've had weeks during which - out of 6 people - one was on vacation, two were sick, and then two more had to call in child-sick because their respective day cares had to close due to positive covid tests. And suddenly you're last man standing between the outage gremlins and customer systems.
While one person per team is ideal, two are better because they can discuss ideas and improve through arguments. If their views become too extreme, having a third person is better to moderate the views. Fourth person is needed so that one person wouldn't be a pariah holding minority view, and you want a fifth guy to make sure in 50/50 split, a decision gets made.
Five people need a lead that keeps track of everything, maintains the goal and resolves conflicts. Of course, two leads are always better than one because they can discuss ideas..
One-person teams are only best when it comes to coordination overhead. They have obvious deficiencies in terms of "bus factor". If the one person in the team leaves the company, gets ill, dies or takes a vacation, work will have to be transferred to someone else (in which case the coordination overhead returns) or the work has to stop (which is not always feasible).
A lot of the inefficiencies of bigger companies come from the redundant people needed to fill out the bus factor. Having a DBA on hand for emergencies might be overkill for a startup, but when you have revenue of 100 million per day depending on the database always being up it is much cheaper to have a spare person sitting around (even if they're not doing much most of the time).
One person "teams" but reviewing and approving each others' work and then switching every six months or so, works well.
From what I have seen, a single person can deliver more, and more quickly, than 2-5 people, and having to seek approval from their peers keeps quality in check too.
In that case you don't have a one-person team. You have a distributed team with effective communication and well-designed systems that a single individual can execute on with minimal coordination. Which is the ideal, but calling them one-person teams is a misnomer.
I think you want at least two to cover people missing obvious solutions as well as provide a second pair of eyes when finding issues.
And because with only two, any reason one person isn't around loses you the benefits two people give you, I'd argue three is the minimum size you should aim for.
Fantastic. One problem I've seen a few times is the desire to consolidate teams to tackle the flavor-of-the-month project. You have an infrastructure team and an applications team, but somebody up the chain has staked their promotion on getting a big infrastructure project done. How many teams was that again?
The article had a whole section about that -- they shared the on-call duties across both teams (but required a high-quality playbook for each team's systems).
Which is something that sounds fine but, IMHO, practically speaking it's not really going to work. Most of the time you're going to end up having to page for the systems your team doesn't own for any non-trivial issue.
When your team owns an application you have no background knowledge in, and it uses a technology that you haven't touched since you attended a training 18 months ago, I would assume that your on-call response for any non-trivial issue would still be to page someone for help.
In this case the solution (cross-coverage) worked for that organization. However I agree that more often than not, it won't work.
I didn't have the word-count to go into this (that could be the topic of an entire book... or at least a chapter of The Practice of Cloud System Administration [https://the-cloud-book.com/]).
That said, my personal rule of thumb is: 6 is the minimum for an oncall roster; 5 if there is another team doing follow-the-sun coverage. YMMV of course.
But the on-call footprint is also smaller. Ideally up to a point where most of the problems don't require immediate response, so most teams won't need a 24/7 on-call.
I think the issue is that first people introduce 24/7 on-call, then they split into teams and this policy no longer makes sense, but no one has the balls to roll it back because its optics are bad. But you should.