Managing Technical Quality in a Codebase

mlthoughts2018 · on Oct 19, 2020

I am a senior engineering manager for machine learning in a large ecommerce company. As an engineering manager, I can tell you that none of my stakeholders care about or even consider code quality or tech debt. Period. At all.

I have to burn bridges, cash in hard-won favors, and be generally super pedantic and unpopular any time I have to defend technical quality as a worthwhile goal. I have to give presentations, scour months' worth of incident postmortem data, and exhaustively lobby for every single millimeter of leeway to protect technical quality. It is universally viewed as the number one thing "getting in our way" and preventing us from having greater delivery speed or agility to experiment with new features.

This article seems to describe a fantasy world where technical and product leaders care about code quality and are supportive of the efforts required to maintain it.

I have never encountered any place of business where this is remotely true.

BurningFrog · on Oct 19, 2020

In a well factored organization, you shouldn't even have to mention you're spending time on technical quality.

That should be an internal "encapsulated" concern of the engineering effort.

Conversations with stakeholders should only deal with delivering features/user value.

mlthoughts2018 · on Oct 19, 2020

What fantasy world does that organization exist in? For example, working at Google, Microsoft, Apple, Facebook and Amazon is not in any way similar to that (and those are employers with some of the highest respect for technical quality).

BurningFrog · on Oct 19, 2020

I've worked at several such places.

It helps to build trust with the rest of the company, so they don't demand to micromanage everything you do.

ambicapter · on Oct 19, 2020

> preventing us from having greater delivery speed or agility to experiment with new features.

Seems like these people don't actually know what it is like to code (shocking, I know). The biggest factor preventing me from moving fast on the codebase I deal with at work is its design: non-existent, poorly thought out, and non-ergonomic.

ylong · on Oct 19, 2020

If machine learning implies Python here, I am not surprised.

In the Python world, generally one has to fight an uphill battle for correctness and quality. Those who do are often ousted.

Most Python programmers have horrible attitudes towards software engineering.

vii · on Oct 19, 2020

This write-up covers a lot of ground, from specific issues like the choice of language and transport protocol to the general advice that organisational change without a strong senior sponsor is hard. If you feel that quality needs to be increased, the first question to ask is whether that is a widely held view. People have different expectations of trade-offs and may even benefit from noisy failures, which can bring additional headcount and greater focus. It might really not make sense to improve something that will be retired soon.

In this keynote video (https://www.youtube.com/watch?v=DpO1Tfa4IZ4), Amin Vahdat explains how he led a transnational approach to reliability in a huge, complex system. Asked whether hiring should be changed to increase reliability, he says no; reliability just needs to be measured and emphasised as a priority.

lxe · on Oct 18, 2020

Great write-up. Rushing to institute a new org-wide process as the solution to every problem eventually makes productivity grind to a halt. At that point it can become a vicious cycle of continuous process changes in an attempt to increase quality and productivity.

user5994461 · on Oct 18, 2020

There is a point where preventing people from writing code improves (future) productivity and maintainability.

I recall working at a large bank that enforced linting and testing on the whole codebase (a monorepo with extensive CI). It's annoying at times, but if it weren't mandatory, nobody would ever write any tests.

Stopping developers/interns/newcomers from adding thousands of lines of random untested code the very first minute they come in (code that would have to be maintained for the next decade by the next people) is a productivity gain for the company.

Besides, a good number of (pet) projects have little or negative value; it's better if they don't exist in the first place.

runawaybottle · on Oct 19, 2020

One understated thing related to code quality is the social dynamic. People really do not like being told what to do. This is a reality anywhere in life, and requires a deft hand. Knowing when to push, what to push for, which hills to die on, etc.

Code style and linting should be an easy win. I think locating a few examples of a bad pattern that multiple people have repeated in the codebase is a good way to make certain quality improvements sustainable. But you can't do that every time, otherwise people will think that's all you do. You also can't keep hounding one person in code reviews; you'll lose their morale as well. It's a balance, and almost more of a social issue.

Sometimes it's great to point out things being done right, too. Ideally it's a team effort, so there isn't one word of god out there.

kgilpin · on Oct 18, 2020

Do people find that leverage points, as described here, are well known for large codebases, or are they difficult to identify, track, and manage?

“... three most impactful points are interfaces, stateful systems, and data models.”