In my experience, monorepos cause outrageous problems that have nothing to do wi...

zeveb · on Jan 3, 2019

Oddly enough, you could s/mono/multi in your post and that would exactly align with my own experience. I'm not kidding: everything from engendering reliance on weird homegrown tooling, CI & build pipelines to the pain of trying to break out to a different approach, to enforced bad practices, to developers (unknowingly) misleading management, to colossal failures.

I've worked on teams with monorepos and teams with multiple repos, and so far my experience has been that monorepo development has been better — so much so that I feel (but do not believe) that advocating multiple repositories is professional malpractice.

Why don't I believe that? Because I know that the world is a big place, and that I've only worked at a few places out of the many that exist, and my experience only reflects my experience. So I don't really believe that multiple repositories are malpractice: my emotions no doubt mislead me here.

I suspect that what you & I have seen is not actually dependent on number of repositories, but rather due to some other factor, perhaps team leadership.

mlthoughts2018 · on Jan 3, 2019

Everyone always says this type of response about everything though. If you like X, you’ll say, “In my experience you can /s/X/Y and all the criticisms of X are even more damning criticisms of Y!”

All I can say is I’ve had radically the opposite experience across many jobs. All the places that used monorepos had horrible cultures, constant CI / CD fire drills and inability to innovate, to such severe degrees that it caused serious business failures.

Companies with polyrepos did not have magical solutions to every problem, they just did not have to deal with whole classes of problems tied to monorepos, particularly on the side of stalled innovation and central IT dictatorships. Meanwhile, polyrepos did not introduce any serious different classes of problems that a monorepo would have solved more easily.

yowlingcat · on Jan 4, 2019

Absolutely amazing to me how much engineers conflate organizational issues with tooling issues. Let's take a look at one of your comments:

"The last point is not trivial. Lots of people glibly assume you can create monorepo solutions where arbitrary new projects inside the monorepo can be free to use whatever resource provisioning strategy or language or tooling or whatever, but in reality this not true, both because there is implicit bias to rely on the existing tooling (even if it’s not right for the job) and monorepos beget monopolicies where experimentation that violates some monorepo decision can be wholly prevented due to political blockers in the name of the monorepo.

One example that has frustrated me personally is when working on machine learning projects that require complex runtime environments with custom compiled dependencies, GPU settings, etc.

The clear choice for us was to use Docker containers to deliver the built artifacts to the necessary runtime machines, but the whole project was killed when someone from our central IT monorepo tooling team said no. His reasoning was that all the existing model training jobs in our monorepo worked as luigi tasks executed in hadoop.

We tried explaining that our model training was not amenable to a map reduce style calculation, and our plan was for a luigi task to invoke the entrypoint command of the container to initiate a single, non-distributed training process (I have specific expertise in this type of model training, so I know from experience this is an effective solution and that map reduce would not be appropriate).

But it didn’t matter. The monorepo was set up to assume model training compute jobs had to work one way and only one way, and so it set us back months from training a simple model directly relevant to urgent customer product requests."

What do you think is the cause of your woes, the monorepo, or the disagreement between your colleague in central IT tooling who disagreed with you? Where was your manager in this situation? Where was the conversation about whether GPU accelerated ML jobs were worth the additional business value to change the deployment pipeline? Was that a discussion that could not healthily occur? Perhaps because your organization was siloed and so teams compete with each other rather than cooperate? Perhaps because it's undermanaged anarchy masquerading as a meritocracy? Stop me if this sounds too familiar.

I've been there before. I know what it feels like. But, I also know what the root cause is.

mlthoughts2018 · on Jan 4, 2019

Nobody is conflating anything. Culture / sociological issues that happen to frequently co-occur with technology X are valid criticisms of technology X and reasons to avoid it.

To argue otherwise, and draw attention away from the real source of the policy problems (that the monorepo enables the problems) is a bigger problem. It’s definitely some variant of a No True Scotsman fallacy: “no _real_ monorepo implementation would have problems like A, B, C...”.

The practical matter is that where monorepos exist, monopolicies and draconian limitations soon follow. It’s not due to some first principles philosophical property of monorepos vs polyrepos — who cares! — but it’s still just the pragmatic result.

Also you mention,

> “Where was the conversation about whether GPU accelerated ML jobs were worth the additional business value to change the deployment pipeline.”

but this was explicitly part of the product roadmap, where my team submitted budgets for the GPU machines, we used known latency and throughput specs both from internal traffic data and other reference implementations of similar live ML models. Budgeting and planning to know that it was cost effective to run on GPU nodes was done way in advance.

The people responsible for killing the project actually did not raise any concern about the cost at all (and in fact they did not have enough expertise in the area of deploying neural network models to be able to say anything about the relative merit of our design or deployment plan).

Instead the decision was purely a policy decision: the code in the monorepo that was used for serving compute tasks just as a matter of policy was not allowed to change to accommodate new ways of doing things. The manager of that team compared it with having language limitations in a monorepo. In his mind, “wanting to deploy using custom Docker containers” was like saying “I don’t want to use a supported language for my next project.”

This type of innovation-killing monopolicy is very unique to monorepos.