Hacker News

I understand the reasoning, and agree that it’s not always abuse. At first blush it’s a good idea, but I’d maintain that it’s one of the things that balloons your repo size quite quickly. Plus, one has to draw a line somewhere on what to include (a Python interpreter? A Go version? awk and grep?), and third party vs in-house is a fairly robust one imo.

We host a private mirror for third party dependencies, so that “pip install”/“go get” fail on our CI system if the dependency isn’t hosted by us. This gives us reproducible builds, while allowing us to hold 3rd party libraries to a higher standard of entry than source code. For certain libraries we pin version numbers in our build system, but in general it allows us to update dependencies transparently. It also keeps our source repo small for developers, and allows for conflicting versions (e.g. Kafka X.Y and X.Z) without cluttering the repo with duplicates.
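For pip and Go, pointing the toolchain exclusively at an internal mirror is mostly configuration. A minimal sketch; the mirror hostnames here are placeholders, not real services:

```shell
# Hypothetical internal mirror hosts -- substitute your own.
# pip: resolve packages only against the private index, so anything
# not mirrored there simply fails to install.
export PIP_INDEX_URL=https://mirror.internal.example/pypi/simple

# Go: route all module downloads through the private proxy, with no
# ",direct" fallback, so unmirrored modules fail on CI as described.
export GOPROXY=https://mirror.internal.example/goproxy
export GOSUMDB=off   # trust the private proxy for module contents instead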

It’s definitely a smaller gotcha than the others I listed, maybe to the point where it’s not a gotcha, but I stand by it :)




If you can do that with 3rd party dependencies, can't you do that with all the code?

This is what confuses me about monorepos. Their design requires an array of confusing processes and complex software to make the process of merging, testing, and releasing code manageable at scale (and "scale" can even be 6 developers working on 2 separate features each across 10 services, in one repo).

But it turns out that you can also develop individual components, version their releases, link their dependencies, and still have a usable system. That's literally how all Linux distros have worked for decades, and how most other language-specific packaging systems work. None of which requires a monorepo.

So what I'd like to know is: for the 3 actual reasons I've heard companies give for needing a monorepo, is it impossible to achieve those things with multirepo? If it is indeed "hard" to do, is it "so hard" that it justifies all the complexity inherent to the monorepo? Or is it really just a meme? And are these things even necessary at all, if other systems seem to get away without them?


These are great questions!! :)

> Can you treat all code like 3rd party dependencies?

Yes, but there are trade-offs. Discoverability, enforcing hard deadlines on global changes, style consistency, etc.

> Is it impossible to do these things with multi-repo?

No, but there are trade-offs to consider.

> If it's hard, is it "so hard" that it justifies the complexity?

Hitting the nail on the head; there are trade-offs :)

> Are these things necessary, if other systems get away without it?

There are many stable equilibria; the open source ecosystem evolved one solution and large companies evolved another, because they have been subject to very different constraints. The organization of open source projects is extremely different from that of 100+ engineer companies, even if the contributor headcounts are similar.

For me, the semantic distinction between monorepos and multirepos is the same as the distinction between internal and 3rd party dependencies. Does your team want to treat other teams as a 3rd party dependency? The correct answer depends on company culture, etc. It's a set of tradeoffs, including transparency over privacy, consistency over freedom, collaboration over compartmentalization.

With monorepos, you can gain a little privacy, freedom, and compartmentalization by being clever, but get the rest for cheap; vice versa for multirepos. It's trading one set of problems for another. I'd challenge the base assumption that multirepos are "simpler", they're just more tolerant of chaos, in a way that's very valuable for the open source community.

I hope we've not been talking past each other, I really like the ideas you're raising! :)


I don't think we're talking past each other, and thank you for your responses.

> Does your team want to treat other teams as a 3rd party dependency?

From what I recall, 'true' microservices are supposed to operate totally independently of each other, so one team's microservice really is a 3rd party dependency of another team's (if one depends on the other). OTOH, monolithic services would require much tighter integration between teams. But there's also architecture like SOA that sits somewhere in the middle.

To my mind, if the repo structure mimics the communication and workflow of the people writing the code, it feels like the tradeoffs might fit better. But I'd need to make a matrix of all the things (repos, architectures, SDLCs, tradeoffs, etc) and see some white papers to actually know. If someone feels like writing that book, I'd read it!


> This is what confuses me about monorepos. Their design requires an array of confusing processes and complex software to make the process of merging, testing, and releasing code manageable at scale (and "scale" can even be 6 developers working on 2 separate features each across 10 services, in one repo).

False. It is having multiple repos that creates those problems and a huge graph of versions and dependencies.

What "processes" are you talking about?


> It is having multiple repos what creates those problems and a huge graph of versions and dependencies.

Bazel, the open source version of Google's internal build tool, is built specifically to handle "build dependencies in complex build graphs". With monorepos. If it didn't do that, you'd never know what to test, what to deploy, or what service depends on what other thing. Versions and dependencies are inherent to any collection of independently changing "things".
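As a toy illustration of that "what to test" computation (not Bazel itself, which answers this with graph queries like `rdeps` over its build graph), assume dependency edges are listed as "target dependency" pairs; all the names here are made up:

```shell
# Toy dependency edges: "<target> <direct dependency>".
cat > deps.txt <<'EOF'
svc_a    lib_core
svc_b    lib_core
svc_b    lib_extra
gateway  svc_a
EOF

# Direct reverse dependencies of lib_core: the first ring of targets
# that must be rebuilt and retested when lib_core changes.
awk '$2 == "lib_core" { print $1 }' deps.txt | sort -u   # -> svc_a, svc_b
```

A real build tool computes the transitive closure, so `gateway` would be picked up through `svc_a` as well.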

Even if you rebuild every service and run every test each time you commit a single line of code to any service, the end result of all those newly-built services is still a new version. A change still belongs to a particular service, so thinking about "this change to this service" involves "other changes to other services", and you need to be able to refer to one change when you talk about a different change. But they are different changes, with different implications. You may need to go back to a previous "version" of a line of code in one service, so it doesn't negatively impact another "version" of a different line of code in a different service. Every change, relative to every other change, is a unique version, and you have to track them somehow. You can use commit hashes or you can use semantic versions; it doesn't matter which.

So because versions and dependencies are inherent to any collection of code, regardless of whether it's monorepo or multirepo, I don't buy this "it's easier to handle versions/dependencies" claim. In practice it doesn't seem to matter at all.

> What "processes" are you talking about?

Developer A and developer B are working on changes A1 and B1. Both are in review. Change A1 is merged. Now B1 needs to merge A1: it becomes B1.1. Fixing conflicts, running tests, and fixing anything changed finally results in B1.2, which goes into review. Now A develops and merges A2, so B1.2 goes through it all over again to become B1.4.
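That manual dance can be sketched against a throwaway repo (branch and author names are made up for the example):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
main=$(git symbolic-ref --short HEAD)   # default branch, whatever it's named
G="git -c user.name=demo -c user.email=demo@example.com"

echo base > shared.txt; git add shared.txt; $G commit -qm "base"

git checkout -q -b feature-b            # developer B starts change B1
echo b1 > b.txt; git add b.txt; $G commit -qm "B1"

git checkout -q "$main"                 # meanwhile, A1 is reviewed and merged first
echo a1 > a.txt; git add a.txt; $G commit -qm "A1"

git checkout -q feature-b
$G rebase -q "$main"                    # B1 replayed on top of A1: this is "B1.1"
git log --format=%s                     # newest first: B1, A1, base
```

Every time the mainline moves, the open branch repeats the rebase/fix/retest cycle before it can merge.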

You can do all of that manually, but it's time-consuming, and the more people and services involved, the more time it takes to manage it all. So you add automated processes to try to speed up as much of it as you can: automatically merging the mainline into any open PRs and running tests, and doing this potentially with a dozen different merged items at once. Hence tools like Bazel, Zuul, etc. So, those processes.


You are conflating language/build issues with VCS issues.

Everything you discuss also applies to multirepo, but worse, because there no one enforces consistency across the whole project, and you will end up with broken interdependencies.


> Plus, one have to draw a line somewhere on what to include (a Python interpreter? A Go version? awk and grep?), and third party vs in-house is a fairly robust one imo.

If your code/project/company uses the dependency in any way in production and it is not part of the base system (which should be reproducibly installed), you include it, either in source or binary form.

Why is the size a problem? Developers should only need to check out once. If your repo hits the many-GiB mark and that becomes a burden, you can apply more complex solutions (LFS, sparse checkout, etc.).
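One of those "more complex solutions" is a cone-mode sparse checkout (available in git 2.25+), which materializes only the subtrees a developer actually works on. A sketch on a throwaway two-team repo:

```shell
set -e
src=$(mktemp -d); cd "$src"
git init -q
mkdir team_a team_b
echo a > team_a/code; echo b > team_b/code
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -qm "init"
branch=$(git symbolic-ref --short HEAD)

# Clone without materializing files, then check out only team_a's tree.
clone=$(mktemp -d)/clone
git clone -q --no-checkout "$src" "$clone"
cd "$clone"
git sparse-checkout init --cone
git sparse-checkout set team_a
git checkout -q "$branch"
ls   # team_a only; team_b never touches the working tree
```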


It's a problem if the first step of your build system is a fresh `git pull` :)

Not unsolvable of course, just necessitates an extra layer of complexity.
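One common extra layer is a shallow clone, which keeps CI checkout cost roughly proportional to tree size rather than history size. A sketch against a local throwaway repo (the `file://` prefix is needed for `--depth` to take effect on a local path):

```shell
set -e
src=$(mktemp -d); cd "$src"
git init -q
G="git -c user.name=demo -c user.email=demo@example.com"
for i in 1 2 3; do echo "$i" > f.txt; git add f.txt; $G commit -qm "commit $i"; done

dst=$(mktemp -d)/shallow
git clone -q --depth 1 "file://$src" "$dst"   # fetch only the newest commit
git -C "$dst" rev-list --count HEAD           # -> 1, despite 3 commits upstream
```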




