
These arguments are weak, IMO.

Yes, monorepos can be slow to browse through if the VCS isn’t configured to handle the size (partial clones and sparse checkouts aren’t the default with Git; that alone can make a massive difference when your repo is massive). Polyrepos can be just as slow, however, and what’s worse is that there are more of them.
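For reference, opting into that with a reasonably recent Git (2.25 or newer) looks roughly like this; the URL and directory names are made up:

    # Clone without file contents, checking out only top-level files
    git clone --filter=blob:none --sparse https://git.example.com/big-monorepo.git
    cd big-monorepo

    # Materialize only the directories you actually work in
    git sparse-checkout set services/payments libs/common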

I remember working with a repo that was more than 20 GB, mostly from videos (we didn’t know that initially). Pulling that repo took _forever_. Nobody on that team cared because they almost never did a fresh pull, and the time their CI/CD spent doing one was simply accounted for in their reports. If it were a monorepo, MANY teams would’ve felt that pain much more immediately.

Yes, monorepos require some tooling to prevent a gazillion artifacts from being deployed at once (and to specify what’s related to what when code lives across different folders). So do polyrepos! I’ve configured a few Jenkins jobs for my clients to dynamically pull different co-dependent Git repositories at build time. It’s a pain! Especially when multiple credentials are involved! Then there’s the whole “we have a gazillion repos and 20% of them are junk” problem, which requires automated reaping; that, too, is a harder problem than it seems.
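To give a flavor of that multi-repo checkout dance, the fetch step alone ends up as something like the sketch below (repo names, tokens, and version pins are all hypothetical):

    # Every co-dependent repo needs its own clone, credential, and pinned ref
    git clone "https://ci-bot:${TOKEN_TEAM_A}@git.example.com/team-a/libfoo.git"
    git clone "https://ci-bot:${TOKEN_TEAM_B}@git.example.com/team-b/libbar.git"
    git -C libfoo checkout v1.4.2
    git -C libbar checkout v2.0.1
    # ...and keeping those two pins mutually compatible is your problem

In a monorepo, all of that collapses into a single checkout at a single commit.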

Same with refactors. Refactors across polyrepos are just as much of a pain because you’re now subject to n build and review processes (and n pull requests), and seeing the entire diff in one place is hard or impossible. That’s how mistakes creep in. If anything, refactors are more of an event in polyrepos than they are in monorepos.

While monorepos have their problems, I will continue to advocate for them, because being able to see what’s going on in one place, and (theoretically) having any developer able to propose changes to any part of the code, is massively beneficial, ESPECIALLY for complex business domains like healthcare or financial services. Plus, you will have a RelEng/BuildEng team once your codebase and engineering org get large enough; why add more complexity by creating a gazillion repos that may or may not be related to each other?

(The large engineering organization without a team focused on tools and builds doesn’t exist. If an org claims it doesn’t have one, that means some or many of its developers are spending way more time spinning their wheels on build systems than they should be.)

The real reason monorepos don’t happen in the aforementioned domains is that there’s no easy way to adopt them and still pass regulatory audits.

Many regulating bodies require hard boundaries enforced by role-based access control, especially around code that handles personally identifiable information or code that sits on either side of a Chinese Wall between two or more domains. “All of my developers can check out the entire codebase” is an easy way to get fined hard, and polyrepos are much easier to restrict access to than folders within a monorepo are (one advantage the article doesn’t mention). While you _can_ restrict access to directories within a single repo, doing so is not straightforward, and most organizations would rather not spend the engineering effort.

I would like to think that Google and Facebook have gotten away with it because they adopted a monorepo from the very beginning, and the engineering required to split one up dwarfs the engineering required to work around it.

That said, I continue to advocate for monorepos because discoverability is good and builds a better engineering culture in the end. I would rather hit those walls and make just-in-time exceptions for them than assume the walls are there and create a worse development experience without exploring better alternatives.



