Hacker News new | past | comments | ask | show | jobs | submit login

Having all code in a single repository increases developer productivity by lowering the barrier to change. You can make a single atomic commit in one repository as opposed to N commits in M repositories. This is much, much easier than dealing with subrepos, repo sync, etc.

Unified repos scales well up to a certain point before troubles arise. e.g. fully distributed VCS starts to break down when you have hundreds of MB and people with slow internet connections. Large projects like the Linux kernel and Firefox are beyond this point. You also have implementation details such as Git's repacks and garbage collection that introduce performance issues. Facebook is a magnitude past where troubles begin. The fact they control the workstations and can throw fast disks, CPU, memory, and 1 gbps+ links at the problem has bought them time.

Facebook made the determination that preserving a unified repository (and thus preserving developer productivity) was more important than dealing with the limitation of existing tools. So, they set out to improve one VCS system: Mercurial (https://code.facebook.com/posts/218678814984400/scaling-merc...). They are effectively leveraging the extensibility of Mercurial to turn it from a fully distributed VCS to one that supports shallow clones (remotefilelog extension) and can leverage filesystem watching primitives to make I/O operations fast (hgwatchman) and more. Unlike compiled tools (like Git), Facebook doesn't have to wait for upstream to accept possibly-controversial and difficult-to-land enhancements or maintain a forked Git distribution. They can write Mercurial extensions and monkeypatch the core of Mercurial (written in Python) to prove out ideas and they can upstream patches and extensions to benefit everybody. Mercurial is happily accepting their patches and every Mercurial user is better off because of Facebook.

Furthermore, Mercurial's extensibility makes it a perfect complement to a tailored and well-oiled development workflow. You can write Mercurial extensions that provide deep integration with existing tools and systems. See http://gregoryszorc.com/blog/2013/11/08/using-mercurial-to-q.... There are many compelling reasons why you would want to choose Mercurial over other solutions. Those reasons are even more compelling in corporate environments (such as Facebook) where the network effect of Git + GitHub (IMO the foremost reason to use Git) doesn't significantly factor into your decision.




Hello there, have you heard of service oriented architecture? You must be joking to justify a single repository with "easier to change". Your problem is that the code base must be tightly coupled if splitting the services out to different repos is not possible and you need to contribute to multiple repositories to get something done. I would say, the biggest change in Amazon's architecture was moving over to the service oriented way and it was worth the effort. Developers are forced to separate different functions to separate services and they are in charge of that service. If it goes down their are getting the alerts. All of the services are using 250ms timeouts so there is no cascading effect when a services goes down. The web page consists of few thousand service calls and it degrades gracefully. Facebook obviously have some tech depth that they need to fix. Using stupid design justified with some random crap that does not even make sense is not really acceptable (at least for me).


SOA isn't a magic bullet.

What if multiple services are utilizing a shared library? For each service to be independent in the way I think you are advocating for, you would need multiple copies of that shared library (either via separate copies in separate repos or a shared copy via something like subrepos).

Multiple copies leads to copies getting out of sync. You (likely) lose the ability to perform a single atomic commit. Furthermore, you've increased the barrier to change (and to move fast) by introducing uncertainty. Are Service X and Service Y using the latest/greatest version of the library? Why did my change to this library break Service Z? Oh, it's because Service Z lags 3 versions behind on this library and can't talk with my new version.

Unified repositories help eliminate the sync problem and make a whole class of problems that are detrimental to productivity and moving fast go away.

Facebook isn't alone in making this decision. I believe Google maintains a large Perforce repository for the same reasons.


> "What if multiple services are utilizing a shared library? For each service to be independent in the way I think you are advocating for, you would need multiple copies of that shared library (either via separate copies in separate repos or a shared copy via something like subrepos)."

No, you have a notion of packages in your build system and deployment system.

You want to use FooWizz framework for your new service BarQuxer? Include FooWiz+=2.0 as a dependency of your service. The build system will then get the suitable package FooWiz when building your BarQuxer. Another team on the other side of the company also wants to use FooWiz? They do the exact same thing. There is never a need for FooWiz to be duplicated, anybody can build with that package as a dependency.


I think you are missing the point. Versioning and package management problems can largely go away when your entire code base is derived from a single repo. After all, library versioning and packaging are indirections to better solve common deployment and distribution requirements. These problems don't have to exist when you control all the endpoints. If you could build and distribute a 1 GB self-contained, statically linked binary, library versioning and packages become largely irrelevant.


I'm telling you how corporations with extremely large codebases and a SOA do things. The problem that you described has been solved as I describe.

SOA is beneficial over monolithic development for many other reasons unrelated to versioning. It just happens to enable saner versioning as one of it's benefits.


You're coupling your 3rd party dependencies too tightly with your app logic, so that's why its so brittle. Start wrapping those functions.


When you are done writing boilerplate to wrap all the third-party functions, don't forget to write equivalent documentation too. And tests. Or are you just going to use the same function names? If the latter it's a pointless exercise, because it's unlikely you can shoehorn A different vendor library into the exact same API later.


Whilst not arguing against your other points which I think are interesting, git has the submodule system which we use for shared libraries.


I never said it was a magic bullet. What I am suggesting is to have separate services with independent software packages. This is all. Btw. you can't break a service with a commit, all the services are using versioned APIs so when you are building API version n+1 you are not ditching version n. Very simple, seamless upgrade, graceful downgrade. FYI such a system is powering Amazon and it works perfectly at scale.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: