It's really typical for a GUI/server pair to want to share some is_valid_payload() function: the client validates the payload before sending it, and the server does its own validation on receipt.

If it's a monorepo, your PR might be a 2-line patch to that function, plus the GUI and server code.

If you split it, you'll first need a PR on the "validation-lib" repo; then, once that gets in, a PR on the "server" repo bumping the "validation-lib" version dependency; and finally a PR on the "gui" repo bumping the dependency on both "validation-lib" and "server" (for testing etc.). That's before you need to deal with the circular dependency: "server" also wants "gui" for its own "I changed my server code, does the GUI work?" testing.

Better just to have them in a monorepo if they're logically the same code and want to share various components.




> If it's a monorepo, your PR might be a 2-line patch to that function, plus the GUI and server code.

> If you split it, you'll first need a PR on the "validation-lib" repo; then, once that gets in, a PR on the "server" repo bumping the "validation-lib" version dependency; and finally a PR on the "gui" repo bumping the dependency on both "validation-lib" and "server" (for testing etc.). That's before you need to deal with the circular dependency: "server" also wants "gui" for its own "I changed my server code, does the GUI work?" testing.

The above is exactly why I am so firmly opposed to multirepo[0]-first. And that's really just a throwaway example: a real change would involve multiple different library and executable repos, each needing its own PR. And then there's the relatively high risk of ending up with a circular incompatibility.

This can be worth the cost, for organisational reasons. But until you need it, don't do it. It's easy to split a git repo into multiple repos later, each retaining its history (using git filter-branch). Don't incur the pain until you need to, because honestly, you're not likely to need to. You're probably not going to grow to the size of Google. Heck, most of Google runs in one monorepo, with a few other repos on the side: if they can make it work at their scale, so can you. And if, as the odds are, you never grow to their size, you'll never have wasted time engineering a multirepo system instead of delivering features to your business & customers.
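(For reference, the classic recipe for that split is a fresh clone followed by filter-branch, since filter-branch rewrites history in place; the "server" directory name here is hypothetical:)

    git clone myrepo server-repo
    cd server-repo
    # keep only the history that touches server/, making it the new root
    git filter-branch --subdirectory-filter server -- --all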

0: 'polyrepo,' really? https://trends.google.com/trends/explore?date=all&q=multirep... clearly shows that 'multirepo' is the term.


These are two separate functions. Why would you ever want a function that checks both GUI and server? The GUI validation logic belongs in the GUI layer, the server validation logic in the server layer. If you have a function that contains logic from both layers, there is something seriously wrong with your design.


The classic requirement is that you want validation done in the frontend (to save a network round trip and provide better, immediate feedback), done again on the backend (so that if the frontend is compromised and maliciously circumvents the check, the payload still gets validated), and for the two validations to be identical, to prevent inconsistencies.

A good way to fulfil those requirements is to have the exact same function available in both places.
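For instance, a minimal sketch of what that can look like, assuming a TypeScript codebase where the browser bundle and the Node server can import the same module (all names here are hypothetical):

    // shared/validation.ts -- platform-neutral, so the same module can be
    // bundled for the browser and imported by the Node server.
    export interface Payload {
      email: string;
      quantity: number;
    }
    export function isValidPayload(p: Payload): boolean {
      // Both sides must agree on these rules, so they live in one place.
      return /^[^@\s]+@[^@\s]+$/.test(p.email)
        && Number.isInteger(p.quantity)
        && p.quantity > 0;
    }

The client calls isValidPayload() before sending, for instant feedback; the server calls the exact same function on receipt, treating the client as untrusted.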


If you have the same functionality duplicated rather than re-used, for no good reason, I'd call that a design flaw.

I'll need a few more validation functions for each client. I don't want to write and maintain multiple functions that do the same thing, even if it's just copy+paste.

It's "data" validation. So let's put that in the "data layer" repo.

We now have, at least:

- Server

- Web (GUI)

- Android

- iOS

- Data

- More clients?

We'll also have branches for each development task. How do we know which branches the other branches should use? One "simple" feature can easily spread over multiple repos. Does each repo refer to the repo+branch it depends upon (don't forget to update the references when we merge!), or do we add a "build" repo which acts as the orchestrator?

Most PRs will need to be daisy-chained - who reviews each one? Will they get committed at the same time?

How do we make the builds reproducible? Commit hashes? Tags? OK, so we now need to tag each repo and update the references to point at that tag/hash... but that changes the build.
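(One known answer to the reproducibility question is a "build" repo that pins every component to an exact commit via git submodules - the repo URLs here are hypothetical:)

    # in the "build"/orchestrator repo: each submodule records an exact
    # commit hash, so checking out one build-repo commit reproduces the
    # whole tree... at the cost of a bump commit here for every change.
    git submodule add https://github.com/example/server.git
    git submodule add https://github.com/example/web.git
    git commit -m "pin server and web"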

Well, I'm glad our code base is split over multiple repos because "scalability".


Imagine something like "curl" where a client needs to validate a manually provided request before making it.

In any case, if you're nitpicking that example, you're missing the point. The same goes for any number of other pieces of shared code you could imagine between a client and server that logically make up one program talking over a network.


I still can’t see how you would have a shared library for a C# GUI and a Java server, for example. Your communication layer would obviously live in both repositories. And even if you are using the same language and do have shared libraries, what is the problem? The shared libraries would surely be shared with other projects too, so it makes sense to have them in a separate repository.


In cases where there's a high degree of churn in shared libraries (e.g. at early-stage startups), updating those libraries can cause a large amount of busywork and ceremony.

If you had a `foo()` function shared between the GUI and the server (or two services on your backend, or whatever), in a monorepo your workflow is:

   - Update foo()
   - Merge to master
   - Deploy
In a polyrepo where foo() is defined in a versioned, shared library, your workflow is now:

   - Update foo()
   - Merge to shared library master
   - Publish shared library
   - Rev the version of shared library on the client
   - Merge to master
   - Deploy client
   - Rev the version of shared library on the server
   - Merge to master
   - Deploy server
This problem compounds further once your dependencies go more than one level deep.
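(Each "rev the version" step above is typically a one-line manifest change that still needs its own PR, review, CI run and deploy - an npm-style sketch, names hypothetical:)

    - "validation-lib": "1.4.2"
    + "validation-lib": "1.4.3"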

I recently dealt with an incredibly minor bug (one quick code change) that still required 14 separate PRs to get fully out to production, in order to cover all of our dependencies. That's a lot of busywork to contend with.


It seems to me that the real problem is your toolchain. In a previous project the workflow was like this:

   - Update foo()
   - Merge to master
   - Publish shared library
   - Deploy

So as you can see, the only step added was publishing the shared library, which would then automatically update the version in all the projects using it. If you are really doing everything manually, I can understand that this is a pain, but it has nothing to do with the monorepo/multirepo distinction; it is a tooling problem.
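For example, one low-tech way to get that behaviour in an npm-style setup is a floating semver range, so consumers pick up compatible releases on their next lockfile refresh instead of via per-repo bump PRs (names hypothetical):

    // consumer package.json fragment: "^1.4.0" accepts any 1.x >= 1.4.0,
    // so publishing validation-lib 1.4.3 reaches every consumer on its
    // next `npm update`, with no manual version-bump PR per repo.
    "dependencies": {
      "validation-lib": "^1.4.0"
    }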


But you've just invented a sharded monorepo, and now have all the monorepo problems without the solutions.

What if updating foo() breaks something in one of the clients (say, due to reliance on something unspecified)? Then you didn't catch that issue by running the client's tests; now the client is broken, and they don't necessarily know why. They know the most recent version of the shared library broke them, but then they have to say "you broke me", or one of the teams needs to investigate and possibly bisect across all the changes in the version bump, under their own tests, to find the breakage.

How is that handled?
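(For the bisect step at least, git can automate the search once the failing test is scriptable - the version tags and script name here are hypothetical:)

    # in the shared library's repo, between the two released versions
    git bisect start v1.4.3 v1.4.2       # bad commit first, then good
    git bisect run ./run-client-tests.sh # exits non-zero on the breakage
    git bisect reset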

(The broader point here is that monorepo vs. multirepo is an interface, not an implementation; it's all a tooling problem. There are features you want your repo to support. Do you invest that tooling in scaling a single repo, or in coordinating multiple ones? Maybe I should write that blog post.)


Some package managers that support git repos as dependency versions can offset this in development.
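For example, npm accepts a git URL, optionally pinned to a branch or commit, as a dependency version (Cargo has a similar mechanism); the repo URL here is hypothetical:

    // package.json fragment: depend on the shared repo directly while
    // developing, skipping the publish/version-bump cycle entirely.
    "dependencies": {
      "validation-lib": "git+https://github.com/example/validation-lib.git#feature-branch"
    }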



