Launch HN: Diversion (YC S22) – Cloud-Native Git Alternative
280 points by sasham 11 months ago | 423 comments
Hi Everyone! We’re Sasha and Egal, co-founders of Diversion (https://diversion.dev). We’re building a modern, cloud-native version control system. Our first users are game developers, who like its simplicity and scalability. See a quick demo here: https://youtu.be/DD0XkL8kDYc

Why a new VCS? There is no doubt that Git vastly improved our lives, and played a significant role in the advancement of software development over the past 18 years. But - it was built for a very different world in 2005 (slow networks, much smaller projects, no cloud), and is not the perfect tool for everyone today.

The biggest drawback of Git is its limited scalability - both in repository and file sizes, and the number of concurrent users. This is the reason Google and Meta built their own version control systems. It’s also the reason why other large companies, most notably in games development, semiconductors and financial services are still using legacy tools like SVN and Perforce.

Another issue we’re trying to fix is Git’s famous complexity. In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command (EDIT: we were eventually able to restore from a non-updated repo clone, after a few hours). As a developer who used CVS and SVN before Git was created, I often wondered why Git is so difficult to learn, compared to other tools.

On the other hand, Git’s branching and merging abilities are exceptional - this has enabled the modern software development methodologies that we all take for granted today (e.g. feature branches, CI/CD), greatly improving developers’ velocity.

We were wondering - is it possible to create an easy-to-use, fast, scalable version control system, with Git’s branching capabilities? And what else can be improved, while we’re at it?

One thing available in modern cloud tools is real-time collaboration (e.g. Google Docs, Figma). While developers don’t necessarily want their work in progress to be visible to everyone, it may be very useful to easily share it when you want to get feedback before a commit, to detect and prevent merge conflicts, and to have visibility into which parts of the codebase are being changed by others.

Diversion is built on top of distributed storage and databases, accessible via REST API, and runs on serverless cloud infrastructure. Every repository operation is an API call (commit, branch, merge etc.). The desktop client synchronizes all work in progress to the cloud in real time (even before a commit). Users can work with Diversion using an interactive CLI, Web UI, or IDE plugins (currently JetBrains, more coming soon). The Web UI lets you perform most basic operations without needing to install a desktop client.

Diversion is compatible with Git, and can synchronize with existing Git repositories (each new commit in Diversion goes into Git, and vice versa). We’re planning to release it as open source once the code base matures, and when we implement an open source repositories directory on our website (naturally, Diversion’s code is managed on Diversion!)

We’re in open beta, you can try it here (https://diversion.dev) (click Get Started). It’s completely self-service and there’s no need to talk to anyone, and it’s free for small teams (https://diversion.dev/pricing).

Building a version control system is hard (as we have learned), and Diversion still has a long way to go. We are currently working on improving speed, CI integrations, plugins for IDEs and game engines, and other usability improvements. We would love to hear your thoughts and feedback on what we’ve got so far!




> Cloud-Native Git Alternative

Not sure if this is a good summary of the product. For one, cloud-native is an implementation detail, unless the company plans to sell the new VCS as packaged software instead of as a service. For two, I'm not sure how being cloud-native addresses any issue with my daily interaction with Git.

> The biggest drawback of Git is its limited scalability

I wonder how many people really have this problem. Millions of people have been using GitHub and GitLab. I'm curious about the percentage of users who feel that there is a scalability issue with their own repositories. Personally, I don't have any beef with git's scalability at all, even though the companies I worked for had anywhere between hundreds and tens of thousands of engineers. Maybe having a monorepo will lead to scalability problems? But monorepo is a debatable topic, to say the least.

> Diversion is built on top of distributed storage and databases, accessible via REST API, and runs on serverless cloud infrastructure. Every repository operation is an API call (commit, branch, merge etc.). The desktop client synchronizes all work in progress to the cloud in real time

Again, what does this have to do with me, a user? Why would I care about the underlying protocols when I simply use a CLI or a UI?


Monorepos are indeed causing problems with git, and this is one of the main arguments against them (see [1]). Some companies are building their own solutions (Google, Meta), and some are splitting their monorepos because of these problems. IMO if a company wants to run a monorepo for their reasons, they shouldn't be limited by their VCS.

The technical details are for the readers who want to know; I agree they're not really important for the users (most of them, at least).

[1] https://medium.com/@mattklein123/monorepos-please-dont-e9a27...


Having an effective monorepo at scale will also require an entire infrastructure to solve all the problems that a poly-repo must solve and more. In particular,

- Partial download, as a monorepo will quickly grow too large for a single person to download. This is trivial for a poly-repo but requires a dedicated system for a monorepo.

- Dependency management. With a decently sized monorepo, one can't compile everything and test everything. So, someone needs to build a dependency manager to track all the DAGs, and build only the DAGs that are impacted by a commit. One also has to build a tracking mechanism for deploying different build artifacts, because a team may deploy its build artifacts on different dates and times. We will need more sophisticated build tools too.

- Build infrastructure. Even with a perfect dependency-tracking system, we may still end up building large-enough source code that we need to build the code in parallel.

- Directory-level access control. This is also trivial for poly-repo since the granularity is at repo-level, but it requires dedicated implementation for a mono-repo.

I'm not sure if the marginal benefit of having a monorepo can justify the investment for most companies. Google created its monorepo initially to manage the dependencies of C++ code, and Perforce already supported partial downloads. But with more modern languages that have their own way of managing dependencies? I'm not so sure about the benefits. Making refactoring easier? How many repos are really shared at the source level across multiple teams in a company? Encouraging sharing source and therefore knowledge? Isn't that a solved problem? Any decent company allows searching source code at a semantic level across multiple repos. If I want to see the source code of a particular package in my IDE, it's just a click away.

Note I'm emphasizing the marginal return of a monorepo. Case in point: Google maintains the very useful Guava library, which is probably used by millions of engineers. Does it lead to pains of incompatibility errors at runtime across different releases? Absolutely. Is it worth changing my poly-repo to a monorepo to solve the problem? I highly doubt it. The compatibility issue happens rarely given a good testing setup. When I do need to migrate my code, the cost is bi-modal: either the refactoring is trivial, or it requires serious testing and design changes, which a monorepo will not help with anyway.

Note I'm not saying that monorepo is not useful. Instead, I question how many companies will benefit from switching to monorepo, which may lead to the discussion on the potential market share of Diversion.


Polyrepo is such a pain in the ass though. At a smaller scale it's much, much nicer, and then when we get big enough to hit all those scaling problems, we'll be able to afford it by hiring a team of 3 to go implement Bazel/Buck2/..., and perhaps switching from Git to Diversion.


It’s not fully open source yet (i.e. you can’t use it yourself, I think) but Facebook’s EdenFS project solves the partial checkout problem.

The queries in Bazel/Buck to figure out the changed set of dependencies probably aren't complicated, and that's why there's no turnkey solution? You do need to adopt a build system with precise dependency tracking (afaik only Buck and Bazel support that) or the monorepo path isn't going to be very successful.
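For what it's worth, a rough sketch of that kind of query in Bazel (the target labels here are made up for illustration):

    # every target in the workspace that depends on the changed library
    bazel query 'rdeps(//..., //libs/netutil:netutil)' --output=label

    # run only the affected tests instead of testing the whole monorepo
    bazel query 'kind(test, rdeps(//..., //libs/netutil:netutil))' | xargs bazel test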


> Monorepos are indeed causing problems with git, and this is one of the main arguments against them (see [1]).

The article is a disappointing read. It spends a lot of time talking about monorepos and how they spell all sorts of trouble. Yet, the article makes zero mentions of submodules as a way to get the best of both worlds.


Submodules are great, but they're hardly an alternative to monorepos.


ive never seen positive feedback for submodules before


Why not? Just want to understand.


Submodule workflows have a lot of overhead at review time. During development it's fine, you work with the fully materialized tree just like it's a monorepo. But once you need to submit your changes for review, how does that workflow look?

1. Commit in submodule A, then get it reviewed and merged as SHA 123

2. Update submodule A to 123, get it reviewed

3. Reviewer has feedback on usage of new API in submodule A

4. Make another PR on A, at commit 457. This time don't merge it since reviewer on main repo might have more feedback.

Monorepo:

1. Make PR to monorepo

2. Get review feedback

3. Push changes to PR branch

4. Merge

5. Update submodule to 456, push to existing PR

...??


> But once you need to submit your changes for review, how does that workflow look?

1. Post PR to submodule A. Get it merged.

2. Post PR to the main repo updating it to point to subproject A.

Done.

The only difference between a monorepo and splitting the repo into submodules is that the main repo's history is coarser and basically tracks the output of integration tests. There is no need to overcomplicate things, and if you need to overthink them anyway then you have far more degrees of freedom to worry about in monorepos.


That’s a really slow review process. It also prevents reviewers from seeing the bigger picture of how step 1 manifests in step 2. In practice what I’ve seen you end up with both reviews simultaneously referencing each other in the description and once approved you merge 1 and update the pointer in 2 to point to the new merged commit if it changed.

That’s a lot of annoying and sometimes error prone manual bookkeeping that has nothing to do with the engineering work itself


Anything that cuts across submodule boundaries needs as many MRs as boundaries it crosses; conflicting submodule pointer updates in the main repo require additional MRs (in the submodules) to resolve, and coordination between those MRs.

They're basically fine for slowly-moving dependencies, vendoring, etc. but they emphatically do not solve the large-org many-team coordination problems that monorepos are meant to solve.

FWIW, git is a great monorepo platform for 1-10m lines of code (Linux, $MY_JOB, ...). It's only the very largest scales (Windows, Google3, ...) or asset heavy cases (ML, game dev) that need special treatment.


Monorepos are a problem born of CI which can't cope with cross project dependencies properly. People have solved the problem by pumping everything together, but it's the wrong answer.

Fix CI and the problem goes away.


Sorry no. CI and monorepos are at best tangentially related. Dependency tracking across repos is a PITA which inhibits code reuse - git submodules suck, as does whatever that alternative git submodule concept is called (subtree?).

Code repos like Cargo and NPM can help but even still it’s an annoying dance to update dependencies in multiple downstream projects. And if there’s a code change you need to make, it’s a 3-way orchestration of new api, update downstream dependencies, remove old api.


That's exactly my point. Cross dependencies is exactly the problem I want CI to solve.


Like the CI system automatically pushing code commits updating downstream dependencies that reference the upstream repo? Or something else?


Re: scalability, in the very first sentence they mention game development, which deals with large quantities of big (and growing) assets that are nowadays versioned, like 3D models, textures, animations, etc.


As someone who worked on the backend (workflow, infra) side of a game dev studio, there are a lot of massive benefits I see with this sort of "what if Dropbox but Git" product workflow.

We couldn't actually use git for our asset management, because when you're dealing with 1GB+ Photoshop files, versioning them with any reasonable granularity breaks your Git repository, makes clone times and local file storage requirements astronomical, and doesn't really make sense anyway. We ended up using SVN, since it only transfers what you check out and you can check out subtrees trivially, but then that required getting a GUI SVN client, providing it to our art team, teaching them how to use it, and then having them come to me whenever something in SVN got confused or broken (e.g. they opened and then closed a document and Photoshop updated the thumbnail, now there's a merge conflict and they can't commit).

We also ended up using Google Drive for a lot of stuff, and eventually migrating to Team Drives once that was a thing, but that doesn't integrate with... basically anything, honestly, or at least not with any reasonable degree of straightforwardness.

I don't work for that company anymore, but the thing that would make me most interested in this product would be:

1. Self-hosting it (would pay 'enterprise' rates for this); or

2. Being able to locally proxy/cache assets for users in the office, so that committing a 1 GB PSD didn't require 20 artists to all pull down 1 GB each from the server

A lot of people seem to be comparing this to actual Git, but this doesn't replace Git unless you're using Git wrong; what it replaces is the absolute disaster of a workflow that a lot of companies have to try to build/use/teach internally.


Ah, I guess this is the curse of ignorance: I saw the sentence but didn't register its significance as I'm not familiar with what's required in game development.


[flagged]


Yeah, possibly. I only have visibility into the teams I worked with. That's partly why product-market fit is hard to find, as it relies heavily on intuition. I'd be happy if I'm wrong.


>> most notably in games development, semiconductors and financial services are still using legacy tools like SVN and Perforce

I think this should be your elevator pitch. Don't focus too much on "git complexity" as most people already know git so it just creates an argument. Scalability, in terms of numbers of users is somewhat hard to argue as well (Linux kernel has 1000s of contributors). However, it is completely true that git does not natively handle large binary assets well. You can even quote Linus:

"I really don't know what to do about huge files. We suck at them, I know."


> Don't focus too much on "git complexity" as most people already know git so it just creates an argument.

I'd say this phrase is both right and wrong.

It's right in the sense that it creates an argument.

It's wrong in that it creates an argument with the peanut gallery of git experts. But guess what, most people using git are not experts. They're software developers who don't want to learn the intricacies of git (probably most software developers out there), they're software development adjacent folks (think data scientists, etc) who for sure don't want to learn the intricacies of git, etc.

The "common person" using git will most likely resonate on the "git complexity" argument.


> peanut gallery of git experts

There are people who use git for its original purpose (kernel devs and very few others) and then there is the remaining 99% of people who essentially use “github flavored git”, using only three or four git subcommands and for the most part never needing to understand its intricacies.

Unfortunately, although they are using git-the-chainsaw-shotgun with all the safeties on, it’s nonetheless a chainsaw shotgun and sometimes they’ll run into issues where they or somebody in their company needs to be an expert and figure out how to un-scramble an egg, so to speak.

If a new VCS can solve the 99% case and never need users to fall back to understanding nitty gritty details, it could very well have strong takeup especially among people who don’t give a crap about what VCS they’re using as long as it gets out of the way and doesn’t make them think (game devs, data folks, etc).


I see 2 huge barriers for a new commercial VCS:

1. Devs don't like to pay for core tools. And VCSes need network effects.

2. VCSes seem to be really hard.


Totally agree, it won't be easy. Companies do pay for GitHub/GitLab/Perforce though, and for indie devs there's the free tier. I think what made git really take off is actually GitHub's free tier/OSS hosting, and not git itself being free (at least for parts of the market, and I might be wrong).

100% correct about VCS development, it's much harder than one can expect.


> Companies do pay for GitHub/GitLab/Perforce though

Those products also provide a huge amount of other value and functionality (though at a high price).

As someone who worked at a (mobile) game dev studio, this "what if Dropbox but Git?" product design really hits for me. Teaching our artists to use Google Drive or Team Drives was easy, but the functionality isn't there. Teaching them to use SVN was a nightmare (because SVN workflows are a nightmare) but the functionality is... also not really there?

Give me either a local installation that I can set up in my office or a local proxy to reduce download times (or peer-to-peer on the local network, the way Dropbox does it) and I could see this being beneficial for a lot of especially smaller studios.


As someone who is making one (not Diversion), I agree with both of your points.

Do you think that a VCS that solves the large file problem and binary file problem could succeed?


> If a new VCS can solve the 99% case and never need users to fall back to understanding nitty gritty details

Mercurial was a bit like that? Yet it didn’t stand a chance against git regardless


Learn your tools or one day you’ll lose a finger, or worse, your life.

— high school shop class


Yet software is not a chainsaw and a huge amount of people never learn to use their software tools. Especially since they're 100x more complex than hardware tools and nobody has time to master everything.


If you are using a chainsaw in a wood shop, you are probably doing something wrong. The saying “learn your tools” means to spend some time learning your options and what is available to you, learning the “gotchas” and why. Woodworking tools are rather complex with “gotchas” that will kill you in less than a hundred ms.

Using Git isn’t much more complex than using a lathe (simpler even, as you can get by with no skill and rote memorization). Taking a weekend to learn the data structures, and how everything fits together is not a hard ask. Especially since you literally only have to do it once in your entire career.


> Taking a weekend to learn the data structures, and how everything fits together is not a hard ask. Especially since you literally only have to do it once in your entire career.

Like all simplifications, this is false. If all you do for years after is commit, push, merge, you'll forget.

Especially since you'll need those brain cells to learn the new CPU/GPU architecture, the new JavaScript framework, the new corporate security policy, the docs from your internal architecture team, etc.

Nobody's life revolves around intricate VCS details.


You don’t need to memorize it for life. Jeez man, don’t be so hard on yourself.

The point is, you know what is possible. You know what is impossible. 12 years later, something happens and you go … hmm, I used to know what is going on. I think I need to search for something about git-tree or something?

The point is, you know what to search for, a starting point. You don’t just reach for git cherry-pick, but realize you can use git rebase --onto to copy/paste an entire branch. You don’t worry about merge conflicts because you only have to do it once with rerere. You learn git reflog will remind you what branch you were working on this morning before you got pulled into some shenanigans in prod. You can set up automation with global hooks. There’s so much you can know to do less work and you only need to remember the parts that are valuable to you.

After really learning git about 5 years ago, stuff like the above is all second nature to me; I've been using it for nearly 10 years since switching from SVN. My first 5 years were just like you said. Commit, pull, commit, pull. I didn't even know it could do anything else and I was worse for it.
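(For reference, a quick sketch of the commands mentioned above - the branch names are placeholders:)

    # replay an entire topic branch onto a new base
    git rebase --onto new-base old-base topic

    # record and automatically replay conflict resolutions
    git config rerere.enabled true

    # see where HEAD pointed earlier today
    git reflog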


It is just a tough argument to make: the thing you have been using for your entire career and used almost everywhere is suddenly too complex.


Ummm... A lot of people just endure using git. Go to the average enterprise software shop, the ones where people don't code for fun in their spare time, and ask around.

There are a lot more of those devs than unicorn and FAANG devs.


Why aren't these teams choosing something else then? If everyone on the team dislikes git, switch to mercurial or something else.


Because often it’s not the team that chooses, but tooling is instead standardized across the enterprise. And enterprises like to make the “safe” choice of choosing what’s most popular. And then there’s the whole aspect that you have to know some basic Git anyway to debug your way through the open source code you use (maybe not for JavaScript/NPM, I don’t know). Git also happens to currently be the most interoperable with other kinds of tooling, from CI to IDEs, so not using Git makes your life harder in ways unrelated to its inherent qualities. It’s a network effect in multiple dimensions.


How did it get so popular if it's disliked by the majority of devs?


Because generally the people picking the tools are not the majority of devs. They are architects or Senior Developers who do enjoy learning new things.

Also, git generally does just work, and most IDEs/source control systems take care of the basic operations of pull/branch/commit/push/open PR.


I suppose. It might seem a bit perverse after all if the non-engaged, uninterested in coding, clock puncher devs got to make all the decisions.


Popular as in “everyone is using it”, not necessarily “everyone is fond of it”. How did Jira become so popular? The dynamics that lead to such outcomes are interesting, but hardly unusual.


Git might be the best thing there is today (outside of very large companies or environments with large binaries). It doesn't mean that'll always be the case... There were other VCSs before git, and there will be after.


I use git because other people use git, and ultimately I try to accommodate the tools that my peers are going to be used to. But I do think it's too complex (by a lot), and if I was running some kind of dictatorship I would never touch git again. Frankly, I think that git is a significantly worse tool than svn was for most use cases.


I like git better than svn personally. But, svn is better at managing large binary files.


Out of curiosity, are there any VCSs that operate on AST instead of plaintext lines? (Or is something like this being developed or proven impossible?)

I guess it should be possible to cooperate on a shared codebase without the need for every contributor to check text files in and out following exactly the same formatting. Or even naming convention. Or even the same language, provided all collaborators can transpile to and from some agreed-upon shared AST target.

I know it might seem unhinged at first, but think about it: your (parseable) code is a representation of the tree anyway (with some unrelated "whitespace fluff" around). If you follow strict formatting rules that you can express programmatically, you can reconstruct that "fluff" from the bare AST. If you can store all your violations against your style near the code, you can even sin and break it. If you store data about what you need to see differently from the shared AST - local renames of variables, for example - then you should be able to use your own naming convention, formatting and even source language, without bothering collaborators with tabs/spaces, hungarian notation or the fact that you prefer some different dialect or metalanguage.


> I know it might seem unhinged at first

Not "unhinged". Most kids these days get their first introduction to computer programming using of of the many "block coding" environments, almost all of which are straightforward recapitulations of Javascript under the hood. And it works, and it avoids the problem of having to teach them how to deal with syntax errors before you teach them imperative logic.

The reason people don't do this is that it's just a bad idea. The fact that all source code is stored in a universally understood data format with pervasive support across decades of tools is a feature and not a bug. How do you grep your AST to see if it's using some old API that needs to be refactored? Surely you'll answer that you use your fancy AST grep tool, which is not grep, and thus works differently for every environment. Basically every environment now has to have its own special editor, grep, diff, merge, etc... Even things like documentation generation and source control rely on files being text. And you're throwing all that out just to be different.

Also, FWIW: it's optimizing the wrong part of the problem anyway. The total cost to an organization that develops and deploys software of any form is overwhelmingly dominated by tasks like debugging and documentation and integration. The time spent actually typing correctly-formatted text into your editor is a vanishingly small fraction of software development, and really that's all this helps.


PlasticSCM has semantic merge that does something like that: https://docs.plasticscm.com/semanticmerge/intro-guide/semant...


Not (just) a VCS, but this is the idea behind the Unison language: https://www.unison-lang.org/docs/the-big-idea/


Considering many languages' very own out-of-the-box tooling (e.g. gofmt, syn) often have glaring gaps[1][2] in the understanding/roundtripping of the language's AST constructs, I would never be able to trust something like this to store and restore my code.

[1] https://github.com/golang/go/issues/20744

[2] https://github.com/dtolnay/syn/issues/782


I believe the Smalltalk VCS Monticello works on a semantic level?

https://eng.libretexts.org/Bookshelves/Computer_Science/Prog...


You can do most of this in git via custom diff-driver and smudge/clean filters.

For example, git can already convert line-endings on the fly for Windows. This is special-cased, but can just as well be implemented via smudge/clean.

Oh and git-lfs is done via smudge/clean too.
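A rough sketch of how such a driver is wired up (the driver name and the three commands are placeholders, not real tools):

    # .gitattributes
    *.src  filter=astnorm  diff=astnorm

    # repository config
    git config filter.astnorm.clean  normalize-to-canonical-form   # run when staging
    git config filter.astnorm.smudge render-to-local-style         # run on checkout
    git config diff.astnorm.textconv dump-normalized-form          # used when diffing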


One problem with that though is that smudge and clean are not used in rename detection. Git purposely skips running these filters to detect renames for performance. There are quite a lot of other issues with smudge/clean too though.


I was thinking about trying this out, but there are some reasons why I don't think it's feasible.

Where are your comments stored?

What happens when you need to run out in the middle of a fire and you don't have time to make your code compile-able? How do you commit "un-compile-able" changes?

I think there are some really compelling reasons to try AST-checkin - all your loops can now be changed to functional, dialect changes like you mention, etc. - but there are some pretty significant downsides as well.


Nodes in the AST for comments, block comments and "raw text I don't understand" seems like a way to go?


Honestly yeah. Might have to give this another go.


these are both already solved issues that IDEs deal with using red-green trees


This would enable some advanced merge conflict resolution strategies, I suppose. However, it can also be done by building the ASTs on demand and still storing plain text.


It would be cool to integrate Tree Sitter into a VCS. It'd be more flexible if that were an option for a project/folder/file, but also offer a text diff option for readmes/docs or for if someone is using the VCS to write a book or something.


It would also allow the file structure to become irrelevant to source control; users could customize how the methods in a class are organised.


there's some machine from the 70s that does this. iirc it stores all source code in an ast like representation alongside binaries and has some kind of built in version control.

wish i could remember the name...


ahh yes, the rational r1000. an ada machine from the 70s that stored programs in a mixed ast/object data format called diana: https://insights.sei.cmu.edu/documents/948/1988_005_001_1565...


Plastic SCM developed Semantic Merge and diffing about a decade ago


I have not analyzed the full potential and benefits of Diversion, but I would not agree with the statements you made about Git. I think you should not focus on Git in your pitch.

>>it was built for a very different world in 2005 (slow networks, much smaller projects, no cloud)

Slow network: why is this a negative thing? If something is designed for a slow network then it should perform well in a fast network.

Much smaller projects: I do not agree. I can say that it was not designed for very, very large projects initially. But many improvements were made later. When Microsoft adopted Git for Windows, they faced this problem and solved it. Please look at this https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...

No cloud: Again I would not agree. Git is distributed so should work perfectly for the cloud. I am not able to understand what is the issue of Git in the cloud environment.

>>In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command

This is mostly a configuration issue. I guess this was done by a force push command. AFAIK, you can disable force pushes by configuration.
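For a self-hosted bare repository that's roughly the following (hosted services like GitHub/Bitbucket expose the same idea as branch protection settings):

    # run inside the server-side (bare) repository
    git config receive.denyNonFastForwards true   # reject force pushes
    git config receive.denyDeletes true           # reject branch deletions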


> Slow network: why is this a negative thing? If something is designed for a slow network then it should perform well in a fast network.

Designing for resource-constrained systems usually means you're making tradeoffs. If the resource constraint is removed, you're no longer getting the benefit of that tradeoff but are paying the costs.

For example, TCP was designed for slow and unreliable networks. When networks got faster, the design decisions that made sense for slow networks (e.g. 32 bit sequence numbers, 16 bit window sizes) became untenable, and they had to spend effort on retrofitting the protocol to work around these restrictions (TCP timestamps, window scaling).


That makes sense but then the pitch should include something about how back in 2005 the design for git had to make a trade off because of X limitation, but now that restriction isn’t applicable which enables features A and B. I don’t really see what trade offs a faster network enables other than making it a requirement that you have a network connection to do work (commits are a REST call). I’m not sure that’s a trade off I’d want in my VCS, but maybe I’m just not the target audience for this.


Even a force push doesn't destroy the reflog or run the GC server-side. I wonder how you can accidentally lose data with Git. I've seen lots of people not being able to find it, but really destroying it is hard.


He force pushed a diverged branch or something like that, and we only found out after a while. We were eventually able to recover because someone didn't pull. But it was not a fun experience :D


So multiple people did a git reset --hard origin/master and nobody complained or checked what and why this was done? That's not "one data scientist with the wrong command" but the whole team that fucked up hard IMHO.


I think you just sold their pitch with this comment... I, like many many people here, have done quite a bit of product design. What do you call it when a bunch of people use your product, and it breaks for several of them? That generally indicates your product is weak, or has a very rough UI.


The pitch simply wasn’t true. Data was not destroyed and was restored hours later.


For many of us, the story rings true. We have ourselves had horror stories that we did manage to recover from after a few hours of fearfully googling, and we know of other, less capable friends and colleagues who were unable to recover the data and who just accepted the loss.


It's a kinda crazy argument; I think data loss is way more likely with a centralised system than with a decentralised one.


You think Microsoft losing GitHub repos is more likely than poor bastards trying to make sense of the git command line? You think these guys are going to do a worse job with their centralized service?


People have lost data on GitHub from repositories being copyright striked for example.

At least with git, every developer has a copy of the full history so full data loss is impossible really. What happens if this company folds? You're left with some proprietary repo that you suddenly have to work out how to self-host.

It just doesn't make sense when compared to just learning git which is definitely the most fruitful thing a developer could learn at the start of their career.


It's a pitch. The story has obviously been embellished and polished and condensed, ready for public consumption. Being pedantic against it is not productive.


Politely disagree. It’s productive because hopefully future teams who launch on HN ask each other, “Is what we’re saying true?” during all those polishing and condensing sessions. If they don’t, the risk is crossing a line that damages the reputation of the team and undermines months if not years of hard work.


That's a creative way to defend a dishonest pitch


If the pitch is dishonest, why would I ever trust them with something as vital as my VCS? (And yes, "embellished" means dishonest)


This is not a pedantic criticism.


But that seems like pretty much the equivalent to "rm -R *"? And also just a permission/configuration issue.


To put into perspective, that was in 2014 :D There were no branch protections, and git was even harder to use. Plus everyone was new at git, obviously (we started in 2013 with mercurial, which was still a legit thing to do, and switched to git).


Yeah, these days stopping force pushes is a checkbox (default?) in GitHub.


Or drop table|database or delete from. To _nearly_ lose data it took multiple clueless engineers and the issue going undetected for months.

I wonder how Diversion handles operations that possibly delete data. What's their solution?


> but the whole team that fucked up hard IMHO.

Multiple individuals with similar problems would tend to imply systematic inadequate training. Or the enterprise concerned adopting an inappropriately complex system for its intended userbase.


Or, git is both very complex and very useful, and a large portion of its users have a poor understanding of git but enough for it to be a useful tool. If you want to do source control (which you do), then you’re investing time into learning git and/or fixing git, or maybe using a project like this.


You literally just said what GP said in different words, but prefaced it with "Or" as if it's a disagreement. What you said boils down to "inadequate training".


We both agree that they didn’t know the tool, but GP seems to blame them for deciding to use the tool without training. I was more or less defending their choice to use git, while also acknowledging the potential of a tool like Diversion. My interpretation of GP was that it doubled down on git, while claiming that anyone using git without understanding it is “doing it wrong”, which I agree with in principle but not in practice, as I argued in my initial comment.


And the tool made a screwup that hard not only possible, but very difficult for the victims to recover from.

Doesn't say a lot for git's usability.


A couple of thoughts about this:

One is that the possibility of overwriting history / etc is a really powerful and useful feature, but one that should only be used with some consideration, hence being gated behind the scary '--force'. The fact that git provides one the ability to discard and overwrite commits for a ref shouldn't be an endorsement of doing so freely. I'm glad git has this capability though and any "git alternative" would be all the worse if it didn't provide it, IMO.

Two is that if the concern is git's usability - i.e. the "problem" here is that it's too "easy" for users to do destructive actions accidentally - well, there are ways to solve that other than to reinvent all of git. There are plenty of alternative git UIs already, and an alternative UI is a great way to be "wire compatible" for existing users but still help protect those novice users from footguns.


That all makes sense and mirrors many of my own thoughts.

Though I'll say that "--force" isn't necessarily a "scary-sounding" option name unless you're used to Unix CLI naming conventions.

Further, the warnings git gives you about this are virtually inscrutable if you don't already understand what's happening.

A good interface to "blowing away history" would give you a brief summary of what will actually be gone, e.g.:

"If you go ahead with this overwrite, the following changes will be completely removed from the repo:

a3bf45: Fix bug in arg parsing

22ec04: Add data from 2024-01-17 scraper run

...

Are you SURE you want to completely destroy those commits? (Y/n)"

and if user says "Y", output should log all removed commits and also say:

"These commits can still be recovered until <date>. If you realize you want these back before then, run the following:

<command to restore commits>"

Generally, I think it's a mistake to put UI improvements in a secondary tool.

If there are issues that need fixing, get those changes in the canonical project, because layered patches on top will always be short of maintainers and behind the main project.


> Are you SURE you want to completely destroy those commits? (Y/n)

While there are a lot of user interfaces that could be improved, I believe the above has empirically been shown to be inferior to the alternative "re-run this command but add a scary option to proceed".

Users habitually answer "Y" to questions like the above all the time. And certainly after a few times it becomes routine for anyone. But having to re-enter the command and type out a whole word like "overwrite", "force" or "i-know-what-im-doing" is a whole other roadblock. The example is especially ill-chosen to have Y as the default option.

Any operation in git that destroys so many commits will include a list of the commits that are destroyed, similar to what is suggested here, and trying to push the resulting repository will say exactly how many commits will be removed and require a rerun with the force option (together with the necessary privileges). So reality is already not far from what you suggest, but with more fail-safes.


You make a good point about "Y/n" being more dangerous than refusing and requiring an explicit option be passed.

The clear warning about what commits will be lost is not at all how I remember force-push working.

That said, I usually use magit in emacs for git and understand the force options well, so I haven't actually looked at the standard push failure warning in years. Maybe I'm remembering wrong, or perhaps it's been improved in recent versions.


It doesn’t overwrite the commits though. It inserts new ones and resets the branch pointer … doesn’t seem like you’d need a whole new tool to mitigate this - just an automatically generated tag or something when you --force-push - would be easy to do if there was demand for it …


They used --force, which is usually the flag to say: here there be dragons. Be careful.


Yeah, I can’t see how use of a --force flag by people who didn't know what they were doing is enough of a reason to switch to a different VCS (let alone write one). The issue was people using a tool in a way that they shouldn't have. Which isn't a technical problem, but a training problem. You can't fix people problems with technology, so I'm sure there will be other footguns in this new system that someone else will figure out how to use to almost lose data.

Git is great in that it is flexible and powerful. But that power leaves some tools open to people who don’t know what they are doing… that’s the trade off.

(Now something that better handles non-code assets and large data files, I’d be much more willing to listen to that pitch.)


So the work wasn’t actually destroyed, and you were able to recover it. So all the people pointing out how implausible that part of your pitch was were right, and you were in fact just lying.


That's really not the main point of the post, but you're right I should have been more precise.

Edit: updated in the top text now!


I think the point being made is that you spent a lot of your opening post talking about Git, and led with that bit, rather than with Diversion. What makes Diversion different is added at the end, after you've spent time trying to convince Git fans that their current tooling isn't good. Worse, the examples you listed of why Git is bad are more reflective of configuration and processes than of Git itself.

This is ultimately a very weak pitching strategy. The first thing you convey to your potential users is insecurity--an insecurity that people won't choose your product over Git. And it's hard to want to buy something from someone that isn't secure enough about their product to pitch the product first, and answer questions/make comparisons after, as a form of clarification.

Alternatively, instead of doing a comparison to Git, you could start with a list of "have you experienced these Git issues? <list of problems>. Here's how Diversion improves on Git in this regard." In this case you're actually solving people's problems, rather than looking like you're grasping at straws to complain about Git and justify an alternative.

FWIW, I personally have 0 interest in a cloud-first version control. I like the cloud as a form of backup and syncing with team members, but I ultimately want a version control that works as well offline as it does online, and prioritizes the local experience.


The main point of your post is how much better you are than git. You support this main point by making up lies about git. This does not make me personally interested in trying your product.


From my point of view, it's not so much about lying; for me the OP demonstrates a degree of incompetence about Git on the part of the post's writers.

The fact that they don't seem to fully understand working of Git (not on the level of Git developers, just the level of Git administrators/users) does not inspire trust in their competence to create a Git alternative.


Just somewhat surprised because if anyone did a `git pull` they'd get divergent history and therefore a merge on default configuration. It would take a lot of manual work to ruin more than one copy of the repo.


For your information you can use the reflog command to find the previous head commit and restore your branch. It takes 10 minutes and then you learn to disable force pushing on the main branch.
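Roughly (the hash and branch name here are placeholders):

    git reflog                          # find the commit the branch pointed to before the bad push
    git branch main-recovered a1b2c3d   # re-create a branch at that commit
    git push origin main-recovered      # put it back on the server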


I find it funny how many comments in this angry rebuttal section actually endorse a Git replacement.


It's an interesting new application of that joke, "when I have a question on Linux I use a sock puppet account to leave an obviously wrong answer which prompts dozens of corrections."

I'm trying to imagine how to generalize this to other products. I think if I state the competing product has negative feature X, but also intentionally get some details confidently incorrect or deliberately feign incompetence, you get a group of people confirming X.


I find it funny how many comments you've made in this thread missing the point. People are reacting against the dishonest pitch, not the product.


So you were able to recover and did not lose a month's work of data? Your story just doesn't make sense. Come on.


Indeed you're right, the work that was erased from Bitbucket was restored from one of the employees who hadn't pulled yet; the post was edited accordingly.


Wouldn’t you still have been able to recover it even if everyone did pull, assuming GC had not run on everyone’s machine?


> the work that was erased from BitBucket was restored from one of the employees that didn't yet pull

Actually those commits that you considered lost, were still stored on everyone's personal computer in your team. You just didn't know how to use `git reflog` to find them.


> doesn't destroy the reflog or run the GC server-side.

Git doesn't give you access to the server-side reflog either. So it's not of much use if you don't control the server.

As for losing data with Git, the easiest way to accomplish that is with data that hasn't been committed yet; a simple `git checkout` or `git reset --hard` can wipe out all your changes, and even the reflog won't keep a record of that.


That data not committed to git cannot be recovered by git should hopefully not surprise anyone.

Neither is it the fault of your version control system, or any other system really, if you cannot access your server and are without backups.


> As for losing data with Git, the easiest way to accomplish that is with data that hasn't been committed yet

Also Git has pretty awful behavior losing changes when one doesn't press "Save" in their IDE. Bad, bad Git.


Your applications also shouldn't lose work when you don't press save; this is the entire impetus for the "recover unsaved work" feature in most document editors. A version of Git that shunted uncommitted changes to a special named stash whenever you did anything destructive would be a positive thing.

It's what I end up doing manually anyway, but why make a system where the default behavior is destructive and I have to remember every time?
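Something along these lines can be approximated today with an alias (the alias name is made up):

    # snapshot everything, including untracked files, before a hard reset;
    # any extra arguments (e.g. origin/main) are appended to the reset
    git config --global alias.safe-reset '!git stash push --include-untracked -m "backup before reset" && git reset --hard'

    git safe-reset origin/main
    git stash list    # the backup is still there if you change your mind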


It may be prudent to note that git by default is rather kind, in the way that it will not change your data unless you explicitly force it to with --force or --hard. I think git, as hard to learn as it can be, sometimes has a bit of an unfair reputation here. It's not all bad.

Not only is it quite careful about not losing data, someone actually took the time to make it spit out messages that not only describe what just happened, but also give suggestions of what to do next depending on how the user wants to proceed. That adds a level of discoverability that is usually associated with dialog-based GUIs. The quality of these messages can sometimes be surprisingly good, far from the Clippy-level helpfulness you sometimes see.

There are a few exceptions to the principle of not losing local changes, where you explicitly restore an old version of a file for example. But saying the default behaviour is destructive really gives a false impression.

But yes, you are absolutely right that a system to recover unsaved work is a good thing, but I would argue that it belongs at the editor level, not in a version control system. A user could have a number of files open that have local changes. The editor has a much better idea in which order changes were made, and which changes haven't even been committed to disk yet.


I can't say I'm widely traveled, I have no idea how desktop Office works, but Apple does this so well.

Using their desktop apps, Pages, Keynote, Numbers, TextEdit, Preview, I never hit "Save". I just close the apps. When I come back, the windows reopen right where I left off.

I wish emacs did this. I honestly don't know what it would be like for a code editor to be "constantly saving". I guess I would adapt, but there are times when I do all sorts of changes and go "Ah, this isn't right" and just kill the buffer. The ultimate undo.

But there's a great feeling, to me, when I go to close the app (or shutdown the computer) and it just closes. No prompts, no warnings, just saves its state, shuts down, and comes back later. And with the ever popular "naming things" issue of computers, I have a bunch of just "Untitled" windows. They're there when I open the app, and that's all I need to know.

The nag factor and cognitive load reduction of that is just unmatched. "Just deal with it, I'll come back later, maybe, and clean it up". One less thing.


A month of work for a whole team was never even committed or stashed let alone pushed? That is not a git problem.


I agree. It's quite hard to actually destroy data in git. Even with the so called "destructive" commands, walking through the reflogs can usually restore work that was accidentally deleted or whatever.


I configured my github to only allow commits with an anonymised email address. Time passed and I used another machine on which I had already opened that repo before. I pulled my recent work successfully, wrote stuff and then committed and pushed.

Github rejected my commit as I had the wrong email address. I then had to try and work out how I delete a commit but keep all my changes so I could commit it all again but with the correct email address.

I'm not sure exactly what I did but in my ham-fisted experimentation I deleted the commit and restored my local copy back to the way it was before my commit, losing all my work that day.


If you had already committed, `git reflog` should have still found your changes (even after you deleted the commit and restored the local working tree) unless you deleted and re-cloned the repository.


Honestly I don’t understand why more people don’t use a GUI for git.

What you describe would be 1 minute of work and maybe 10 clicks, with a very low probability of shooting yourself in the foot, in Tower.


Destroying it, and nobody knowing how to recover it (or that it can be recovered at all), are identical.


Thanks! We're definitely not trying to bash Git, it's done a lot of good for software development and for sure is going to continue evolving.

Git had much more of an edge when it was competing vs SVN and other centralized VCSs. With 10Mb networks (if you were in the office) you could feel physical pain when committing stuff ><

Regarding how Git is not perfect in the cloud world - check out GitHub's blog post here about their cloud dev environment, Codespaces: https://github.blog/2021-08-11-githubs-engineering-team-move...

"The GitHub.com repository is almost 13 GB on disk; simply cloning the repository takes 20 minutes."

Moving 13GB inside your own cloud should take seconds at most. The problem is the way Git works, it clones your entire repository into the container with your cloud environment, using a slow network protocol. With Diversion it takes a few seconds.


> Thanks! We're definitely not trying to bash Git, it's done a lot of good for software development and for sure is going to continue evolving.

It is not about bashing git; it is about anchoring your argument for why Diversion is a better alternative around git. You're basically taking your game/arguments to their playing field, and thus will have an uphill battle for mindshare.

Instead, consider reframing the playing field and mention git less (if at all). Something like "the future of version control is blah". Surprise us, talk to us about your vision for source control, or better yet, code and multi-discipline collaboration (e.g. between eng and design), etc.


I personally would not bother reading any "the future of X" if it did not address problems of existing tools. I know you're trying to give advice from a marketing POV, and it is good, but it's also inherently bullshitty - because its purpose is to net more sales rather than actually make a good argument.


I'm not sure I understand this at all.

> The problem is the way Git works, it clones your entire repository into the container with your cloud environment, using a slow network protocol.

What about git's network protocol is 'slow'?

I think I can also come up with a pretty simple experiment to prove or disprove this:

1. Fill a file with 13 GB of data and commit it.

2. Upload that to GitHub or wherever you want.

3. Time how long it takes to clone, and compare that to the real GitHub.com repo.

You will find the one we made takes 'seconds' (or minutes, depending on your network connection), while the GitHub.com one will take some time.

So, same data, two different results? The difference in this experiment rules out the 'slow' network protocol as the difference maker. The real reason is that the GitHub.com repo will have hundreds or thousands of commits.

Basically, the difference is the commit history, because that's how git needs to work. Git stores the diffs for the entire commit history, not just the literal files at the HEAD. I don't know what the network protocol has to do with that.


It is perhaps worth pointing out that if you don't need the history you can just `git clone --depth 1` and save the network transfer and disk space.


It reminds of when someone told me git submodules are slow.

They just forgot about shallow clones..


If you use the dumb http protocol, both cases should be equally fast.


    git clone https://github.com/github/docs.git
    123.57s user 37.02s system 74% cpu 3:35.73 total

    git clone --depth 1 https://github.com/github/docs.git
    3.37s user 1.83s system 35% cpu 14.521 total

Not a scientific test at all, but the second one was literally 15x faster, wall clock time.


> We're definitely not trying to bash Git

Using git with bash is the best way to use git (:


Came here to make a similar joke


That article also states that using a standard Git feature, shallow clones, you go from 20min to 90s. Most of the problems touched upon in the article are about state management for local environments, yes that can be tricky. And it can take time, but it has nothing to do with Git.


>> a data scientist accidentally destroyed a month’s work of his team

> This is mostly a configuration issue

git apologism :)

(FWIW I do agree with the rest of your comment, and I hope you forgive the slight joke. Product users, for any product are fallible humans. That might be fallible in accidentally deleting, or it might be fallible in forgetting to turn on the safety settings.)

Very seriously, something like this should not be possible in a source control system. Data integrity needs to be built in by design.


> Data integrity needs to be built in by design

It is built into Git by design. Git keeps commits around for 90 days even after they’re “deleted.” This is why people who understand Git were so skeptical of OP’s claim. The point that Git is confusing still stands, however.


The issue with a lot of freedom and unopinionated tools is always going to be the multitude of ways to fuck up. On the flip-side, you may not like what choices are made if you’re forced to use it in a certain way.

We enforce a strict pull-request squash commit with four-eyes approval only. You can’t force push, you can’t rebase, you can’t not squash or whatever else you’d want to do. But we don’t pretend that is the "correct" way to use Git, we think it is, but who are we to tell you how to do you?

We take a similar approach to how we use Typescript. We have our own library of coding “grammar?” that you have to follow if you want to commit TS into our pipelines. Again, we have a certain way to do things and you have to follow them, but these ways might not work for anyone else, and we do sometimes alter them a little if there is a good reason to do so.

I don’t personally mind strict and opinionated software. I too think Git has far too many ways to fuck up, and that it is far too easy to create a terrible work environment with JavaScript. It also takes a lot of initial effort to set rules up to make sure everyone works the same way. But again, what if the greater community decided that rebase was better than squash commits? Then we wouldn’t like Git, and I'm sure the rebase crowd feels the same way. The result would likely leave us with two Gits.

Though I guess with initiatives like the launch here, is two Gits. So… well.


> But again, what if the greater community decided that rebase was better than squash commit? Then we wouldn’t like Git, and I’m sure the rebase crowd feels the same way. The result would likely leave us with two Gits.

Meh, this is overrated. We'd end up with 2 Gits, and over time just one fork would probably take over, based on marketing, PR, dev team activity, etc. The second one would probably still be around but used by only a minor part of the community.

Just because a thing has on paper many forks, does not mean those forks are equal. In fact, a situation with many major forks rarely survives the long term. See Jenkins vs Hudson, Firefox vs Iceweasel, etc. Most people will congregate towards one of the forks and that's it.


What if someone pushes something inappropriate? Shouldn't there be a way to delete it?

As an example, what if someone pushes:

- A private key or password
- Copyrighted content
- Illegal content

In cases like this, it needs to be possible to remove the bad commit from the repository entirely.


Yes, but this should be only possible by way of commands that make it abundantly clear what you are doing, e.g. `git delete <whatever>` with extra confirmation “Do you really want to permanently and irrevocably delete <whatever> in the master repository?”, or a more obvious “recycle bin” that presents deleted branches/commits in familiar ways and with explicit expiration dates. But the Git architecture doesn’t lend itself to that level of user-friendliness.


> When Microsoft adopted Git for Windows, they faced this problem and solved it.

On Windows. On Linux Git still doesn't scale well to very large repos. Before you say "but Linux uses git!", we're talking repos that are much bigger than Linux.

Also the de facto large file "solution" is LFS, which is another half-baked idea that doesn't really do the job.

You sound like you're offended that Git isn't perfect because you like it so much. But OP is 100% right here; these are things that Git doesn't do well. It's ok to really like something that isn't perfect. You don't have to defend flaws that it clearly has.


>> When Microsoft adopted Git for Windows, they faced this problem and solved it.

> On Windows. On Linux Git still doesn't scale well to very large repos.

All of Microsoft's solutions for git scaling have been cross-platform. Even VFS had a FUSE driver if you wanted it, but VFS is no longer Microsoft's recommended solution either, having moved on to things like sparse "cone" checkouts and commit-graphs, almost all of which is in mainline git today.

I also find it funny to hear the complaint that git scales worse on Linux than on Windows, given how many Windows developers I know who have file-operation speed complaints on Windows that Linux doesn't have (which is a big reason to move to Windows Dev Drive given the chance, because of its somewhat Linux-like file performance).


`fsmonitor` is still only available for Mac and Windows.

https://git-scm.com/docs/git-config#Documentation/git-config...


Fair enough, though there is a hook to provide your own on Linux: https://git-scm.com/docs/githooks#_fsmonitor_watchman
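
As a rough sketch, wiring Watchman in looks something like this (the hook name is only an example; see the linked docs for the exact steps):

    cp .git/hooks/fsmonitor-watchman.sample .git/hooks/query-watchman
    git config core.fsmonitor .git/hooks/query-watchman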


How common are repos bigger than Linux?

Git also has the huge advantage of an ecosystem, tools and integrations. It is overkill for small projects and there are friendlier alternatives for those - but git wins because it is what everyone knows. Something aimed at the small number of large projects will suffer the same problem.


> How common are repos bigger than Linux?

In terms of number of commits, Linux is probably bigger than most. In terms of storage size, almost any video game project will be significantly bigger.

It's no secret that git is very bad at handling large binary files.


So this is very specifically for things like games with large binary assets?


No, large companies using monorepos will have repos much bigger than Linux even without large binary assets. Apparently Linux has ~10 commits per hour. I probably do ~10 commits per week. So a team of ~150 of me produces commits at a faster rate than Linux. Very rough estimate, but it takes less than you'd think.

Also if you vendor a few dependencies that quickly increases the size.


You don't even need game assets, your company's icon library is likely enough to tip the scales into territory git doesn't handle well.


> really like something that isn't perfect. You don't have to defend flaws that it clearly has.

Certainly true. But it's not at all clear how the product solves these specific problems (they say "Painless Scalability", which sounds nice, but did they try developing any 100+ GB projects with massive numbers of commits/branches on it?)


> This is mostly a configuration issue. I guess this was done by a force push command. AFAIK, you can disable force push by configuration.

If a feature can lead to actual unintended data loss, it should come disabled by default. Are there any other "unsafe by default" features in Git? What would be a sane general default that prevents unwanted data loss, and why is it the case?
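
For reference, on a self-hosted remote the configuration presumably being referred to is something like this; hosted services expose the equivalent as branch protection rules:

    # run in the server-side (bare) repository
    git config receive.denyNonFastForwards true   # reject history rewrites (force pushes)
    git config receive.denyDeletes true           # reject branch deletions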


--force always implies data loss. You're overriding the remote state.

Do people use it in an unsafe manner because they don't understand git, and is there a problem there that could be tackled? Yes.

With that, I don't think git has any feature that is unsafe by default.


In that specific case there was some error the user didn't understand; he googled it and found a StackOverflow answer with --force, and naturally tried it. BitBucket didn't have branch protection back then; today it's a bit better (you can still destroy your own work, but usually not others').


I agree that git is very complex (just try reading its documentation and counting how many options or commands you have never heard of before). But I think push --force is probably one of the easiest git concepts to get. The fact that someone on your team copy-pasted something from SO without understanding it doesn't seem to be related to git. Otherwise we could say that the fact some people lose their data through "sudo rm -rf /" proves the complexity of Unix. I don't think so.


This was PEBKAC, my dude. Git wasn’t at fault here; the script kiddie who pastes before understanding is at fault. Amateurs


My biggest problem with git is branch deletion — if you never do it you end up with far too many, but deleting a branch can’t be version controlled.


It is somewhat version-controlled, but not completely. If you use the reflog you can find it again and see how it moved around. But the reflog gets rewritten and gc'd, so it's not true version control.
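
As a sketch, recovering a locally deleted branch usually looks something like this (branch name and SHA are made up):

    git reflog                              # find the commit the deleted branch last pointed to
    git branch my-feature-restored abc1234  # recreate a branch at that commit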


Just curious, why do you want that to be version controlled?


Because I might realize later I made a mistake, or I might want to view history.

If I never cared about historical state and mistakes, I wouldn’t need version control at all :)


You could delete the branches locally while archiving them to another clone of the repo.


> With that, I don't think git has any feature that is unsafe by default.

Well, you just mentioned `--force`. It is unsafe by default. Git has a couple of flags to make it safer (`--force-with-lease`, `--force-if-includes`) but those aren't the default.
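
For example (a safer habit rather than a guarantee):

    git push --force-with-lease                       # refuses if the remote ref moved since you last fetched
    git push --force-with-lease --force-if-includes   # additionally requires that you've integrated those commits locally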


If you’ve ever had to remove private information from history before making the repos public (think domains, names, configuration, etc) you will appreciate the ability to rewrite history (and all the other things --force gives you)


I don't get your point. Nobody is saying don't use `--force`. Just that the default `--force` flag is the most dangerous variant.


I am not aware of any default use of force. Where does that happen?


The feature is 'git push'. --force is the opt-in to the unsafe behavior. It should not be used lightly.


You're missing the point. `--force` is the default of the force variants. The other `--force-but-something` arguments clearly modify that default. It's the wrong way round.

Obviously they've done it for backwards compatibility, but the fact that they haven't even added an option to make it the default is pretty lame.


Should a chain saw come with the ability to start the engine disabled by default?


Yes. That is a great idea. You could do something like a tab that you have to remove that tells you about chainsaw safety.


The problem here is not the tool. The problem is the author's colleague's willingness to paste a stackoverflow answer into their terminal without taking a moment to understand what it does.

If stackoverflow told them to break off the chainsaw safety tab there is no chance it would have been read first.


But it doesn't lead to data loss.

The commits that were overwritten by "force" are still there on the server. Any admin could recover them pretty easily. They're probably still present in the local repo of the person who ran "git push --force" too, as well as on the machine of anyone else who has cloned the repo.

The only way you'd actually lose data is if every single person who had a clone of the repo ran gc.

Or apparently if nobody knew about "git reflog" and nobody bothered to do a Google search for "oops I accidentally force pushed in git" to learn how to fix it.
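
As a sketch of that recovery, from any clone that fetched before the bad push (the branch name and SHA here are hypothetical, and the remote-tracking reflog needs to exist, which it does by default):

    git reflog show origin/main                       # find where origin/main pointed before the force push
    git push --force origin abc1234:refs/heads/main   # put the old tip back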


The Windows Git repository is only 300GB, which is basically child's play when people are talking about "large repo scalability". Average game developer projects will be multiple terabytes per branch, with a very high number of extremely large files, and very large histories on top of it. Git actually still does handle large files very poorly, not only extremely large repos in aggregate. The problem with large Git repositories is nowhere near solved, I assure you.


This includes assets, right, or some kind of prebuilt data in custom formats? Otherwise it would be hard to have this much data in source files.


Yes, game development studios include their raw art and environment assets directly in source control, just like source code. That's because the source code and the assets for the game must go together and be synchronized. That also includes things like "blueprints" or scripting logic. Doing anything else (keeping assets desynchronized or using a secondary synchronization tool) is often an exercise in madness. You want everyone using one tool; most of the artists won't be nearly as technical and training them in an entirely different set of tools is going to be hard and time consuming (especially if they fuck it up.)

But honestly, you can ignore that, because Git doesn't even handle small amounts of binary files very well. Ignore multi-gigabyte textures and meshes; the data model just doesn't handle binary files well because e.g. packfile deltas are often useless for binaries, meaning you are practically storing an individual copy of every version of a binary file you ever commit. That 10MB PDF is 10MB you can never get rid of. You can throw a directory of PDFs and PSDs at Git and it will begin slowing down as clones take longer, working copies get bigger, et cetera.

The 300GB size of the Windows repository is mostly a red herring, is my point. Compared to most code-only FOSS repos that are small, it's crazy large. That kind of thing is vastly over-represented here, though. Binary files deserve good version control too, at the end of the day.


Git is bad for games, and they should definitely make that comparison in their pitch if they want to capture that market.


No, it's not. LFS has improved over the years. Git is supported as a first-class citizen in Unreal Engine 5, alongside P4.


Just because it has integrations doesn't make it great. LFS is still not great; it doesn't have a lot of backends, for instance. And a real locking system is table stakes for a gamedev VCS.


Good for developers using Unreal Engine 5 I guess. Fact remains that most game developers struggle with Git.


The complexity people think they face with Git can often be overcome with a good UI and/or tutorials.


In part yes, e.g. lots of people like SourceTree. Some of the complexity is inherent though, e.g. local vs remote branches and the various conflicts & errors that result. Git has existed for 18 years, and the complexity problem still hasn't been solved. Other tools like SVN were never considered so hard to use / easy to screw up.


Have you ever tried running Git in the cloud? :)

Cloud-native and running things on “EC2” are very different things.


Yep :) Lots of products run Git on EC2/containers, e.g. GitPod or GitHub Codespaces. Ironically, Diversion works much faster on these than git does.

https://github.blog/2021-08-11-githubs-engineering-team-move...


Focusing on Git seems like completely the wrong pitch. Git is a distributed VCS - in all your examples you were clearly trying to use Git in a centralized manner with no backups. I suggest focusing more on your own product than on Git.


You're totally right, we were using BitBucket and pushing and pulling from there. It's really more of a centralized manner, but this is the usual workflow for most teams and companies and what they actually need (a single source of truth). Totally agree about backups, lesson learned :)


I want to add counter-feedback. I think focusing on git's weaknesses is really appealing. My background is more ML research and data and I viscerally connect with your pitch around both git's limited scalability and the complex/dangerous nature of git operations (ML researchers + git is not great).


Tbh they should really just learn it. It's super simple and powerful


Until the day the company is targeted by a ransomware group and everything is switched off in a panic. Or the day the network connection of a building goes down. Requiring a REST API call in the cloud for every VCS command bites hard then. We had this before with SVN and it wasn't nice.


So it sounds to me as if you are trying to create a replacement for BitBucket\GitHub and their ilk, not Git. This may be a worthwhile task. Maybe it makes sense to concentrate on this in your pitch.

For BitBucket\GitHub\GitLab, and for the workflows enabled by them, Git is just an underlying technology. Some of the functionality of these services is implemented using Git commands very clumsily. Some Git commands don't make sense or are dangerous in such an environment\workflow. Yet the Git interface is fully exposed to the users of these systems.

(Despite your statements and example, Git commands are not dangerous in the sense that they can destroy information already pushed to the repo. However, as your example demonstrated, they are dangerous in the sense that to recover from them requires expert knowledge and capabilities.)

Git was designed for truly distributed development, and it is great for that. A lot of projects use it, though, for centralized development. Git the software is fully capable of supporting such development with proper configuration, but arguably has bad defaults for it, and the existing solutions seem to be half-assed (to tell the truth, I hate GitHub, but won't go into this right now).

For me, in my work and personal use, the fully distributed character of Git is not important, but being able to work offline is, crucially. I know it is an important issue for many developers. With working from home being more and more widespread, I'd think this issue becomes more important, not less. Not being tethered to a good internet connection, or being able to work during an outage, is really cool :-)

(Before Git I would have 2 VCS applications installed on my work laptop, one working against a central database that required an internet connection, one fully local, with a separate local database. Synchronizing them was a constant chore, a significant part of which I was not able to automate and had to do manually. Still, it was a worthwhile price for being able to work offline.)


As a game dev I find the pitch unexciting.

> git is bad we're better

Honestly, a modern git lfs workflow is really smooth. I think it handles binaries fine. Show me a cumbersome git feature and why this works better. You can't just tell me tools I use every day are unusable.

I think the main pain of git is if you want to put everything in a single repo. Big isn't a problem; getting just what I need (checking out a single large model) is the problem.

From the website I have no idea if this can do partial checkouts. I assume yes, but it's not stated at all.

> cloud native

A lot of studios want on-prem and self-hosted private cloud support. Cloud native is touted as a feature but the details are left out. That has me wondering if some things won't work when I try to host on-prem, or whether it's an afterthought.

Can I easily host this on my own k8s cluster? It's not stated. Cloud native doesn't mean it's on the internet.

Another feature that artists like is file locking. P4 has it and git lfs actually has it too. The heavier usage of P4 streams and branches makes it hard to use locks effectively these days. Merging something that was locked but now isn't is sticky business... maybe you guys solved that.
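
For reference, the git-lfs side of locking looks roughly like this (the pattern and path are just examples; the remote has to support the LFS locking API):

    git lfs track --lockable "*.uasset"     # mark a pattern as lockable in .gitattributes
    git lfs lock Content/Maps/Arena.umap    # take the lock before editing
    git lfs locks                           # see who holds what
    git lfs unlock Content/Maps/Arena.umap  # release it when done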

> File locking across branches - coming soon!

"coming soon"... so close.

Good luck to you guys but I think the pitch needs work.


Git LFS is a giant hack on top of Git. Most game devs I know moved away from it over time (back to Perforce or SVN). It might seem okay at first - but deep into a project you'll want to rearrange/rename folders and keep history, and you'll discover that Git LFS doesn't actually work like normal Git and your file history wasn't kept. Only once you start dealing with issues will you find all the weird hacks Git LFS does on top.

I'd say Git not working well for game dev isn't a pitch that Diversion needs to make, because it's already clear to most game devs.


It's gotten a lot better over time as far as adding tooling to make big changes. It's the same ol' "it's fine if you know what you're doing and not so fine if you don't." Personally I'd rather deal with git's issues, but P4 has a lot of built-in support in engines.

Day to day I think it's the industry tooling and the partial checkouts that have people pick P4, not esoteric problems you face years down the line.


Thanks for the feedback, glad you asked! Partial checkouts are supported. K8s not yet, but we do run Diversion in a container for testing. Private cloud works as well, it's more a question of support manpower though - will be available for large clients. Obviously we still don't have every possible VCS feature, we're just getting started :) But we are adding features pretty quickly (e.g. conflict notifications took a few days to implement). Thanks!


Game dev here, totally agree.

File-locking across branches excited me, support for visual diffing of uassets would be insane. The platform screenshots as a git/github alternative didn't excite much interest as Plastic and Perforce are major players.


Other game devs mentioned visual diffing as well, we'll try to make this happen!


Also a game dev — if you want to attract devs, tell me why this beats P4 or Plastic. How's it similar or different? Plastic has a lot of the features you're bringing up as "new" and has great DX, which tells me you either didn't do your research, or aren't saying enough about how what you're doing is meaningfully different from them.


> In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command.

While git is indeed a usability clusterbomb and there is massive space to improve, the problem above sounds like a devops failure. Git gives you all the tools to prevent such a disaster; all you have to do is not give the root password of your CI server to any data scientist.

On the topic of your startup, I would very much like a gentler learning slope, where I can introduce regular non-coders to the benefits of source control, and still have the advanced branching/merging/rebasing etc. for the wizards.


I also always wonder in cases like this if the person just didn’t know about reflog.

You _really_ have to try to fuck up hard enough that everything is gone, it just might require even more arcane commands than what got you into a mess.


I've used reflog many times, but I'm not sure if this:

> You _really_ have to try to fuck up hard enough that everything is gone

...is still true when using git-lfs, which seems common when using large data sets.


I believe it works the same. The large files are not immediately pruned.


True! They didn't actually have a root pwd to anything, the repo was hosted on BitBucket which didn't have a branch protection feature back then.

Non-coder users are actually an important use case for Diversion (like in game development where many/most users are artists).


And nobody on the team had a reasonably fresh checkout of the repo on their local machine?

To put it bluntly, this story does not sound credible. It's also one of the first things you say in the pitch, which taints everything you write later. I would suggest focusing on what you do well, not in making up stories about data loss with what you perceive as the competition.

(It's especially odd when Git isn't the obvious competition; Perforce is.)


Swear to god, the story is true! We were able to restore most of the work because someone didn't pull the updates. (I should have added it in the story, didn't think it was important) But it was nerve wracking :')


But if someone DID pull the update, the old commits would have still been in their local repo and they could have run "git reflog" to retrieve them.


The person, who apparently googled some command sequence and happily entered them only to find out it overwrote the team's data, could just as easily have used those google skills to search for how to undo last git operation.

Obviously this person had all the privileges required to force push the previous commit in order to save the day. The old adage of never fact-checking a good story holds true, however. Not sure it's a good selling point though; by nerd-sniping everyone into explaining in detail what the actual problem was, no one will read to the end.


Git CLI UX may not be great, but the git data structure of representing commits, branches, trees and blobs as immutable pointers and Merkle trees is a phenomenal invention.

I don't agree that every command needs to hit some REST API. That seems like throwing the baby out with the bathwater.

The most powerful thing about git is that I can work fully offline with a partial clone. And sync the commits when I get online.

Git got popular because of its distributed nature.

I remember when our svn server used to hang and the entire team was blocked from even making a commit.


These SVN memories are why "cloud native" is an anti-feature to me. Most software, especially productivity software, should be offline-first.


Yep, exactly! I remember these "happy" SVN days as well. It sucked; git was much better. Today you're almost never offline though, you even have wifi on airplanes. Git's data structures are a masterpiece, for sure.


> Today you're almost never offline though, you even have wifi on airplanes.

The airplane wifi is good enough to look up an API, but it's not good enough to sync a code repo, especially one with large assets.


For assets no, definitely. At least until they get Starlinks :D I did commit into our code repo from an airplane though.


The world needs a git alternative. Anyone who has used mercurial at Google or Facebook knows the tooling could be much better.

As soon as you have 2-3 people committing to the same repo daily, git falls apart fast. The biggest difficulties with git are merging and branch rebasing. If git could do rebases better, I suspect software development teams would universally move about 20% faster.


I've worked on projects with thousands of developers committing to the same repo daily.

Git is far from perfect, but in my experience it's been far superior to any of the alternatives I used before it (cvs, svn, p4).


With 2-3 people committing daily I would be completely surprised if there were any issues.

It suggests they're working on the same files and on the same lines - why?

My current project has 40+ developers and I haven't seen a merge conflict in a long time.

Why do you think there are issues with merging and rebasing? Pretty much all cloud solutions have a button to do both for you.


rerere has worked pretty well for me when handling conflicts and it’s built into git.
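
For anyone who hasn't tried it, enabling it is a couple of config lines:

    git config rerere.enabled true     # record resolved conflicts and reuse the resolutions
    git config rerere.autoUpdate true  # optionally auto-stage files that rerere resolved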

I’ve worked with hundreds of engineers in a single codebase. Never had any issues like you’re describing.


I don’t understand how it isn’t easy?

“git merge origin/master”

Done. Unless you have a conflict, then fix it and commit. It couldn’t be simpler.

If there’s value in this product, it won’t be because it’s somehow simpler than git.


Rebasing is extremely easy with git. Sounds like more of a skill problem than a git problem.


I think the issue comes about more when you have refactoring going on at the same time, and suddenly multiple feature branch maintainers need to figure out if they're going to merge in the main branch, rebase onto the main branch, if it makes sense to squash their changes first before attempting either path, etc. If they get only partway through either one, it's difficult work to pause and resume, so that integration work is fundamentally hard to collaborate on or even really review.

And never mind maintaining an ongoing integration of multiple unmerged feature branches — perhaps that's more of a "front end" issue for GitLab and Github to solve, but the whole business of patchsets and the like is very much unsolved, and that's painfully obvious when you see Debian storing their quilt patches inside of a git repo, rather than being able to leverage git's native capabilities to achieve those effects: https://salsa.debian.org/debian/netplan.io/-/tree/debian/0.1...


You can always merge feature branches together to check for conflicts


As others have mentioned I believe you underestimate the security requirements (often physical) imposed on game studios. Running P4 (and using the often publisher mandated tape backups) is the least of your problems in such an environment.

If diversion were to succeed it would need a huge security team because it would represent far too tempting a target. Games stuff inspires a level of attack which normal businesses simply do not encounter.


Thanks for the feedback! Totally true, I was surprised by how much emphasis on security there is in game studios. They are moving to cloud however, and we're hoping that a private cloud solution (Diversion running on their cloud account) would satisfy the requirements, at least a few years from now as cloud usage grows. Security is definitely going to be super important for us, in any case.


You need to stop worrying about the cloud and focus on being agnostic to things like VPNs, on premises and so on. (Also SSO mechanisms). Any cloud needs to be optional. If you can make it easy to deploy on a private cloud where everyone accesses it via corporate Google accounts but also deploy on prem (say containerized) with Active Directory integration you will cut a lot of noise.

Game artists will increasingly not trust the cloud as it is where their data goes to train AIs, but more pressingly a large proportion of asset development is done where the network (and electricity) is surprisingly flaky.

To be specific I have been involved in several situations where we were not allowed to mention things in email due to it being cloud hosted, and those restrictions were imposed by companies that are themselves cloud providers. That is the kind of level being discussed.


Thanks for the feedback! We do hear that remote artists with flaky network connections is common, and plan to address it in several ways like CDC (a method for efficient chunking and diffing of binary files to minimize network transfer and storage duplication), local network caches or peer-to-peer transfers. Private cloud deployment or on-prem with SSO will also be available.


I get that enterprises will buy anything with "Cloud" or "AI" in the name, but VCS doesn't have anything to do with the cloud. Lots of VCS's have had server-oriented architectures, well before Git.

I see a lot about the architecture and design here. This is a product smell: focusing more on technology than solving problems. Some nerdy people may be interested in it, but it all means nothing if the experience of those users isn't good. You want me to buy your product? Sell me on why the experience is better. How it's going to speed up dev time, reduce errors, make collaboration better. None of that will be improved by a REST API or distributed storage in your backend app.

I'm also not looking to change my development practice. If most of your features are only available in a browser, I'm not going to want to use it, even if it were better than what I do now. Meet the users where they are.


Totally agree, and the launch post could only be that long :) We're trying to build a better dev experience, and the tech is only a means to that end.

> If most of your features are only available in a browser, I'm not going to want to use it, even if it were better than what I do now.

The CLI is the most complete interface. Web UI still can't do everything (getting there though).

Thanks for the feedback!


It's about time someone revisited & reimagined version control. The previous generations each lasted about 15 years: SCCS/RCS -> CVS -> Bitbucket/git/mercurial -> ??? So I am glad to see this.

I would start by talking about what is great about Diversion -- what it lets you do that you couldn't before.

Since you mention gaming and perforce I looked in vain to see if it supports binaries (a major limitation of git -- just simply not in its design space). "Binaries" can actually mean for some people compiled code -- not for me but I understand why people do it -- as well as images, data files, Word files etc.

It sounds like the second is scaling, but you don't say what you mean by that. Git scales pretty well until either the repo & its history get enormous or there are a lot of people making simultaneous changes.

The realtime collab integrated with a version control mentality could be interesting -- a major problem with google docs is the lack of useful version control (even Word is better).

And why cloud native?

Once you've done that you need only briefly mention "why not just use git instead?"


> And why cloud native?

And what even is cloud native? When "cloud native" isn't just marketing, it seems to have all sorts of different meanings.


We're using S3 storage, Lambda and ECS compute, and serverless DBs. It allows us to build a scalable product much faster, and leverages cloud features like multizone backups and distribution without having to develop these ourselves. It also allows fast transfer of data between Diversion and other cloud systems.


Those are things that matter to you, not your users.

You did talk a little (in a comment or your post I don't remember) that you can use various cloud APIs to integrate into other systems.

But at the moment, from what you're telling me "cloud native" is as interesting to me as how you format your source code.


So basically not self-hostable.


??? might be https://pijul.org/ with its commutative awesomeness


It's worth mentioning that Pijul is largely based on the same theoretical model as Darcs, which actually predates Git. But Pijul definitely makes it a lot more workable, e.g. by solving the exponential-complexity (iirc) merge that Darcs has.


This is not true, actually. Pijul answers the question "how to model states so that conflicts are allowed (and hence modeled)", whereas in Darcs it is "how do operational transforms on patches alone inform us about conflicts"?


> SCCS/RCS -> CVS -> Bitbucket/git/mercurial

You skipped Subversion/SourceForge. People forget but for about 5 if not 10 years, SVN was the biggest SCM in town, next to P4, P4 having more of a hold in the game dev world.


I not only did not forget about subversion, I funded it in part. But it was of the CVS generation.

I did leave out the proprietary things like Perforce and Aide de Camp, Solidworks PDM and the various in house things. Life's too short!


You also forgot Microsoft Visual SourceSafe :-p


Thanks for the feedback! Totally agree. Diversion does support binaries, should have mentioned it directly (will update the website).

The thought behind cloud native is that workloads and devtools and data are moving there, and we want Diversion to be the best choice for when everything is in the cloud. Besides that cloud storage and DBs allow us to build and iterate much faster, and worry less about scalability, data distribution & storage etc.

But we can also run Diversion locally / in a container, it just won't be as scalable.


why is it about time??

VCS is sort of a solved problem like SQL.

it's like saying it's about time someone revisited those Javascript frameworks.


The comment you're replying to mentions some limitations of the current state of the art.

I agree that there's no reason to change just because something is older than some threshold. But I think the roughly 15-year cadence has reflected the time it takes for a new VCS idea to develop and then become mainstream enough for its drawbacks to become annoying enough that it loops again.


I don't know about that. The UX of git leaves a lot to be desired. monorepos and large files are not definitely solved, either. If not git, what solution did you have in mind?


Millions of repos on GitHub don't have those issues.

You're talking about an extreme use case. Like saying we need to replace the shovel because it can't do what a bulldozer does.

It's a minuscule fraction of GitHub repos that have those issues - an incredibly small number, mainly devoted to games or media.

UX is fine if you know it. SQL UX isn't ideal either.


Being deceptive about Git's shortcomings is going to raise eyebrows with anyone seriously evaluating your solution, which is already going to raise eyebrows because it's not free software.

Most studios will try to avoid locking themselves into another expensive, annoying VCS that they have no control over. There's good attempts at FOSS p4 replacements now, you need to do better if you want to stand out.


> There's good attempts at FOSS p4 replacements now, you need to do better if you want to stand out.

There are? Like what?


I've been attempting this but have paused the efforts due to what seems like lack of interest -- giving me an indication that the problem is not that big of a deal.[0][1]

If anyone is interested in my resuming them, let me know at contact at weedonandscott dot com

[0] https://www.reddit.com/r/vfx/comments/11s08ne/your_opinion_o...

[1] https://www.reddit.com/r/gamedev/comments/11s5haf/what_do_yo...


I don't read the replies here as a lack of interest, more as a lack of interest in the solution you've come up with.


I don't disagree, but usually when a solution doesn't fix a painful problem, people are almost disappointed after getting their hopes high. In those posts, the commenters give me the impression that they are mildly annoyed at best (or worst).


I can't speak for the VFX post, but I think the gamedev post shows the weaknesses of git in one line:

> Git does have problems, we’ve had processes fall over on projects, many times actually, but it’s always solvable.

Every team that I've worked on would replace that tool with something more stable if it existed. It does for version control, and it's Perforce, which comes with a hefty license fee, and a _different_ set of problems.

The selling point of Pipetrack is the commutativity, but to echo the comments in those threads, I've never found myself wanting commutativity. In games I want a mainline branch, support for large assets, granular access controls, a performant shared global system, and a method to cleanly differentiate between WIP changes and ready changes in a way that works with non-technical users, and won't be the highest individual line item per-user subscription we pay for. Unless the selling point of your tool fixes one or many of those problems, it's not going to help in games.

Honestly, I think we'd be better off on SVN than git most of the time, but the _tooling_ around git is far superior.


Yes Please. I would switch my studio today if this existed.


Hi, why the switch? Purely cost? Or admin overhead?


Yes, to both. P4 itself is solid, but it's a very chatty protocol and is very latency sensitive. Running a master in the us, with clients in europe is painful for everyone involved. Replicas and edge servers come with other tradeoffs too.

As a developer, doing things like "I only want this subtree of the stream" is hard. Virtual streams exist, but they have a (non-negligible) overhead on the server. It has some quirks due to it being 30 years old which make it... interesting, to work with sometimes.


Agreed. It's also expensive, and support has been lacking. We'd much prefer something we could maintain and fix ourselves.


Of all the complaints I have about p4, support is actually one I'm ok with. They've pretty consistently helped me fix issues over the years (plenty of which are bugs on their side that they have fixed after I raised it). Their sla is good, and their engineers are usually good at troubleshooting.

They are wildly, wildly expensive though.


That totally depends on where you put value. We're a fraction of other companies' license costs, for comparison. Also, we have new pricing for our new Helix Core Cloud SaaS product!


Yeah. New pricing, but not cheaper. Don't get me wrong. It's a fine price to pay for a studio going strong. But there is no way I can use it for starting my side-project indie game.


> there is no way I can use it for starting my side-project indie game.

I am not a defender of P4, but for some reason I'm defending them in this thread. It's free for < 5 people [0] if you want a side-project indie game.

[0] https://www.perforce.com/products/helix-core/free-version-co...


Yeah, but afaict, not their new cloud service.


Lightweight branching is coming "soon" and will help this exact issue. Contact me for more info.


> Being deceptive about Git's shortcomings

Are they being deceptive? I'm not sure I see it.


Not deceptive but I would say they're being hand wavy.


The story about irrevocably losing data at least was not true (already admitted by the authors).


I think I would give them that one, they were only able to recover the data because they got lucky, not because the system was designed to not destroy data. I would go even further coming from my infra background, even if you don't truly lose production data, if you have to reach into your DR backups that's a failing of the systems in place that come before it.


I'd say git is deceptive: "checkout" doesn't mean "delete my stuff" any more than checkout a book from the library means throw it in a wood chipper.


> a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command (EDIT: we were eventually able to restore from a non-updated repo clone, after a few hours)

This actually reads like a benefit of using Git. It's really, really hard to lose something completely in Git, because of reflogs, because there are multiple people on your team who each have a copy of the repo, etc.


For sure, but that requires a lot of expertise and leaves the non-experts free to shoot themselves in the foot...


You are not competing with Git, you are competing with Perforce.

(Personally, I would never use a proprietary cloud-only offering for version control.)


> (Personally, I would never use a proprietary cloud-only offering for version control.)

yeah, I have no idea who would want to replace a free, open, well-debugged, featureful, ubiquitous VCS with a closed, immature SaaS offering... if your source code is valuable, you don't let a YC startup gate-keep it (sorry)

git has terrible ergonomics for newcomers, but UI is exactly what the various git forges solve

for experienced devs, git issues get internalized like the issues with every other tool

a more realistic approach is the jujutsu stuff google is working on...works WITH git to create a better workflow


Disclaimer: I am designing a Git alternative too.

Maybe "cloud native" will have a pull for game companies, but I am not so sure. I think a lot of studios would want to self-host.

"Git compatible" is an interesting phrase; does Diversion use the same type of backing store? If so, I am not so sure it will handle large files as well as hoped.

I had to solve this problem myself, and I did, but it required a different storage design.

Can it handle binary files? Is there a plan for doing so beyond "commit the entire file every time" or "use xdelta"?

I think this is how fourth gen version control systems will be defined. Game studios have a lot of binary assets, so this will be important.

All in all, I could see a product like this succeeding, but I think taking VC money was a mistake because there may not be enough space in the market for a company that has to keep growing to satisfy investors. I am taking zero VC, and that will allow me to make money on as few as three clients.

Anyway, I wish you both the best of luck!


> I am taking zero VC, and that will allow me to make money on as few as three clients.

A VC would give you legitimacy to help you get the customers you need. What large customers are going to back you with their business unless they know there's some deep pockets behind you? There are few things more valuable than a business's source code, it encapsulates all of their business processes and is how all of the business's data is accessed.

Why would a company want to entrust you with this? How are you protecting against data loss, and the legal liability that comes with this responsibility (insurance, legal)?


You make good points.

My VCS is designed for self-hosting, not cloud.

I will support customer installations, not host customer source code.

I am hoping that customers see less of a need for deep pockets in that case.


This is a tried and true method of bootstrapping a SMB. Sell the bits in a box with support until you are big enough to sell the service (or don't).


I mean, insurance is cheap. I have professional insurance allowing me to personally do several million in damages due to a mistake. Or recover your losses if I give bad advice. All for the low cost of 50 bucks per month, purchased through a local dev-co-op.

Trust isn’t built by who backs you though. Does it make it an easier sell? Maybe, but as a buyer, if that’s what you’re leaning on to sell to me, I’m going to be put off.

If it comes down to two, I’d trial both and care about how well it does before caring about who is backing who. In the end, I might choose the smaller company and negotiate access to source code to hedge against them going under. And that sounds like an even better deal than some VCs trying to “monetize the fuck out of me” in three years.


Oh, how might I find that professional insurance? Even though I am not going to host, I'd like something like that.


> Maybe "cloud native" will have a pull for game companies, but I am not so sure. I think a lot of studios would want to self-host.

Cloud native and self-hosting are not mutually exclusive.


They are in this case.

…and, frankly, they are in most cases. Most “cloud native” apps are designed to run on cloud services like VendorHere cloud storage, cloud functions, cloud containers, etc.

Vanishingly few of them are actually self-hostable.


Can confirm, a lot of studios want to self-host.


Thanks, good luck to you too! Would love to check it out! Diversion has totally different storage that handles binary files with no issues, but the same concepts as git - branches, commits and tags. It has a Git sync feature that allows syncing commits between Git and Diversion repos. Kudos on bootstrapping the product, it's definitely not easy! Version control is much harder than it seems, as you've probably found out already :)


Do you have more documentation somewhere? If so, I’d suggest making it more easily accessible. I couldn’t find any on the site and the support link just takes me to discord. I’d be more inclined to sign up if there was some documentation I could read through to get a sense for requirements, setup, branching, CLI commands, etc.



Thanks for flagging that! Just updated the site and forgot the link. It should be there now! (You also have an intro video and in-app docs, but there's more work we need to do there)


> We’re planning to release it as open source once the code base matures

FWIW, historically this has meant "we'll release it if we fail". If the product has actual market value and is attracting investment, you'll never win that fight with your backers unless you make the decision preemptively, before they write their checks.

To be clear again: I don't doubt your sincerity here. I'm just saying that by the time the "code base matures" it won't be solely your decision to make, and that I don't trust the other stakeholders.


> In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command.

I'd like to hear about how this happened. No one in the team heard of reflog?


> I often wondered why Git is so difficult to learn, compared to other tools.

Yes! This is something many people wonder -- other than those who love git :) I used to love Mercurial and I still mourn its mostly-loss. So I welcome a new DVCS system that is friendlier than git.

> Diversion’s code is managed on Diversion!

This is a good sign.

> can synchronize with existing Git repositories (each new commit in Diversion goes into Git, and vice versa)

Can you expand on what this means please? Does it simply mirror, or is it feature-for-feature compatible? Is it a backup capability?

> still using legacy tools like SVN and Perforce

I won't argue re SVN, but Perforce? It's used in the game industry primarily for its excellent handling of binaries / large binaries. How well does Diversion handle that kind of thing -- multigigabyte data sets, frequently changing?

The site says large files are fine, but that's too vague for games, IMO. Large, frequently changing, binary-not-text hard-to-diff files?

Edit: one final question: why cloud only? Why not software that can be locally hosted, or hosted by (other) service providers? What if I love Diversion and want to run a Diversion setup on my own Linux box in a cupboard?


The difficulty of git is that it can do so much in so many different ways. What's actually needed is something like a linter for git workflow, where an org can enforce an opinionated subset of git's capabilities in a prescribed order of operations.
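
A very rough sketch of what that enforcement can look like today is a server-side pre-receive hook; the branch name and policy here are purely illustrative:

    #!/bin/sh
    # sketch: reject history rewrites and deletion of main on a self-hosted remote
    zero=0000000000000000000000000000000000000000
    while read old new ref; do
      [ "$ref" = "refs/heads/main" ] || continue
      if [ "$new" = "$zero" ]; then
        echo "deleting main is not allowed" >&2; exit 1
      fi
      if [ "$old" != "$zero" ] && ! git merge-base --is-ancestor "$old" "$new"; then
        echo "non-fast-forward update of main rejected" >&2; exit 1
      fi
    done

Hosted forges expose the same idea as branch protection rules, but the point stands: the "opinionated subset" has to be bolted on rather than being part of the tool.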


Right now we're trying to get it to users who prefer something that just works, and don't want to think about hosting. But you can actually run Diversion in a container (that loses the distributed storage and DBs, which means it won't be as scalable). Other providers would definitely be great, if we succeed in standardizing it like git.


Thank you for this (and the other!) replies, I much appreciate your involvement in the thread!

Answers like this also make me feel more confident / interested in the tech.


The Git sync feature allows one to import an existing git repo (currently GitHub is supported) into a new Diversion repo, and keep both in sync: every commit into Git is imported into Diversion and vice versa.

This allows a member of a team that works with Git to try Diversion, to keep backups, and to use GitHub Actions or other CI tools that work with git.


Perforce is amazingly scalable, but that comes with a price (literally and figuratively :)) We still can't handle petabyte repos like P4 does, but we'll get there. But we can handle very large and frequently changing binaries pretty well.


As this is a startup, the competitors are GitHub, Bitbucket, GitLab etc. and not the underlying technology. That is a means to a goal, making money or most probably a lucrative exit.

> [git] was built for [...] much smaller projects,

It was built for the Linux kernel. There are larger projects than that but how many? Of course if you manage to make them switch to Diversion and get paid for that you could be well off even with a handful of customers.

About technical matters: as a company I'd be very uneasy having all my code only in the cloud on somebody else's servers. At least with git there are dozens, hundreds, thousands of copies of any repository around the company. If GitHub or AWS close our accounts we can keep going and move somewhere else.


Thanks for the feedback! We're offering a private cloud option for large customers, from many conversations this is OK for most of them. And adding a daily backup to any location is actually very easy.

> At least with git there are dozens, hundreds, thousands of copies of any repository around the company.

This is actually a huge issue for large companies (data breach), that doesn't have a good solution with Git.


Well, that's risk mitigation.

However if a file touches a developer's machine it can be breached from there unless they do like banks with sensitive information: no internet, no USB, no anything. Kind of impossible to develop like that.

So... IDEs in remote desktops and local browsers to lookup for documentation on the internet? But then you still have to protect against screenshots of code and OCR. Or photos. You can't defend against everything because developers must at least read the code. It's like DRM for music and movies. There is always a loophole.

So, mitigation. If your customers pay for that, good for you. About me, I prefer a local/remote repository for my projects. My customers decide what they prefer, currently git but they don't have many alternatives.


I agree that the "project size" argument doesn't make much sense, however size as in file size does. There are multiple methods to add support for large binary files in git, but none are great.

With their focus on game developers, I imagine that's one of the primary use cases.


Your core feature is live updates of changes. That is actually a nightmare. I don't want live updates. It will be a mess.

If you really want to do cloud. Use git.


Thanks for the feedback! If you're on a separate branch it doesn't update from main automatically. We're also thinking to make live update of main optional as well, this is actually one of the most debated features internally. Apparently some users like it and some really don't.


> cloud-native version control

Just how "cloud-native" are we talking here? One chief selling point of git is that you are not reliant on a centralized server. You have remotes, yes, but if one of them goes down, you still get the entire project history.

I am of the opinion that the software world needs to find something better than git. One of my major pain points as a lead is the inevitable, yet unpredictable, git questions I will be getting from new juniors. But "cloud native" sounds like a poor starting point for "better git". Not needing external servers for VCS is one of the things git gets right.


> Just how "cloud-native" are we talking here?

So we're entirely serverless, using distributed cloud storage and DBs. Basically Diversion is up as long as you don't have a major AWS outage (happens, but rarely).

> Not needing external servers for VCS is one of the things git gets right.

I agree in general. But the way most devs use git today isn't really decentralized; everything is going into and out of GitHub/Lab, and the more often the better, because of CI and merge conflicts. So I wonder if having a VCS that is decentralized in theory is really that important - taking into account the upsides of building in the cloud (scalability, distribution speed, collaboration etc).


Perforce seems to already fill this niche pretty well, at least in the gaming sector which this startup seems to target.


Totally right, perforce is used by almost all large game studios. But it's painful to use for smaller studios and indie devs - it requires managing your own server with configuration, backups, networking etc. It's also not great for cloud based workflows and remote work, and super expensive with rigid licensing.


Most of the time we need to host our own servers for licensing reasons; you might need to consider an "on premises" option if you want to be a viable option for a lot of studios.


This sounds like a classic innovator's dilemma style division in the market.

The incumbents need to cater to existing customers with a need to host their own servers for licensing reasons; but this forbids them from using cloud native features in the core of their products.

There may well be space for a newcomer to make a cloud only product targeted at the subset of studios that have the legal ability to use the cloud. Instinctively, there may be some useful features around large files which are possible in the cloud but impractical in an on prem environment.


My understanding from reading many of OP's responses is that "the cloud" may be vendor-locked to AWS. There are serious issues with hosting your code on servers most likely owned by a competitor: e-commerce, video streaming, games, AI, video production, electronics, retail, pharmaceuticals, logistics, publishing, etc. They literally picked the worst vendor to build this on.


Why the worst? Microsoft is a more serious competitor to game studios, with Xbox Game Studios and now Activision Blizzard. (And yet they do use Azure.)

You definitely have a point regarding vendor lock-in; we've planned for this and Diversion will be able to run in any cloud and on-prem in the future (we are running it in containers now).


Worst due to spread. Almost any industry Amazon has a hand in means you might run into issues selling in that industry.


That's interesting!! Does the license prohibit any use of cloud to store the assets? Or private cloud is OK?


Our recently released Helix Core Cloud actually doesn't require managing your own server, we'll do that for you, as it's a SaaS using flexible licensing: https://azuremarketplace.microsoft.com/en-us/marketplace/app...


It does, but perforce is a horrible slow beast, and not as reliable as you might expect.

That said, for large game projects, it really is the only viable option.


One only uses Perforce, well, perforce.


Perforce is painful, doesn’t have good cloud offering and does not evolve


We have quite a few cloud options actually, with our Helix Cloud option just releasing last week! https://www.perforce.com/perforce-and-cloud


The cloud is just a crappier version of a datacenter built around the notion that you can make users captive to your services and make them pay hardware for 3 times its price.

It does not lead to better source versioning systems.


I just left a train, committed several changes to several repos, and didn't have connectivity. Nope, I don't want this.


Quick note to say that discussions like this are asymmetric: the git aficionados are very sure about their opinions that it's fine. The people who believe git isn't great are unsure if perhaps they're holding it wrong or don't understand how to use it. So what you see in a thread like this is 90% posts strongly asserting that the OP is wrong while the potentially large number of people on the other side are scratching their heads, not posting.

fwiw, I think they're pretty much right.


I think this is how most git stuff goes as well.

> fwiw, I think they're pretty much right.

Is 'they' the strong posters or the head scratchers?


I'm having a hard time imagining the positioning here. Can you explain further how Diversion differs from DVC and Git, and when using Diversion makes sense over other options? The GTM is slightly confusing to me (also yes - Git is hard - you cannot teach data scientists this. It'll take months).

Also agreed git is terrible right now for version-controlling workflows in AI (I have a fairly large .gitignore file with S3-hosted things even for my NextJS + FastAPI apps - pain in the butt).


The vast majority of version control system uses are not distributed, even if the system itself is (GitHub and BitBucket were born to essentially make Git centralized). An example use case is game studios having repos with very large histories (hundreds of GiBs and more) where the tip is significantly smaller. Having the entire repo history on your local machine might be infeasible, and is usually unnecessary. Being able to get just the tip and fetch the rest via API calls solves this. Having things continuously synced has other benefits, like preventing conflicts at the time they happen on files that are hard to merge, such as game scene files, graphics etc.

AI workflows are definitely a use case we are looking at in the near future. What types of files are you hosting on S3?


> An example use case is game studios having repos with very large histories (hundreds of GiBs and more) where the tip is significantly smaller. Having the entire repo history on your local machine might be infeasible, and usually unnecessary. Being able to get just the tip and get the rest via API calls solves this.

"Being able to get just the tip" is `git clone --depth 1`, isn't it?


And then you lose functionality, for example `git blame` depends on the history being available locally. If you want a working repository with all the source control features you need a regular clone. That's where "being able to get the rest via API calls..." kicks in :)


OK so `git clone --filter=blob:none`, then. That downloads the tip and commit history, but no historic blobs. `git blame` then works by downloading missing blobs on demand, which doesn't sound too different to making an API call.
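
For anyone following along, the two variants look roughly like this (the remote URL and file path are placeholders; partial clone also needs server-side support):

    # shallow clone: only the tip commit, no history
    git clone --depth 1 https://example.com/big-repo.git

    # partial clone: full commit history, but blobs are fetched lazily on demand
    git clone --filter=blob:none https://example.com/big-repo.git
    cd big-repo
    git blame some/file.c   # pulls down only the historic blobs it needs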


Which, yes, but now we're at the "instead of using Dropbox I would just rsync to my Linux server" stage of it being a product.


1. git is not hard if you learn it

2. I regularly see people storing multiple gigabyte files in git.. I don't understand your issue with large files.

3. cloud native? why? Part of the point of git is to have the repo decentralized away from centralized clouds and onto devs' machines. If your data scientist broke git, that means your main branch configuration was off, no one else had it cloned onto their machines, and the rest of you didn't know how to recover the lost data, which is stored in the reflog... even ignoring the lack of simple CI/CD.

4. > Diversion is built on top of distributed storage and databases, accessible via REST API, and runs on serverless cloud infrastructure

What!?!?!? That is NOT a selling point....all that for a VCS?

You're literally taking everything that makes git great and saying that's what's wrong with git.

The only case you could make to sell this is maybe some sort of a high-end niche specific media heavy version control for storage intense users.

I would think that this is a college project but apparently you guys have a team size of 9 according to the website!!!

This is wild!

It kind of feels like an insult to Linus Torvalds.


> 1. git is not hard if you learn it

I basically agree. People want to just pick up a tool like git and have it just work without investing much time in it. I can understand this desire in a complex world with many tools, but something as powerful as git is really worth learning. I swear if people spent like 1 hour a day for a week in actual git training, and then did refreshers 1 hour a month as they were using it on a team, this would be less of a problem.

You don't need to know every feature, but knowing how to rebase, use the reflog, use the pickaxe, understand the concept/value of bisect and how to look up how to use it, etc. is so valuable.
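
For the unfamiliar, those look roughly like this (the function name, tag, and commit count are purely illustrative):

    git reflog                      # every position HEAD has had, even after a bad reset
    git log -S"someFunction" -p     # the "pickaxe": find commits that add or remove a string
    git rebase -i HEAD~5            # interactively reorder/squash the last five commits
    git bisect start
    git bisect bad HEAD             # the current commit is broken
    git bisect good v1.0            # last known-good tag; git binary-searches in between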

The other day a professional developer told me he didn't know you could write more than 1 line in a git commit message. WTF.


I don't even use that stuff 99.9999% of the time.

You can literally get by with knowing like 5% of git and be extremely productive with it.

That other stuff does come in extremely handy for troubleshooting or fixing mess-ups, but people struggle with just the very basics of git.

What you need to be productive with git can be learned in a morning.


This comment is harsh, but it's the reality.


I don't think you understand the scale of large creative projects like games. The current (only medium-sized) Unreal Engine project I'm working on has about 300k files in the head alone, and a fresh sync of just the stuff required to build/cook the game is about 350GB. If you add the raw content from things like Maya, Substance Painter, ZBrush, etc., it is well in excess of 1TB.

Vanilla git just does not scale to this. Microsoft's git fork and Scalar get much closer, but require a bunch more setup and are a giant footgun if someone tries to use the repo with vanilla git. Add to that the lack of permission controls and remote management, and it is just not a good fit for an industry where, say, 75% of the contributors are non-technical.

Git is a fantastic tool and works for the vast majority of software projects, but there is a non-trivial number of projects out there that it just doesn't suit.

Perforce, as horrible as it is, is the game and VFX industry default for a reason - think of it this way, given how notoriously cost driven and penny pinching games companies are, don't you think they'd be using a free/cheaper alternative in hosted Git if they could? But they don't. This is why.

Good luck to the Diversion team, more competition in this part of the market can only be a good thing.


How niche is your use case?

Despite all the questions I have about storing unversioned large media files in Git instead of EC2, your lack of good practices around modularizing the code base, and the lack of build tooling... that's the use case I specifically stated in my post.

The monstrosity that is Diversion could have great benefits for niche use cases like yours, i.e. large-media-file version control for teams who aren't knowledgeable about how to manage large code bases.

If for some reason you want to pay an additional nine person team with a huge infrastructure instead of just using EC2 and a build script..feel free.


EC2 and a build script? Are you implying that the parent's 1TB of assets are "un-versioned large media"? If so, I'm sorry, but you are being very dismissive about something you don't fully understand.


well then modularize your code better!

git can handle large repositories; there's no reason you should have a one-terabyte repository unless you're Google or something, with the money to write your own VCS.

but if you want to pay for a nine-developer team with a massive cloud infrastructure for something like version control... go for it!!!

I'm going to check back a year from now and see if the product still exists


sigh So, in game dev, art assets are as important to the final game as code, and are just as important to version control.

And sure, you can split it up between many different repositories and call it modular. But you're just adding complexity, and not really solving the core problem, which is that a single version of an art asset can easily be > 1GB, and be in a format that delta-encodes poorly, so each revision can be nearly a full duplicate. (Yes, we can talk about the file formats being a poor fit for VCS, which is true, but gamedevs are just trying to make their product.) And again, it's really important for game assets to be version controlled. And maybe Git isn't the best place for the assets to live under version control, but we are in a thread about Diversion and not Git.

Now, don't get me wrong, I'm uncertain that Diversion will exist in a year when you check back, but I can guarantee you that Perforce will, and if Unity doesn't continue its trajectory of destroying everything it touches, Plastic should continue to exist also.

I don't even completely disagree: Diversion, P4, and Plastic appear to be a bad fit for code in comparison to git, and even in our studio we've taken a modular approach (gasp) and store our code in git, while using a better-suited VCS for assets. But it doesn't change the fact that you are being very dismissive about something that you very clearly know nothing about, and clearly have far too much ego to even be interested in a legitimate dialog about it.


It's not rocket science to know that if you have a 1TB git repo filled with massive binary files that all need to be version controlled, you're doing something VERY wrong or you are using the wrong tool.

Literally the very first result on a Google search says almost every major game studio on earth uses Perforce for this specific problem, because Perforce allows you to check out binaries on a file-by-file basis and mark them with a changelist. This appears to be a solved problem with numerous solutions.

There's also git-annex, mentioned in the thread, and a number of other options.

I love that you're saying I have a big ego, but you're doing exactly one of the things I'm suggesting in your studio. lol

This Diversion seems doomed if they're competing with git, but if they're competing with perforce, that's a different story. They're probably still doomed, but at least their product fit makes sense.

https://www.reddit.com/r/gamedev/comments/2xc5fx/whats_a_com...

https://git-annex.branchable.com/


> Perforce, as horrible as it is, is the game and VFX industry default for a reason

...

> or you are using the wrong tool.

You were literally responding to a comment talking about how git isn't suitable for this problem, and seemed to be suggesting that ec2 and a build script should be sufficient...

Edit: on a second full read, it's really unclear what your point was in the first place, and perhaps you were in agreement with the original poster and me all along, and just being confrontational and hostile for no reason?


Git works fine with petabyte-scale projects and handles 100+ gigabyte files with some modest scripting.

For one of the projects I oversee, a typical file size is about 8GB and a handful are about 140GB. What we version with Git is a list of the files in Ion (a superset of JSON), so the 8GB files aren't "in" Git but just referred to via hashes by what is committed in Git. The files themselves are accessed via CephFS and rsync, and we have "server side" hashing of them via SSH. All of the relevant scripts are in Git hooks.
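
As a rough sketch only (the paths, manifest format, and hook below are hypothetical, not our actual scripts), the idea is a hook that records hashes in a small committed manifest while the payloads stay outside Git:

    #!/bin/sh
    # hypothetical .git/hooks/pre-commit
    manifest=assets.manifest
    : > "$manifest"
    for f in assets/*; do
        hash=$(sha256sum "$f" | awk '{print $1}')
        printf '%s  %s\n' "$hash" "$f" >> "$manifest"
    done
    git add "$manifest"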

Someone (external to our team) once commented that what I just described is "too complicated" and "not friendly for developer speed". Yeah, well, we don't hire stupid people so.. "thank you for your interest in this opportunity, we have decided to pursue other candidates at this time".


Interesting! So you basically implemented something similar to git LFS, correct?

Can I ask what your use case is, and why not use LFS instead?


I just looked at git-lfs and yeah the approach is pretty similar. There may be a good reason either way, as far as I know it hasn't been explored.


Having worked with SVN a decade ago and Perforce more recently: that part of the market is waiting to be disrupted. I'm a little unsure whether it was actual technical reasons (vs cultural) that kept git out of those use cases. Many devs were working with git locally and using git-svn or git-p4 to interact with the local repo. Best of luck!


Thanks!! Having talked to lots of game studios (and other companies with large repos/files) - many of them tried to switch to git because devs wanted to, and failed because of technical limitations.


Indeed, the choice for me as recently as 1-2 years ago was still SVN vs Perforce, despite only having ever worked with Git.


There doesn't seem to be any justification offered on why it's not yet open source. Not that one is required, but it is suspicious that such a commitment isn't a priority, and that maybe there is a desire to keep closed-source as an option.

Personally, I'd not want to check any assets into such a tool before it becomes open-source.


Open sourcing code can require a lot of resources to do it right (manage the community, handle licensing etc). If you don't have customers asking for it and you don't need it for customer acquisition it might not make sense.


Licensing: just choose a good, safe default: Apache 2.0 or MPL 2.0.

Open sourcing does not mean you want to, or are going to, accept external contributions; just be explicit that you don't accept them.

There are several open source projects that are developed with an explicit no-external-contributions policy. Nothing new to invent here.


> Licensing: just choose a good, safe default: Apache 2.0 or MPL 2.0.

No, this is how you get seriously annoying divergences in products like this. GPL 2/3 would be 10/10. It would also protect the company.


Can I offer a suggestion - focus on non-code developer use cases. It is still wildly difficult to integrate a VCS into a document-based application, yet doing so could improve many, many engineering and science applications that rely on text files (simulation configurations, pre- and post-processing setups, semi-manual-entry data-backed "dashboards", etc.). This is a problem that really needs solving, and I don't see anything out there that makes it approachable.

Having built this myself (hacked on top of Git) for an engineering analysis platform in the energy space, I'd say a good UI (even simpler than GitKraken) and a "just works" integration would be a massive win.

Just my 2 bits.


Thanks for the feedback! We actually encountered a similar use case: SaaS tools that need to version user-generated data and configurations. We're offering an API so this can be done easily. What you suggest could also be done. I'm not sure it's a huge market though; meanwhile, lots of developers who can't work with Git (e.g. games) are looking for solutions, and we're seeing a real need there. But we might branch out to other things, for sure.


I think this is a great approach. Just looking at the comments here - I'm not sure how many devs would jump for a git replacement.

However, anything that solves version control for all of the non-code cases is VERY compelling.


Ok, I'll bite.

Looks interesting. My first thought, as a gamedev who has been in charge of selecting a VCS at multiple studios:

1TB is not a lot, and I would really appreciate clear public pricing terms on data before committing to a VCS, especially at the smaller indie scale.


Continuing:

> Yes! It can integrate with any CI tool via Git compatibility, or using the API. Talk to us for more details.

You really need to provide a comprehensive CLI targeted at CI/builds. Git really is going to be a bottleneck for games, and most teams aren't going to want to deal with the hassle of manually implementing differential updates and multi-part downloads via REST.

Secondly, I watched your video, and dug through the website, and saw no mention of what your Windows story is. As a gamedev I'm going to need some assurance that your Windows support is top-notch.


Windows is supported, as are Mac and Linux - we should mention that, you're right. Most of our users use Windows, so it's definitely first priority.

Thanks for the feedback regarding CI - I'd love to hear your specific needs, as we're working on this right now.


I have a lot of thoughts on VCS and CI, feel free to reach out: indy@telltale.com


Hi, sure thing! You can get extra storage at the same price as the Free tier (we'll make it clearer on the website).


Ahh, I see it... Being in the free tier, it also just looks like a feature. I might pull the price/100GB/mo into its own cell to make it clearer. It is a pricing page, after all.


Done!


A little feedback on the "how is it different from Perforce" section of the web page.

There have to be more advantages than "it's in the cloud" ...right? Also, for many this is potentially a huge hurdle / disadvantage. Who are you, and why should we trust you with our precious IP / code? (many game studios are on-prem very intentionally)

Personally, I'm currently (being forced to) use Perforce and I've learned to tolerate it, but wish we could use git. There have to be more things that make your offering better than Perforce and you should really highlight them.


For me, the absurd size of data in the game I worked on - terabytes of video, mocap data, and assets - made the cloud hard and expensive. Cloud makes me think of all the time I'd be blocked on network uploads, and of my employees' poor data caps.


Thanks for the feedback! Transfer is a challenge for sure. Can I ask how you do remote work with on-prem?


Something of note, is that Perforce Helix does offer a Git interface, with Perforce Helix on the backend. It's something I'm looking to explore at my studio to better understand the tradeoffs there.


It's called Helix4Git and you can mirror (and other options) your git repo into Helix Core using a "graph" depot type.


The advantages are that it's much easier to set up, manage and use, and less expensive. You're right, the FAQ should reflect this better.


This looks really cool! Well done! Definitely looking into this for my startup. Currently we use a mix of Git and Perforce, but Perforce is a pain to maintain and very difficult for the non-coder on our team to work with for art assets.

When we started I looked at Plastic SCM (owned by Unity)—looks like it targets a similar use case to Diversion. I honestly can’t remember why I moved away from it—I think it was the lack of polish and capabilities vs git/GitHub. I’m curious how you compare yourselves to them.

I really like the bidirectional git sync you mention so I can try it gradually. Pricing seems good.


Thanks!! What do you work on?


[Skyglass](https://www.skyglass.com), real-time Hollywood VFX on mobile.


> The biggest drawback of Git is its limited scalability - both in repository and file sizes, and the number of concurrent users.

How did you come to the conclusion Git isn't scalable in the number of users? There is no limit on the number of users with Git. There may be limits on the number of interactions between those users, i.e. pull/merge requests, clones, fetches, etc. But they are mostly limited by humans.

A central system, even a cloud-native system, has far more contention points in operations that are not even interactions between users, e.g. status, commit, etc.


The number of concurrent merges is definitely a problem. With a few hundred developers committing to the same repo all day, it becomes a chore to make sure that their commits can still be put on top of each other, and that one out-of-sync commit is not holding back a bunch of others which depend on it.

This all is solvable, both through discipline and tools. But if a VCS has a built-in capability to alleviate this, it's a good thing.


True, that is one of those interactions between users that I mentioned. But the number of people having the code checked out and poking around its history is only limited by the capacity to clone the repository. All other operations are local.

Having hundreds of committers merging into one branch is problematic. Perhaps splitting the repository into smaller parts does have its advantages. Monorepos often have the architectural split of the software while throwing away the advantages of that split at the VCS level. But there is also other large software, developed by hundreds of people, that cannot be meaningfully split into separate git repositories without messing up versioning and releasing. The Linux kernel is such a system. They use a loose network of repositories -- mostly a hierarchy -- to pre-aggregate merges. Git is well equipped to handle different approaches -- separation of mechanism and policy.

I am looking forward to learning how a central but cloud-native VCS will improve on that.


The solution to that problem is a merge queue and GitHub supports that now. I do agree that it would be nice if the VCS solved the problem natively, but for many companies GitHub — and not git — is their VCS.


Exactly.

The number of companies that implemented their own merge queue is too damn high.


Nice. I make music and use git for my files, but often hit the usual space limitations and I hate dealing with git-lfs. I think there's a big use case for Diversion and music/video production.


I assume the space limit you are talking about is really a GitHub/GitLab/Bitbucket limit.

May I ask how you use Git for your music? Since binary diffs are not really a thing, the main benefit I could imagine is simply having revisions of your files without you having to create folders or files named "track01_test_final2.flac".


I literally just `git add .` and let her rip. It's not pretty, I just git pretty much everything in my life.


Nice! We didn't even know about that. Thanks!


Yeah, collaboration in music and video production is pretty unsolved. It often involves sharing huge files in GDrive/Dropbox, lots of project_latest_january_version_3_final_modified_updated.zip etc


Final_version_4_really_final, we've all been there :D


Did you try git-annex instead of git-lfs?


Congrats on the launch! It's always exciting to see more competition in the version control space.

One question I have is whether you guys are better than:

https://desktop.github.com/

This seems to do the exact same thing, be free forever, and have a more mature GUI that is also easier to use than regular terminal git. In my firm, even people who don't know how to code can use GitHub Desktop (since it babies you through the process of committing code).


Thanks! I like GH Desktop as well, as a matter of fact our Web UI is a bit influenced by it :) The difference is that GHD is a GUI for git. It's quite good in hiding some of the complexity, but if you get a git error (like a diverged branch) you still need to troubleshoot it. Diversion is completely different. 1st of all it's far less complex, without local branches, staging area, etc. It also syncs your work in progress to the cloud in real time, alerts users about potential conflicts, handles large files without extra configuration, etc. Feel free to try it! (It's also free forever for small teams).


Congrats on the HN launch. How does this improve on, expand on, or blow git-lfs[1] out of the water? Because if I needed large-blob support, that's what I would use instead. It pushes pointers to the big files to the hosted git instead of pushing around the binaries themselves -- though I am speculating, since I've not used it myself, just read about it online.

I mean the nice UI and collab features are indeed improvements but I'm thinking more core git specific improvements.

[1] https://git-lfs.com/


Git LFS solves the problem if you need to have occasional large files in your repo. It doesn't work great though when you have a lot of them or when they're an integral part of your product, because it's slow and introduces devops problems (e.g. you can bomb your repo by committing from a clone without Git LFS installed). Companies that have many large files to manage rarely use it because of this.
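
For context, the standard Git LFS setup being referred to looks roughly like this (the file patterns are illustrative); the failure mode above happens when a clone skips the install step and commits tracked files anyway:

    git lfs install              # once per machine; sets up the smudge/clean filters
    git lfs track "*.psd"        # writes the pattern into .gitattributes
    git lfs track "*.fbx"
    git add .gitattributes
    git commit -m "Track large binaries with LFS"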


Okay. That’s fair and info I didn’t have.


I was interested enough to go look at your page. I didn't find any way to download a server package to run locally and immediately lost interest.


Sorry, currently we're only offering Diversion as SaaS. I understand that some may prefer to self-host, this might be an option in the future.


Your criticisms of git seem off-base, but I like the idea. I worked on some indie game dev teams in college where I was the only person with any programming experience. Git was difficult for my teammates to wrap their heads around, although they did figure out Github's GUI. A more non-tech-person-friendly tool would be nice to have in the space.


This sounds like a solution in search of a problem. I’ve never encountered any of the problems suggested with git in this post.


> We’re planning to release it as open source once the code base matures,

If it's not open source, I'm not touching this. The one thing Git has over all these alternatives is that if GitHub dies, Git is still around and is GPL.

If you really want to be taken seriously, please make sure it's open source with a copyleft license.


Point taken! Why copyleft though, and not MIT/Apache?


If a competitor to you comes up, and starts building a server around this client, they can start diverging from this client without upstreaming these changes.

It protects you, future users, and makes this protocol live beyond the age of any single company!

It's arguably why git hasn't gone away despite its shortcomings. And while it has been commercialized, it hasn't significantly diverged.


IMHO the only really interesting alternative to Git currently is Pijul (https://pijul.org) as it is not a more-or-less Git clone but a different approach to the problem itself.

Pijul allows for very interesting development and ci/cd workflows.


Rather than "Better GIT", it would make sense to position it as "better GIT backend" in my opinion. I get the desire to replace GIT bash but it might be harder to convince developers, who is your target audience.


Yes you might be 100% right. We were debating this a lot in the beginning. In the end we decided to build a separate VCS with Git sync, and simpler UI (for now at least - might change, depends on user feedback)


Cool! For my own edification: how did you validate this idea before building? It sounds like the kind of thing you would have to convince people they need. Or am I wrong and people were telling you they needed this?


Like any validation - LOTS and LOTS of user interviews. We found out that general software devs are mostly ok with git (many of them hate it, but they manage). They are interested in the real-time collaboration features we can offer, but those are not enough to become first users before there's a good ecosystem. Meanwhile, in game development some are desperately looking for a better VCS. This is why we're starting there.


Is my company's private information e.g. source code kept end-to-end encrypted?


All data is encrypted in transit and at rest, as per the industry standard. End-to-end encryption is an interesting feature: it has obvious extended privacy benefits on one hand, but on the other hand it can prevent the system from providing other features. Depending on the requirement for end-to-end encryption, in some instances a custom NDA can be signed to mitigate a specific concern. It's a feature we haven't seen much demand for, but if that changes we may prioritize it higher on the roadmap. Would you mind sharing your use case or concern?


I think the question is "can anyone outside my company/org access my code?". Apparently the answer is yes, because the tool offers cloud collaboration using a web UI.


If you share your repository with a collaborator, they can access it, of course. The same is true if you share your repository with end-to-end encryption: your collaborator would be able to decrypt and use it.


So, the answer is No.


Good question! End-to-end encryption would mean that you can't run cloud CI, resolve merge conflicts in the cloud, or use other such features. To my knowledge, none of the existing VCS solutions are end-to-end encrypted (I might be wrong though!)

We might add end-to-end encryption in the future (disabling some capabilities), if there's demand for it.


> In our previous startup, a data scientist accidentally destroyed a month’s work of his team by using the wrong Git command.

Can someone explain how this is possible? More importantly, was there any git branching strategy, and were there permissions?


It could be done with force-pushes, if nobody has a commit number for the old tree.


Not sure; even then, doesn't the reflog keep it for quite a while locally on the machine the force push was sent from? Maybe he did not commit his changes for a month and ran a `git reset --hard`.
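
For what it's worth, if the lost work was ever committed, the committing machine's reflog usually still has it -- a minimal sketch (the reflog index and branch names are illustrative):

    git reflog                            # lists where HEAD has pointed, including "lost" commits
    git branch rescue 'HEAD@{5}'          # recreate a branch at the entry you want back
    git push --force-with-lease origin rescue:main   # restore the remote branch to the recovered commit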


Constructive criticism.

I would like to see much, much more in a demo. Probably around 20-30 minutes.

What was shown in the demo video was seemingly the functionality of Dropbox, shown working with one file manager and one OS.

I know it is more than that. I’ve checked your docs. You have branching models. You have cross OS support. You have a lot going on.

But that demo video really put me off. It doesn’t demonstrate anything that seems particularly useful to me, especially on the main selling point: version control. It spends most of the time showing automatic file sync, which (although not easy to do) is a basic feature of so many cloud storage platforms that it doesn’t have the wow factor these days. (It’s also a feature I find annoying and would want to disable, but it seems to be core to your approach so meh)

I know not everyone will jump immediately to the demo video to see it, but it’s what I did, and honestly… I’d scratch it and do a full intro video and put resources into doing it right.


Thanks for the feedback! Point taken.

The intro video is really just scratching the surface, you're right. We'll post more in-depth videos soon.


Taking a decentralized application like Git back to centralization is a step in the wrong direction, it's exactly why Git was created.


Decentralization is a cool concept, but didn't GitHub and BitBucket emerge because a centralized server was in demand? Git is a good tool for many uses, but when was the last time you pushed directly to a peer's repo on their machine? How many firewall and reverse proxy configurations did you have to setup to be able to do it?


It's kinda the standard procedure of securing your machine's SSH. I'd recommend trying to set up a bare repo and put it online so you see how easy it is to make it work. That's not GitHub, but it's more than enough for your org; in fact, before GitHub was created, that's how we did it. It doesn't really take more than an afternoon to set it up, multi-user and all the other jazz.
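
For the curious, a minimal sketch (the host and paths are just placeholders):

    # on any box you can SSH into
    ssh user@myserver 'git init --bare /srv/git/myproject.git'

    # on each developer's machine
    git remote add origin ssh://user@myserver/srv/git/myproject.git
    git push -u origin main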

But GitHub and Bitbucket aren't examples to compare Diversion to, though; GitHub is just a node in the decentralized network of Git repos (your users, or whoever cloned). They even use an example of this themselves: how they were able to recover after a mess-up, whereas with SVN that couldn't have been done and they'd really have lost more than a month of work.

That's one of the reasons Git was created: to make it easier to resolve conflicts. If you just want a main/child-branch structure on one server, then use SVN/Diversion, but then don't complain when you have a branch "locked" and you cannot get your job done.

Diversion makes 100% sense if you follow the silly "one repo for the whole org" way big companies do, which still doesn't make any sense to me.


I see this more as a replacement for GitHub: two proprietary repos in the sky. For a majority of GitHub users, GitHub is Git.


how is that different than Dropbox version history?


Plus, for a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.


Branches are one example! But really, Dropbox is not a VCS. Although some game studios do use Dropbox/Google Drive for versioning graphical assets - not because it's a good tool for the purpose, they just don't have a better one.


Wow, tons of classic hacker news feedback just like the "dropbox won't work" one [0]. And we know how that ended up.

With the trend towards more and more non-devs writing code with the help of AI [1], I think you're absolutely correct with your assertion that something safer and easier than git is needed. The rest of the business case makes perfect sense too (git struggling with binaries and large files, massive code bases at Meta/Google have required alternate tools).

Love the open source strategy as well. Github showed pretty clearly that selling services over an open core can be a winner.

[0] https://news.ycombinator.com/item?id=9224 [1] https://twitter.com/amasad/status/1659423752881586176?lang=e...


Thanks! I agree - there are lots of examples how simplifying something (Shopify, Uber, Airbnb) can dramatically increase the user pool to people who otherwise wouldn't even use a comparable service.


I was curious to follow your development, but the lack of RSS feed for your blog means I basically can't.


Ooops! We built a new site and now it's broken. Thanks for flagging! Fixing, it will be available here: http://www.diversion.dev/blog/rss.xml


Fixed


How does it compare with bitkeeper? Does it handle big binaries (digital negatives, game resources) well?


Honestly, I never tried BitKeeper. Diversion handles big binaries with no problems at all.


I don't feel this problem myself, but I'll support anything git-related. Good luck!


They should call it a new concurrent version system, not a git alternative.


With Git we can work autonomously, fast, and distributed. The nice thing about Git is that it works on a B747*, in a cabin in the woods, and on a file server at MIT.

I will not make myself dependent on a proprietary cloud-service which requires an internet connection and reoccurring payments.

* Internet isn't fast there. If it is available. If it is affordable. If you want to ruin the precious time separated from the internet.


i'm thankful there's work being done on improving version control. git is great but if you don't respect it, it will punish you.


Why not just use Perforce?


There are a lot of remarks in this post that git isn't that bad, or that "losing data with git" is at best an exaggeration and at worst a lie. Other remarks include "People know how to use git", "--force tells you here be dragons", and plenty of git internals by way of explanation.

The posts remarks about git ring true to me. I, personally, would have lost work on many occasions if not for IntelliJ having its own "commit log" of every change I made since opening the IDE. I, personally, have had terrifying moments where I thought I'd lost large amounts of work, and spent the next hour fearfully searching google for how to unf*k myself. And I know several friends and colleagues who have had the same experience, and some who were unable to unf*k themselves and gave up: resigning themselves to just recreating the lost work. The fact that their lost work is probably still somewhere on their main repo to this day is cold comfort I'm sure. "It works for me" where "me" is an advanced level of nerd (myself included) is in the same territory as "It works on my machine".

This also seems to have really hit a nerve. There wasn't just one post saying "Meh, git is easy to use, you're an idiot" and that was the end of it. There were immediately a great many responses along the lines of "well, it's not that easy" or "well, I've lost work", and a whole back-and-forth has ensued. The fact that there has been a borderline flame war here tells you that git absolutely has room for improvement. Most of the remarks in defense of git feel like "Git Gud" [0], a term used to heckle newbies at video games, yet rather appropriate here. I don't want my version control system to be like Elden Ring: arbitrarily difficult, with deliberately punishing pitfalls.

So, yeah, I'd love a better version control than git, and so would many others. I've had work saved from git by IntelliJ's internal commit log (where every edit is saved), so having that as a feature would be great. Being able to learn that someone decided to rename some class that I'm currently using heavily would have been helpful on many occasions (rare, but painful). Being able to pull someone else's work before they commit it, for review, would be very helpful. So would support for large binary files without killing my local machine.

Good luck.

[0] https://knowyourmeme.com/memes/git-gud


Thanks for the support! Reminded me of https://xkcd.com/1597/

I wonder why git became such a divisive issue, nobody seemed to have such strong feelings about SVN :D


any work done in this area is good work.


Looks awesome!


Git is a tool for collaboration, but it is sometimes a poor fit for the reality of centralised, corporate software development. I'd like to see a product that fills these gaps, and I sense that this is part of what Diversion is about. For example, large-file support in git depends on clunky hacks (that I respect) like LFS, but Diversion aims to make it actually work seamlessly. Companies awkwardly split their projects across many repos just to allow for access control, but Diversion aims to make it easy by letting you set access per directory. Those are great, "no-brainer" advances. There are a lot of other places where clunky hacks appear in popular git usage. A lot of those are also areas where big tech has introduced their own internal solutions. Maybe Diversion can cover some of these? Some examples:

- git has a very useful hooks mechanism, but each user has to manage their installed hooks in their own clone. This should be made more team-compatible. Make hooks a part of the history that can be patched by anyone, and then others can update their hooks simply by pulling. A less-clunky, built-in version of https://pre-commit.com/ , basically (a rough sketch of today's closest workaround appears at the end of this comment).

- Keeping with use cases for hooks, it should be possible to apply code formatting transparently. Nobody should have to think about code formatting, ever. It's a common practice to ensure code conforms to the company's standard format. However, why can't a developer also work in their preferred format? It should be possible for the system to handle this. I work in my preferred format, but all of the committed code is actually in the company's standard format. A bidirectional transformation.

- Break down the barrier between repo and other channels of company communication. Often companies have a wiki, a document drive, a chat app, a ticket system. With the ability to include real-time collaboration in the repo, it makes it possible to unify all of these things, ideally while keeping a full history. I basically want to version control my entire company, including what all of the non-devs are producing. Fossil has some of this capability, but it hasn't captured the mainstream: https://fossil-scm.org/home/doc/trunk/www/whyallinone.md

- Make repos composable. It should be possible to include a repo in another repo in a way that isn't clunky. Submodules, subtrees and subrepos are clunky. A use case for this would be including a third party library as source. It should also be possible to take one piece of a repo, and distribute that "view" while allowing bidirectional contribution. An example use case for this would be for maintaining a public open source project simultaneously in the company monorepo and on a public forge. There are (very nice) hacks for this like https://github.com/google/copybara and https://josh-project.github.io/josh/ , but these have considerable clunk-factor. When repos can be sliced and glued like this, a lot of the monorepo vs polyrepo debate becomes unnecessary; we can have both.

- Make it easy for everyone to use a patch stacking / "branchless" workflow with Diversion. There are numerous projects to enable stacking in git, and it's ubiquitous in big tech, yet it hasn't gone mainstream. It's only a matter of time until some company brings this to the masses.

- Allow history to be viewed and maintained at varying granularity. As I'm sure you know, a common debate you will see unfolding online is between keeping a clean, manicured history, and keeping a full history of what you actually did. You see git forges try to split the difference by offering squash merges. You see people recommending --first-parent for viewing the logs without noise. To me, this all points to a pointless limitation of the tooling. Why can't I take the raw, messy, actual commit log of what I did, and then bundle those up into a non-destructive "summary" commit that appears in the log. That way, people can see the clean, summarised version of my change, but can also dive deeper and see all of the twists and turns I took while producing that change.

There is so much room to improve version control. Much like build systems, I feel like the industry standard falls short of what we know is possible. I am happy to see new developments in this space.
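
Regarding the hooks bullet above: the closest thing vanilla git offers today is core.hooksPath, which still needs a one-time opt-in per clone -- a rough sketch (the check script is hypothetical):

    mkdir -p .githooks
    cat > .githooks/pre-commit <<'EOF'
    #!/bin/sh
    exec ./scripts/format-check.sh   # hypothetical formatting check
    EOF
    chmod +x .githooks/pre-commit
    git config core.hooksPath .githooks   # each clone still has to run this once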


Thank you very much for the thoughtful comment! You're exactly right - Diversion's goal is to improve things such as what you mentioned. Moreover, we're aiming to build a flexible system that can be extended and improved further.

We'll definitely implement at least some of the things you mentioned!


going after git takes COURAGE, so I'll give you that lol


Haha evidently :D


If competing with git doesn't work out for you... let me tell you what the world needs... We need to bring freedom and liberation to the .docx nation. Microsoft has held that document format hostage for way too long. They keep changing the file format in order to keep competitors at bay. It is the most grotesque moat in existence in modern times. We need a document creation tool with versioning baked in, but one that abstracts those aspects away and presents an intuitive way to switch from creation to publication/print. Most lawyers/docx creators deal with at most 20 or so styles/formats. For anything else, there's Publisher or Adobe. So as you develop this project, if the uphill battle becomes too unbearable, consider creating a better document platform with versioning baked in.


I think Google Docs is doing a pretty good job there! I personally can't stand either Word or Excel, so I totally agree.



