Hacker News
Pijul: An intuitive VCS unlike Git that's based on a sound theory of patches (pijul.org)
269 points by goranmoomin on Sept 25, 2020 | 165 comments



They're in the middle of a big rewrite, which will be out Real Soon Now.

To understand the basic idea, this series of blog posts is excellent:

https://jneem.github.io/merging/

To me, it seems clear that a patch-based VCS is the future. Whether or not it will be pijul that brings us this future, I don't know...


What they seem to be describing is basically a CRDT known as WOOT[0], from what I can tell. The category theory paper they refer to goes a bit over my head, so maybe there’s more to it than rediscovering WOOT.

[0] https://hal.inria.fr/file/index/docid/71240/filename/RR-5580...


Pijul is indeed vastly different and significantly more complex than WOOT, in that it handles conflicts (whereas WOOT doesn't). Conflicts are fundamental for asynchronous systems, but don't matter that much for synchronous (or almost synchronous, like WOOT) ones, since users usually notice them immediately as they type.


All CRDTs are fundamentally asynchronous; there’s no requirement for them to be used in a Google Docs-style live-editing setup. So as far as conflict resolution goes, I still don’t see any major difference between this conflict resolution strategy and WOOT (aside from character- vs line-level units).

It also suffers from the same problem as WOOT that while it’s mathematically (merge) conflict free, you still need to resolve semantic (merge) conflicts before the result is sensible, like in the final example.


> All CRDTs are fundamentally asynchronous

I didn't say otherwise. However, Google Docs and WOOT are meant to be used "almost synchronously" (which is why I wrote "almost synchronous" in my answer), in the sense that conflicts are presented to the user almost instantly.

By the way, in the paper you mentioned, the words "synchronous" and "real-time" are used to describe WOOT.


The idea of composing patches really reminds me of a series of blog posts I read about how Google Wave worked. I think the technique was called Operational Transformation. The series has seemingly disappeared from the internet, but I remember it being a mind-blowing read at the time.



> patch-based VCS is the future

Sorry for being ignorant, but isn't git patch-based? I always thought of git commits as patches. You can directly convert one into a patch and vice-versa.


Not exactly: git creates a snapshot of the working tree for every commit. The clever bit is that the object store keys files by their content hashes, so if two snapshots contain identical files they won't be stored twice.
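The deduplication is easy to see from the command line. A minimal sketch using a throwaway repo (file names, identity, and messages are arbitrary): two paths with identical content resolve to the same blob object.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev

printf 'same content\n' > a
git add a; git commit -qm 'add a'
cp a b                       # second file with identical content
git add b; git commit -qm 'add b'

# Both paths point at the SAME blob, keyed by content hash:
git rev-parse HEAD:a HEAD:b
```

Because object identity is the content hash, the second commit adds a tree entry but no new blob.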


I don't know who flagged this comment and why, because this comment is correct.


As others mentioned, while git shows you diffs, internally it's actually snapshots of the entire repo, with each snapshot listing zero (initial commit), one (normal commit), or more (merges) parent snapshots.

While it works decently, the big problem with this model is that its notion of dependency is too coarse. For example, let's say I'm working on two completely separate things, A and B, and over three days I alternate between them, so I end up doing work A1, B1, A2, B2, A3, B3. In git, I'd end up with the following dependency graph:

B3 -> A3 -> B2 -> A2 -> B1 -> A1

The problem is these are not the real dependencies! If I decide, "oh, you know what, let's drop B and try C," there's no easy way to do that.

In a patch based system, instead, the actual dependencies of every commit are calculated. So in that system, I end up with the following graph:

B3 -> B2 -> B1

A3 -> A2 -> A1

And then it's very easy to say things like "drop B".

(A special case of this that's a big pain-point in git is cherry-picking bug fixes between branches. When you do that in git, git has no actual notion of what happened; it's not obvious looking at any given commit if it's been cherry-picked somewhere else, and you frequently get merge errors later on. If the thing you want to cherry-pick is actually multiple commits, you have to carefully go through and try to cherry-pick them in the right order. It's a huge pain point that a patch-based VCS promises to solve.)
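As a toy illustration (my own sketch, not Pijul's actual algorithm or data model), once each patch records only what it truly depends on, "drop B" becomes a trivial set operation:

```python
# Toy model: each patch names the patches it actually depends on.
# A and B are independent lines of work, so neither references the other.
deps = {
    "A1": set(), "A2": {"A1"}, "A3": {"A2"},
    "B1": set(), "B2": {"B1"}, "B3": {"B2"},
}

def drop(prefix, deps):
    """Remove every patch whose name starts with `prefix`, refusing if
    any surviving patch depends on one being removed."""
    doomed = {p for p in deps if p.startswith(prefix)}
    blocked = {p for p, ds in deps.items() if p not in doomed and ds & doomed}
    if blocked:
        raise ValueError(f"{blocked} depend on dropped patches")
    return {p: ds for p, ds in deps.items() if p not in doomed}

print(sorted(drop("B", deps)))  # the A-line survives untouched
```

In the linear-history model every patch implicitly depends on everything before it, so the `blocked` check would always fire.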



I happened to be looking around to see if there was a rust rewrite of rsync last night and stumbled across this post by the author of pijul that helps explain the benefits of pijul over git:

> Git is great in many ways, but doesn't really know how to merge changes correctly. Even worse, it doesn't know how to "unmerge" changes.

> It does try to merge branches, though, but doesn't give you any reliable guarantee on the result. For the unmerging part, it doesn't even try and you have to rebase your changes yourself, still without strong guarantees.

> Pijul uses a patch algebra to do these things, where patches have several nice properties:

> • Any two patches A and B either commute, or one of them explicitly references the other. This means that repositories behave as _sets of patches_ rather than as a linear history.

> • Patches are associative: applying C on top of a repository that has A and B will yield the same result as applying B and then C on top of a repository that has A. Git could have this, but unfortunately, it doesn't.

> • Patches have inverse patches.

> We also have a sane internal representation of conflicts, so that all conflicts don't have to be solved immediately after merging.

> I should add that Pijul is still quite unstable, and at this particular moment is in the middle of a giant protocol change.

https://www.reddit.com/r/rust/comments/821sgo/announcing_rus...
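The commutation property in the first bullet can be sketched in a few lines (my own illustration, not Pijul's internal representation): two patches touching disjoint parts of a file give the same result in either order, provided positions are adjusted when the order is swapped.

```python
def apply(lines, patch):
    """Apply an insertion patch (position, text) to a list of lines."""
    pos, text = patch
    return lines[:pos] + [text] + lines[pos:]

base = ["fn main() {", "}"]
p_top = (0, "// header")   # insert at the top of the file
p_bot = (2, "// footer")   # insert after the last line of `base`

# Order 1: p_top first shifts p_bot's target down by one line.
r1 = apply(apply(base, p_top), (p_bot[0] + 1, p_bot[1]))
# Order 2: p_bot first leaves p_top's position unchanged.
r2 = apply(apply(base, p_bot), p_top)

assert r1 == r2   # the two orders commute
print(r1)
```

Pijul's contribution is making this commutation (and the dependency tracking for patches that do not commute) precise and general, rather than positional bookkeeping done by hand as here.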


Blog hasn't been updated since April 2019 and, I don't know if the repo is supposed to be browser accessible, but https://nest.pijul.com/pijul_org/pijul gives no useful response.

Looks interesting but don't know if I'd want to rely on it.

ETA: looks like their twitter account (https://twitter.com/pijul_org) is still active, so that's a good thing


afaik pmeunier has been completely rewriting pijul privately (for about a year now) so that he can focus on getting the important things right (core algorithms, repo representation) and not on fixing stuff. It's debatable but has been explained. Apparently we could see a 1.0 release soonish. Relevant thread: https://discourse.pijul.org/t/is-this-project-still-active-y...


Last relevant comment by the Author:

By the way, the main things that remain to be done before release are:

    make Git import incremental, at the moment it’s a one-off thing that doesn’t properly save its intermediate states.

    rebalance a few things between Pijul and Libpijul. Florent is taking care of this one.

    update the Nest for the new formats and new protocol.

    write documentation.
In other words, we’re really close to a release.


Thanks, that thread was a good read-thru.

I can understand why it was done the way it was done but it makes it difficult for outsiders to know what's up. An occasional "State of Pijul" blog post would be helpful. As we all know, the Internet is full of somewhat finished but now stagnant projects.


I was digging into tutorials about Git internals recently and was surprised to learn just how straightforward the data structures are. Given how stupid-simple the data is, I'm all the more confused about how straightforward the CLI isn't.

There are very few if any ways to accumulate and amplify errors in the system. It's no wonder Linus created a working prototype so fast. It's the simplest thing that could possibly work. But this simplicity is also the reason why binaries occupy so much space.

A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique, and then a group of people working together to match the implementation verbatim to the proven technique.

Pijul seems to be stepping into that middle space, but it's not clear to me if we're going to follow, or if something else like it will get us moving. I do like the concept, but as someone else stated, it doesn't seem to be very lively right now.


I absolutely love, and recommend to any new team members, GitX [0] and other similar Git visualizers. It's incredibly valuable to be able to instantly see the Merkle Tree drawn out and say "oh, the reason my current HEAD isn't picking up thing X that I thought I'd merged in, is because thing X isn't visually an ancestor of my HEAD even though temporally it might have happened earlier than my most recent commit."

I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood. Github, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.

(Full disclosure: my love of visualizations of commit graphs may very much be influenced by the game Fringer [1], which was a formative part of my childhood!)

[0] https://rowanj.github.io/gitx/

[1] https://www.youtube.com/watch?v=mAV7IioO_t8


> I see newer engineers struggling to memorize "what Git command corresponds to my current situation" all the time, and they're missing the intuition that it's all a very simple graph under the hood.

The reason I have run into this has almost always been because what I want is “take this commit and move it here” and the command for it is some sort of git frobnicate --cached --preserve-stashed that you look up online. “It’s a graph” is great and takes like thirty seconds to explain, but once you're done with that, it provides almost no insight into how you’re actually supposed to interact with the porcelain to get that graph into the state you want.


> “take this commit and move it here”

Isn't that just `git cherry-pick $COMMITID`?


In the simple case, yes. But what if it’s a merge commit? What if there is a conflict and I actually want the result to split into two clean diffs that I’ll specify (so that both compile, of course). What if I want it inserted somewhere in the middle of a branch? What if the branch doesn’t exist yet but I want to do this to a remote branch without having to check out something new locally? What if I want to apply the change but lie about the author?

Git is able to do all these things, and I am actually quite pleased that it can support all of these strange workflows. But it still isn’t at all obvious how you’d get these to work if you know the operation to apply a commit was “git cherry-pick”. (I have also noticed that “git rebase” is often a, if not the answer to every “how do I fix my tree” question. But it’s certainly not advertised as such, which is beyond me.)
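For the record, several of those "non-obvious" operations are reachable, just not discoverable. A throwaway-repo sketch of the "apply the change but lie about the author" case (branch names, file names, and the author string are arbitrary):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email dev@example.com
git config user.name dev

echo one > f; git add f; git commit -qm 'c1'
echo two >> f; git commit -qam 'c2'
c2=$(git rev-parse HEAD)

# Apply c2's change on a branch from c1, but record a different author:
git checkout -q -b other HEAD~1
git cherry-pick --no-commit "$c2"
git commit -qm 'c2, reattributed' --author='Someone Else <someone@example.com>'
git log -1 --format='%an'
```

(The merge-commit case needs `git cherry-pick -m 1 <commit>` to name the mainline parent, which is another flag you mostly learn by searching.)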


Right, if you want something complicated then it will be complicated, but saying that “take this commit and move it here” is complex is definitely false.


The point I’m trying to make is that I’m very opposed to people who go “the tree is so simple, if you understand it you’ll know how to use Git”. No: the tree is simple, but the tools to work with it are not. None of the things I described have complicated end states, because in general you can’t tell, from the final graph, how much work went into getting it into a particular state.


not to be trite but the end result of this intuitive system seems to be to bolt on even more math and made-up acronyms - both of which beginners really struggle with and most non-college-educated journeyman devs misunderstand because of the esoteric nature of their education.

looking forward to

'merge patch graggle revert --flatten'

posts littering stack overflow in the future


Strong agree for git rebase. When I write docs/teach folks about git, I almost never even mention merging. It's just generally very rarely useful, and I find it kind of makes git as a whole more of a black box. "git rebase -i" gives a much clearer picture of what's happening, and will let you solve significantly more complicated problems with the same tools and mental model.


Have you tried the gitless porcelain?


I have not, although I took a look at some point. I really like staging areas :(


Another quick visualization method built into Git:

  git log —-graph —-oneline —-decorate

  # —-graph for the visualization
  # —-oneline for compact commits
  # —-decorate for tag and branch refs
You can also add:

  # —-all to include all branches
I typically add aliases to ~/.gitconfig to use these options by default.
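For reference, such an alias (the name `lg` here is just my example, pick whatever you like) would look like this in ~/.gitconfig:

```ini
[alias]
    # compact graph view: `git lg`
    lg = log --graph --oneline --decorate --all
```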


TRIED:

    git log —-graph —-oneline —-decorate
GOT:

    fatal: ambiguous argument '—-graph': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'


Retype the dashes: you've posted something different from the GP (em dash, en dash, or something); presumably your browser/clipboard/terminal did something funky when you copied.


Should be:

    git log --graph --oneline --decorate


Thanks for correcting my error. Stupid “smart” dashes on my phone.


Yes, the visualizers are key. My experience is that every team needs approximately one Git guru who can run basic training sessions, dictate the workflow (branching/tagging/etc), and fix things when they go wrong. Otherwise you get stuck with a bunch of people memorizing Git commands and creating an unusable history.

And while I say “Git”, it’s really the same situation for any VCS, in my experience. I think the underlying problem that a VCS solves is the truly complicated part here.


> And while I say “Git”, it’s really the same situation for any VCS, in my experience.

Sorry, but no. I have taught CVS, Subversion, and Mercurial to executives, artists and students. They have no problem with the mental model.

With git, people with a Master's in CS get screwed up.

Having "working", "staging" and "repository" concepts is the problem. Maybe "staging" makes Linus' life easier, but unfortunately git escaped to the common people and "staging" makes life miserable for the 99% of normal use cases.


Given the number of times I've had to go in and rescue someone using e.g. SVN or Hg, I can't say I've had the same experience.

The major problem is that as soon as you have a team of people with the same repo checked out, you have as many branches as you have people. These branches may not have explicit representations in the underlying VCS, but they exist just the same.

And so then you're dealing with scary "merge conflicts" for work that people have, from their perspective, already done but can't commit and push out.


Subversion is simple to understand because it is simple, and relatively incapable. If you only use git like you would use Subversion, it's simple too. Subversion is much less easy when you have to do something like merging a long-lived branch.


You conveniently omit the inclusion of mercurial in the list. Mercurial is as powerful as git is--in a few cases, arguably more so (phases make history rewriting safe!)--and yet there is pretty objectively far less confusion for newbies than git has.

There's ample evidence that git is unnecessarily complicated for the DVCS model it uses.


I found that the visualizers made it much more difficult to learn git for me. I couldn't make sense of what they were showing. It was just a bunch of lines of wildly different colors that made no sense. (I think part of it was that in most applications and when drawing things on paper, time goes left to right, but the visualizers always draw them top to bottom.)


> Github, I think, does a disservice by trying to present commits as a linear list - while certainly it's easier to code a linear visualization, it makes people feel like Git is impenetrable magic, when it's anything but.

I feel so alone every time I say this, so thanks.

If you give me half an hour (well, I suck at time estimates... a few hours, maybe) with someone, I can fix their thinking about this tool. But I don't have however many hours with the world.


On Ubuntu the older version of gitg is vety easy to read as well.

I like Sublime Merge but can’t for the life of me visually understand anything from the way it displays branches. gitg is way easier, particularly in that it doesn’t mix a bunch of unrelated branches chronologically.

Like, I don’t care that there is a stack that branched from here or there; just let me quickly see what THIS branch I’m working on "grows" from.


> I absolutely love, and recommend to any new team members, GitX and other similar Git visualizers.

Is there a GUI for git blameall or similar functionality with clickable commits? http://1dan.org/git-blameall/


You have this in vscode I think...


I'd rather recommend Guitar.

https://github.com/soramimi/Guitar

Nice, simple and multi-platform.


Does anyone know a good visualizer for Linux? I find that half the time something goes wrong it's because I don't know the state of the graph.


gitk (it comes with Git as part of the standard distribution, though your OS packager may have split it out into a separate package like git-x11)


magit in emacs does a fairly good job. There's also

    git log --decorate --graph --all

which I use all the time (I've aliased everything except --all as gl).

Also, tig is a nice standalone implementation of a magit-like tool.


https://github.com/soramimi/Guitar

Nice, simple and multi-platform.


Gitg. Similar to gitk but with a nicer GTK GUI.


GitAhead works for me


Just an FYI that GitX won't work on an up to date Mac


This is not true. Which version of GitX are you using? There are a number of different forks, and the one I get from “brew cask install rowanj-gitx” works fine on Catalina. It is from 2014, but it’s code-signed and it’s 64-bit.

If you are using the older fork from http://gitx.frim.nl/ then it’s a 32-bit binary from 2009. That version won’t work on current macOS versions, but it will run on my Power Mac.


Yep, rowanj is the one that is up to date, but to be fair it is not the first result on Google.


Here's the fork which I use:

https://rowanj.github.io/gitx/ (`brew cask install rowanj-gitx`)

The upstream also has some signs of life but I don't believe it has a stable release yet:

https://github.com/gitx/gitx


If you've ever used Darcs, that might help motivate why Pijul is interesting. Darcs was the first system I used with an interactive mode, sort of like `git add -i`. Obviously that's a UX-side change that can be (and has been) replicated in Git. But at the time it was fairly mind-blowing to work that way.

The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired. So Pijul could be really cool if it finally gets this to really work.

On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs. So I'm not sure if Pijul would have any practical day to day consequences for me. Certainly, when I moved from Darcs to Git, aside from UX issues, I don't think I noticed any major practical headaches due to the loss of the theory of patches.


You left out the thing that was awesome about darcs, which pijul could do too: real cherry-picking, where picking a change also tracked down the other patches it needed. You could literally pull whole features from one tree to another, not just a commit.

It was marvelous with darcs.


Isn't that describing `git merge <commit>`? (The whole point of cherry-pick being that you specifically DON'T want anything but the one commit)

Edit: Never mind, I see that darcs patches are not equivalent to git commits (and maybe not to anything in git).


> On the other hand, if I'm being honest, all of my really hairy merge conflicts are not things that I think could be resolved by this---not without changing the way we think about code from being about text to being about ASTs.

With the trend towards automatic code formatting, I don’t think that would be difficult to do.


I think this is where things could go—VCS aware of semantics—which could happen if syntax==semantics.


> The other part is the theory of patches. Darcs took the lead here too, but the algorithms and implementation left something to be desired.

For what it's worth, there's a Darcs 3 in development, with a new patch format/theory, thanks to the two people keeping it alive. I find darcs 2 generally pleasant enough with a fairly large code base. I never understood the reason for not keeping the darcs interface with new guts for Pijul.


Yep. I’ve used darcs for at least a decade. The mental model is just so straightforward, I simply never wrestle with it. It does exactly what I want with very little thought. I’ve transitioned to git this last year and my head hurts constantly.

Things that used to be trivial are now unsolvable (by me at least).

The darcs UI was a complete joy. Interactive but super fast. Incredibly easy for new users to learn.


Conflicts can't be avoided (certainly not by tree-diffing), and aren't an error state ("conflict" is a bad name because it sounds like it is). The useful innovation of pijul is that conflicted states are not an exceptional state - you can continue to apply patches.


I understand that conflicts are inevitable, but wouldn't tree-diffing at least be an improvement over line-diffing? I recall that Pijul's theory of generalized files (arbitrary digraphs of lines IIRC) is already fairly complicated though.


It wouldn't really make much of a difference, if any. For source files, anyway - more complicated diffing helps for files we consider just "binary" now but that are actually structured.

Pijul's pushouts are unrelated - that just allows a line in a file to be ambiguous, rather than definitely being one line.


You say Pijul's pushouts are unrelated, but their construction depends on a very line-centric definition of patches. Wouldn't it need to be made more complex to accomodate tree patches?


No? The concept is that a unit of diffing (a line, or a tree node in your hypothetical tree-diffing approach) can be ambiguous until another patch resolves the ambiguity.


In the vaguest sense, sure. But if a file is a list of lines, this "ambiguous file" is a digraph of lines. If a file is instead a tree of strings, what does an "ambiguous file" look like? See this paper which was the source of some of the main ideas of Pijul, and in particular, note that its extension to structured data is listed as "future work", which means it probably hasn't been done yet.

https://arxiv.org/pdf/1311.3903.pdf


I don't think you have understood the pushouts paper. Look at figure 4 again.


Which one's figure 4? They're not numbered.

Also, can you point out what my misunderstanding is? That'd help a lot.


A tool that handles a frequent but not particularly challenging problem is still a net win. Humans make errors. The more times I have to do something manually, the higher the likelihood I have screwed one of them up. I don't expect to get better at doing a task the 101st time. But I do expect the odds that one of them gets cocked up to climb ever so slightly. Better if the machine can just do it.

If the majority of the code is written by middle-of-the-road team members, then most of the merges will be done by those same people. Something that never helps me with my changes still helps me, due to my shared responsibility for the project. This is an often overlooked aspect of the tool selection process.


The CLI is pretty straightforward. First, think of what you want to do, then do it. Consult only the man pages when performing the latter, never some random guy's blog or a confused Stack Overflow post.

Clearing the confused Stack Overflow posts &c. from your mental cache will make it all make sense.


I'm sorry, that is completely useless advice. The manpages randomly bumble around in the abstraction level of their "explanations" and rarely use words that the average person would think of first to describe something.


I can't really reply to this; it just isn't true, and the text is right there for anyone to read if they want to see for themselves.


This is so "true" that there's even a realistic generator:

https://git-man-page-generator.lokaltog.net/


Ha! Best laugh I've had in a month!


sorry, i'm gonna sound like i'm making an argument from authority, but why do man pages have to appeal to the "average" person anyway? complex tasks require complex explanations to avoid weird hangups/errors.

every time i see people complain about the complexity of manpages i always wonder what their work looks like if manpages are the blocking issue to their understanding of a tool


I would agree that most manpages are pretty useful and that they don't necessarily need to be tutorials (it would be against Unix traditions, haha). Git's manpages and CLI are just a particularly bad word soup.


TODO: man page search engine with semantic term substitution (e.g. BERT). Possibly trained on Stack Overflow.


> A system can be so simple that there are obviously no errors, or so complex that no errors are obvious. In the middle ground we get progress by someone mathematically proving the soundness of a technique

As no-one else has commented: One might take the full Hoare quotation a different way, not referring to the simplest thing that could possibly work (and possibly not work well). "[T]here are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. [...]".

He also wrote somewhere -- which I can't now find -- about engineering in terms of producing an implementation that satisfies an initial predicate. In this context, perhaps he'd consider the difficult part to be a theory of the simple model of a set of patches as a design, with obviously (provably) no deficiencies in the required merge behaviour and simplicity in its use (c.f. git). Or perhaps he wouldn't, but a formal methods pioneer would presumably approve of a sound theory behind the implementation.


"A system can be so simple that there are obviously no errors, or so complex that no errors are obvious."

I was gonna compliment you on this gem, but it sounded familiar, and, sure enough -- paraphrased Tony Hoare. Great comment, regardless.


> But this simplicity is also the reason why binaries occupy so much space

Don't most repos now use Git Large File Storage (LFS) to prevent binaries taking so much space?


Given that git was noticeably "better" than the centralized systems that came before it, is there enough wrong with git to not use it these days? Although there are pain points with git, a large majority of git activities are fairly cookie-cutter.


It probably depends on your domain. AFAIK AAA games all use Perforce. Git is just not good at managing terabytes of source game assets, nor at managing access to those assets so that two people don't edit them at the same time, since they are usually not mergeable.

git lfs was supposed to help but it only solves the storage side, not the permission side, and it's pretty slow compared to p4.

I doubt pijul will make a dent in solving AAA game devs' VCS issues, though.


LFS is frustrating. The clean/smudge model means files can be in the state of pointer or actual data. I've lost count of how many times I didn't have everything smudged, or accidentally committed a 500mb model, or got into some whacked-out state where I couldn't clean, smudge, push, or pull without errors and had to nuke and start fresh from remote.

Data Version Control (dvc.org) is a far better model.


> I doubt pijul will make a dent in solving AAA game devs vcs issues though

Pijul has the potential to do that, since patches provide a very easy interface for doing partial checkouts:

- just pull patches related to a particular part of the repository

- don't pull patches related to artwork frequently

Since Pijul patches always commute, this could work.

Additionally, I hear the new (unreleased) Pijul can split patches between a "transform part" (e.g. "I inserted 2Tb in file f, and deleted 1") and a "contents part" (e.g., "here, have your 2+1Tb: …").


That seems to suggest there weren't distributed systems before git (like the proprietary one git was written to replace). I think there's enough wrong with git after long experience with revision control in collaborations; I've been quite happy with Darcs.


Darcs was technically interesting, and also a joy to use. However it suffered from the exponential merge problem, which I think is still not fully resolved.

Once you hit that, all the theory and niceties went out of the window.


The merge complexity in darcs 2 really hasn't been an issue in my usage. As I understand it, it is resolved for darcs 3, but maybe I misunderstood that from skimming what was going on. (The new theory seems to be different to Pijul's, and perhaps inferior.) I'd be happy with either as a joy to use.


I/we definitely hit it in real life, sufficiently often that it became obvious we needed to move to another solution.

At the time it was probably CVS to darcs, then it became darcs -> hg, and giving in to the inevitable it became git.


> Darcs was technically interesting, and also a joy to use. However it suffered from the exponential merge problem, which I think is still not fully resolved.

That's kind of the point of Pijul


Yeah if it can be solved then that's great news. But having tried "everything" I think it would be a tough sell to switch at this point.

The momentum is clearly behind git, and has been for quite some time. None of the more recent contenders have ever become popular (bzr, fossil), though of course they have their advocates.

Having the code reworked in private for a long time doesn't make it seem like it will be ready for use in the short term, either.


Irritating that this is downvoted. I agree with you though — I simply don’t run into enough (any?) git related pain that would motivate a switch. git is really really simple and can be made to do whatever I want


Many Git users act as if they have Stockholm Syndrome[1]; because of their dependence on it, they don't see all of the issues with it and make excuses for its shortcomings.

I’m old enough to remember the same thing with MS-DOS when GUIs first started to become mainstream.

Git is extremely flexible and allows you to implement almost any workflow you want. But its command line interface isn't intuitive and there are still enough opportunities to shoot yourself in the foot. It's still too easy to get yourself in a situation where you need the Git expert on your team to unwedge the repo.

Sure, I'm a Git user, but not because I wanted to be one; it's the de facto standard and there's a large ecosystem of support for it, starting with GitHub and GitLab.

I use Mercurial for personal projects because I don't want or need the cognitive overhead of Git when I'm doing something for myself.

Out of the box, Mercurial doesn't give you access to a foot-gun unless you really want it. But when you're starting out, you don't want to shoot your foot off before you know what you're doing.

I enjoy Mercurial’s concept of changeset evolution[2] and Phases[3] which helps avoid all of the coordination and potential drama of rebasing on top of changes that shouldn't have been changed, etc.

The Evolve plugin[4] allows for intelligent and sane history rewriting, using the underlying support for changeset evolution.

The Evolve plugin also provides support for topic branches[5], reducing the overhead of managing them.

There's a great description of how this works in Mercurial on the Git Minutes podcast[6].

[1]: Stockholm syndrome is a condition in which hostages develop a psychological alliance with their captors during captivity. Emotional bonds may be formed between captors and captives, during intimate time together, but these are generally considered irrational in light of the danger or risk endured by the victims.

[2]: https://www.mercurial-scm.org/wiki/ChangesetEvolution

[3] https://www.mercurial-scm.org/wiki/Phases

[4]: https://www.mercurial-scm.org/doc/evolution/user-guide.html

[5]: https://www.mercurial-scm.org/doc/evolution/tutorials/topic-...

[6]: https://episodes.gitminutes.com/2013/05/gitminutes-07-martin...


My personal theory is that things like which version control to use are very rarely chosen by beginner or even intermediate programmers. Those decisions are likely made by the most experienced people on the team.

And those experienced people likely want a foot-gun and probably use it every week. And their workflow has been evolving since the VAX era, so by now it is so weird that only something like git can handle it.

Imagine a conversation: "git is great! Remember how back in CVS days you had to generate patch(1) files if you wanted to save your work? Well, in git it is much easier, you just need to ... Wait, what do you mean you don't know what patch(1) is?" :)


I think there’s some truth to what you’re saying.

What’s ironic is that many of Mercurial’s commands are the same as CVS and Subversion, which makes switching easier.

Git’s commands are often quite different.


But an unintuitive command-line interface and footguns don't seem like big enough issues to actually lead to Git being "overthrown", the way non-distributed VCSs' reliance on a central server that you're always connected to was. I mean, people work their way through the confusing CLI until it's no longer a major blocker because it's so dominant, and you can usually recover from shooting yourself in the foot without data loss - making either point an annoyance, but not to the point of actively looking for and learning an alternative.


But an unintuitive command-line interface and footguns don't seem like big enough issues to actually lead to Git being "overthrown", the way non-distributed VCSs' reliance on a central server that you're always connected to was

I would argue that GitHub was the killer app for Git: it was so compelling that people were willing to put up with Git’s issues so they could use it.

People forget that Mercurial had the early lead in terms of mindshare; many companies adopted it because it was easier for their developers to learn and its syntax was familiar to anyone who used CVS or Subversion. (Linus’ disdain for Subversion is no secret; perhaps that was part of the reason he was okay with Git's syntax being somewhat backwards compatible).

At a certain scale, Git's issues can be too much of a liability, as described by Facebook: https://engineering.fb.com/core-data/scaling-mercurial-at-fa...

For example, look at Google's assessment of Git vs. Mercurial: https://web.archive.org/web/20130116105028/http://code.googl...

Here's the first advantage of Mercurial cited by Google:

Learning Curve. Git has a steeper learning curve than Mercurial due to a number of factors. Git has more commands and options, the volume of which can be intimidating to new users. Mercurial's documentation tends to be more complete and easier for novices to read. Mercurial's terminology and commands are also closer to Subversion and CVS, making it familiar to people migrating from those systems.


Git works well for large distributed teams. Very little software is developed like that. Most software is made by small teams. For small teams, git's model adds extra overhead. Small teams can be more productive with a simpler source repository.

Release processes benefit from incremental commit numbers. They are easy to reason about in release planning. They are useful for gating features. Example: Wait until api server deploys commit number N or higher before enabling app feature A. They provide an easy way to track the age of binaries running in production. Example: We can turn down API B once every deployed production binary is built from commit number N or newer, since that is when we deleted the API B client library.

Merging and rebasing is easier to understand with incremental commit numbers.
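
The gating pattern described above can be sketched quickly; the threshold and function names below are made up for illustration, not taken from any real system:

```python
# Hypothetical sketch of gating a client feature on a monotonically
# increasing commit number. MIN_API_SERVER_COMMIT is the (made-up)
# commit number at which the API server gained support for feature A.

MIN_API_SERVER_COMMIT = 1234

def feature_a_enabled(deployed_server_commit: int) -> bool:
    # Only enable feature A once every deployed API server binary is
    # built from commit MIN_API_SERVER_COMMIT or newer.
    return deployed_server_commit >= MIN_API_SERVER_COMMIT

print(feature_a_enabled(1233))  # server too old, keep the feature off
print(feature_a_enabled(1300))  # safe to enable
```

The comparison only works because the commit numbers are totally ordered, which is exactly what git's hash-based IDs don't give you.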

Git has no support for code reviews. For example, it provides no functions for distributing versions of uncommitted changesets, distributing comments on uncommitted changesets, or linking comments to particular versions of changesets. These are core parts of the software development process. Code review comments are crucial context for future readers to understand the code. They belong in the repository with the code.

Git's "stash" function is full of footguns and is therefore mostly unusable.

Git's nomenclature is inexact. A git "commit" is actually a set of changes to files. It would be better called a "change set", "change list", or even "diff".

Git's documentation needs work. It needs to explain common use cases, warn of more pitfalls, and include more examples. For example, https://git-scm.com/docs/git-revert has no example of specifying a particular commit or a set of commits. Another example: https://git-scm.com/docs/git-stash is missing a link to the article https://git-scm.com/book/en/v2/Git-Tools-Stashing-and-Cleani... and that article needs a big fat warning about the biggest footgun: using 'stash' and then 'reset' and accidentally erasing untracked files that you thought were stashed.

Git sees every source file as plain text, ignoring structure. For example, reformatting destroys blame info. Changeset metadata includes lines added/deleted/modified but treats whitespace changes the same as changes to comments, data, and logic.

Git provides no access control functions. Users must rely on third-party tools. Users need to apply ACLs to source, code reviews, and test output. This is complicated to set up with third-party tools.

I use git and I look forward to switching to a better tool in the future. Maybe Pijul will be it?


The biggest turn off for pijul right now is I can’t install it. I try cloning the latest source from the Nest, and it crashes partway through. The crashing is probabilistic, but the repo is large enough that it is guaranteed to happen. This problem has existed unfixed for a year and a half.

This does not instill confidence.


It's gone through a complete rewrite. The developer has stated recently that the rewrite will be released soon.


Yes but if the developer experience is that for a year and a half they can’t fix a bug that is preventing people from doing the first step of engaging with the source code, that doesn’t bode well.


Can someone with more domain knowledge than I comment on whether or not this project is mathematically possible?

I remember the reason given for not making git patching sound was because all the sound algorithms known to the git author were slower in the common case, and only better in less common cases.

Darcs is really fast for what it does, but git is way faster by doing less.


Excellent question! This project is indeed mathematically possible, but did require two separate innovations:

- an "engineering" one, called Sanakirja, which is a fast, forkable, transactional, on-disk key-value store. Although it might not be as fast as other database engines, it's forkable (which is what allows fast branches in Pijul).

- a few "mathematical" innovations:

  - a good understanding of why Darcs is linearly slow in the normal case, and exponentially slow in the worst case.
  - some category theory to give a hint of relevant datastructures.
  - a few graph algorithms with just the right algorithmic complexity.
  - a good enough data representation for patches and repositories.
(disclaimer: I'm the lead author on most of this).


It is mathematically possible, but hasn't been known to be for very long. Darcs' algorithms suffer from complexity problems that have now been theoretically solved; Pijul implements those new algorithms in the hope of getting both a good runtime and better properties than git.

For some resources on the underlying algorithms, see: https://pijul.org/manual/theory.html


The actual execution time of local git commands is never noticeable in my work. The time spent cleaning up merge conflicts is very noticeable though. I could very well imagine being faster overall with a tool that runs half a second longer every time I commit, but that saves me from cleaning up cases where it couldn't figure out a merge.


I find that if you spend a lot of time on conflict resolution, it may be an indication your workflow is fighting git.

There is a lot of really bad advice on git workflows out there and even worse workflows in the real world.

I think gitworkflow(7) should be required reading, as it explains not just how, but why the kernel uses git the way it does


Do you mean https://git-scm.com/docs/gitworkflows ? It doesn't seem to say anything about the kernel. Maybe you mean something else.

Our workflow isn't in conflict with any of this, but some parts of our codebase evolve quickly, so topic branches regularly need to merge or rebase conflicting stuff.


> you can "unapply" (or "unrecord") changes made in the past, without having to change the identity of new patches (without "rebasing", in git terms).

So the identity of a patch doesn't depend on history? Nice! I wish git was like that. In fact, every time I rebase on master, I wish so.


That's by design: if you know a Git commit ID, you automatically also know the history hasn't been tampered with (modulo hash collisions). Linus specifically wanted a system where development history cannot change behind developers' backs. Given that the other big design constraint of Git was distributed version control, I'd argue that this is a very important property to have. You don't want to pull in someone else's branch and suddenly have all your own development history change.


i'm not sure if this project is different, but patches can suck really bad. an example of this are mercurial queues (mq). with no history/ancestry, the merge behaviour can be horrible. for even medium-size teams, it goes downhill quickly. IMO, it's fucking broken and a huge waste of time. of course, it's possible i don't get mq. but i'll take git any day of the week over mercurial + mq, at least git works and doesn't lose my work as often. (believe it or not, this is easier to do with mq than git (!).)

maybe pijul is different, but their "conflict" doco doesn't fill me with confidence: https://pijul.org/manual/conflicts.html


Patch theory VCSes are, IMO, very much not the future of VCS, but they're an important research project, and the future will have learned from them. Git is definitely the present of VCS, even if it was adopted for all the wrong reasons (not Git's fault).


Mercurial is not pijul/darcs. They’re fundamentally different at the core.


> > So I the identity of a patch doesn't depend on history?

> with no history/ancestry, the merge behaviour can be horrible.

It's possible to have history associated with a commit without making that history an immutable part of the commit's cryptographically-enforced identity.


The cryptographic guarantee that a commit with the same identity represents the same codebase state is the killer feature of git: it’s great to be able to trust the history of a repository that way.


> The cryptographic guarantee that a commit with the same identity represents the same codebase state is the killer feature of git

Um, no? I don't think anyone in the history of git has ever reasoned "Well, I'd like to use this other VCS, but git is really useful for my use case of working with co-developers who are so untrustworthy that I need cryptographic assurance they haven't tampered with the history of the repository, but who I nonetheless trust not to inject Underhanded C Contest[0]-style bugs in the code or otherwise deliberately sabotage things that aren't the history.".

The killer feature of git is operating on the repository and its history as a graph structure, which works even if commit ids are completely non-cryptographic GUIDs. (In the single-user case you could even use sequential integers, but that doesn't scale.) Cryptographic assurance is nice to have (dumb mistakes and unexpected malice happen to everyone), but it's worth trading away if the resulting features are good enough to justify the trade-off.

0: http://underhanded-c.org/


As a single developer or a team working in the same organization you can ignore this property, however in a distributed VCS it is pretty much a defining and essential feature.

Guaranteeing that the history is the same is paramount to ensure that the same operation on two repositories which are _supposed_ to be identical will yield the same results.

This is what allows you to merge back history from a forked repository over which you have no control with confidence.

You don't necessarily need that property, true. You also don't necessarily need to make it mandatory. However if you want to play in the distributed VCS game, you'd better have something equivalent that can give you the same guarantee.


> As a single developer or a team working in the same organization you can ignore this property,

That was my point; by fiddlerwoaroof's logic there would be no compelling reason for single developers or cooperative teams to use git (besides cargo-culting the linux kernel devs). But in fact there is such a reason - the graph structure I mentioned, or rather the sophisticated operations based on that structure.

> This is what allows you to merge back history from a forked repository over which you have no control with confidence.

When merging commits from a source repo into a destination, for each commit, either:

a, it doesn't already exist in the destination, in which case you have no way of knowing that the previous-commit data is correct, because while tampering would change the commit hash, you don't know what the old hash was because you've never seen the commit before.

b, it does already exist in the destination, in which case you have a perfectly good history for it already and (assuming the source disagrees, otherwise you'd just always ignore it) can either simply ignore the source's idea of where it came from, or (probably more usefully, but it depends on how you're organizing things) alert the user that they have two conflicting claims about the history of the commit, and ask for help the same as any other not-auto-resolvable merge conflict.

Git effectively treats b(≠) as "tell the user the source repository is horribly broken because its commit hashes don't match their content", but it fundamentally can't give you confidence about the parts of "history from a forked repository over which you have no control" that it hasn't seen, and including history in commit hashes isn't necessary for noticing that the parts it has already seen before don't match up.


I never said it’s the only reason to use git. However, it makes things like SOX change control easier because you can pass the commit ID around as a shorthand for a known state of a repository that is, practically, a UUID.

Similarly, as a single developer, it means I can verify repository backups by checking that all the repositories have the same branches and that the head of each branch has the same commit ID.
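
That backup check can be scripted with `git ls-remote`; here's a minimal sketch, where the "original" and "backup" repositories are created on the fly just for the demo:

```shell
# Sketch: verify that a backup mirror has the same branches, each
# pointing at the same commit ID, as the original repository.
cd "$(mktemp -d)"
git init -q original
( cd original
  git config user.email you@example.com
  git config user.name you
  echo hi > f.txt && git add f.txt && git commit -qm init )
git clone -q --mirror original backup.git   # stand-in for a real backup
# Every branch head should resolve to the same commit ID in both:
git ls-remote --heads original   > original.heads
git ls-remote --heads backup.git > backup.heads
diff original.heads backup.heads && echo "backup verified"
```

Because the commit ID covers the full history, matching heads imply matching repositories, which is the whole point being made above.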


> a, it doesn't already exist in the destination, in which case you have no way of knowing that the previous-commit data is correct, because while tampering would change the commit hash, you don't know what the old hash was because you've never seen the commit before.

In a fork we validate the common ancestry. This is used so that you can diff from the last known commit to the current point in absolute terms instead of relative.

> b, it does already exist in the destination, in which case you have a perfectly good history for it already

Unless it has been rewritten. Which, in git, we know can also happen by mistake. In such cases we can argue that you can diff the entire source tree and narrow it down to the point where you believe the history path diverges and spot the changes yourself, but git makes it a little bit more convenient.


so in your theoretical example you'll never work with junior devs who make mistakes and screw up the history/trustworthiness of the repo? how do you recover from that?


> maybe pijul is different

No "maybe"; that's the point. Why does modelling something fundamental, like conflicts, decrease confidence in the approach?


> Patches can be applied to a conflicting repository, leaving the conflict resolution for later. (from the docs)

but you still have to resolve the conflicts at some point, right? and it doesn't say how this is done. so it isn't clear if this approach is less painful in practice when you apply a bunch of patches.


I haven't used pijul, but I'm familiar with darcs and manually resolving the conflict with a new patch. You still have the conflicting patches, which you can manipulate. There's no claim of magic resolution, and I don't see why having it in the model is a negative.


What's the relationship between its data structures and CRDTs? The emphasis on patches makes me think this might be an operation-based CRDT.


Pijul is a CRDT. The main new idea is that it is the smallest generalisation of text files that is a CRDT, while keeping important properties such as the relative order of parts of the file.


You can find an explanation of how it works at [0]. The vague notion of "patch commutation" is central to both operational CRDTs and Pijul's theory of patches, but other than that, I don't really see any similarities.

[0]: https://jneem.github.io/merging/


Ah I was hoping this meant the rewrite had landed.

Should be soon, judging by the discourse. Looking forward to it.


Pijul has come up a few times. Can someone help me answer the 'how would it help me' question?

Under what real development scenario(s) would Pijul's approach help me?


It's a distributed version control system, implementing a sound mathematical model of software development (patch theory). In this sense pijul is like darcs, but it doesn't suffer from exponential-time edge cases when merging; hence it should be a Pareto improvement.

Of course, the elephant in the room is git: when git came out, most darcs users (eventually) switched over to it. Hence there's not many darcs users left for which pijul would be a no-brainer. There are many git users, most of whom have never used a patch-based DVCS, who would need a more compelling reason to switch.

Personally, I'm glad the patch-based approach is still going, as it's always good to have competition. Plus, having a sound underlying model usually brings insights to a problem which aren't at all clear in more 'cobbled together' solutions like git (e.g. commutativity of patches is quite a natural property in darcs/pijul, which doesn't directly make sense in git due to its "commits" specifying their parents).
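
As a toy illustration of that commutativity (this is operational-transformation-style bookkeeping, not Pijul's actual algorithm): two insertion patches can be reordered if their positions are adjusted, and both orders produce the same file.

```python
# Toy model: a patch is ("insert", index, line). commute() swaps the
# order of two patches, adjusting indices so the final file is the
# same. This sketches the *idea* of commuting patches, not Pijul's
# real data structures.

def apply_patch(lines, patch):
    _, i, text = patch
    return lines[:i] + [text] + lines[i:]

def commute(p1, p2):
    """Given p1 applied first and p2 indexed in the post-p1 file,
    return (p2_first, p1_second) yielding the same final file."""
    _, i1, t1 = p1
    _, i2, t2 = p2
    if i2 > i1:   # p2's insertion sits after p1's
        return ("insert", i2 - 1, t2), ("insert", i1, t1)
    else:         # p2's insertion sits at or before p1's
        return ("insert", i2, t2), ("insert", i1 + 1, t1)

base = ["a", "b", "c"]
p1 = ("insert", 0, "x")   # insert "x" at the top
p2 = ("insert", 4, "y")   # append "y" (index in the post-p1 file)
one_way = apply_patch(apply_patch(base, p1), p2)
q2, q1 = commute(p1, p2)
other_way = apply_patch(apply_patch(base, q2), q1)
assert one_way == other_way == ["x", "a", "b", "c", "y"]
```

In git terms there is no analogue of `commute`, because each commit names its parent hash and so can't be reordered without being rewritten.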


I haven't made up my own mind, and certainly intend to try this out.

But I want to point out that your first paragraph offers no answer to the question, same as the title here doesn't. And I don't say that to be snarky -- what you list are implementation details, not an answer to the question. As a developer I don't care what kind of theory my program is built on I just want to know how it will save me time and give me new functionality. (In my spare time I can think the math is cool and be content with that).

(Even more specifically -- git is a DVCS already, it's just most people seem to use it with a central server like GitHub, so personally I also didn't understand what the page meant when calling that one of pijul's advantages).


In principle I do care about the fundamental model being correct - that can often bubble up to fewer (or cleaner) edge cases. Without help though, or many hours of usage I can't mentally map that model to what it is actually going to be like to use.


Everything old is new again. This is how Smalltalk managed changes. Simple and straightforward.


The Pijul website says their patch model is "based on a result from category theory" and they link to a paper from 2013. But wasn't Smalltalk invented in the 70s? It seems to me that either the paper wasn't novel or that Smalltalk used a different system. Why do you believe it's the same?


Good question. I'm not equipped to answer that. But let me share what I know about Smalltalk's change-management.

Smalltalk has the concept of "change-list". Because in Smalltalk-the-language everything is a class or method, the change-list is basically a list of method-versions. The "unit of change" is a method.

In Git the unit of change is a "file" and a file can have arbitrary content. In Smalltalk the change-history is much more semantic because it is based on the semantic concepts of "class" and "method". The Smalltalk change-list knows which methods of which classes were changed and how and when (and by whom). The user does not need to read changes within a text-file and then try to understand for themselves what semantic constructs were changed in that file in that commit.

Git is programming-language-neutral, whereas Smalltalk's change-history is specific to the concept of "classes with methods". It could apply just as easily to any OOP language as far as I can see.


I did a lot of work in Smalltalk earlier in my career, and it's still the best toolset I've ever used.

On one hand, yes, having the method as the unit of code is great. You just avoid all sorts of editing and versioning of non-semantic textual changes like you have in other languages. And it was really easy to look over your list of changes to see who did what when, or group them together to form larger patches.

On the other hand, collaborating with bare change sets was difficult. With no dependency tracking it was just a manually-maintained bag of class-and-method definitions. You could write those out to a file and import them into another image, but it was hard to make sure you had all the necessary bits.

This led to several Smalltalk-specific VCSs. The early ones were powerful, but basically required you to be connected to the net to do any work, because they were constantly communicating with a central server as you worked. Later ones were more snapshot based. I think Monticello was the first real distributed VCS - it preceded git by a couple of years.

What's interesting about Pijul is it gives you that "maintaining a set of patches" workflow that made early Smalltalk change management so pleasant, but also tracks dependencies between the patches so that you can easily mix and match functionality without anything getting left out.

The ultimate tool would be something that combines Pijul's theory of patches with Smalltalk's semantic versioning. That would be amazing.


The interesting thing is that “modern Smalltalk” (Pharo, at least) is going out of its way to integrate git rather than Monticello.


This is more of a cultural issue than a technical one: they want to attract mainstream programmers to their system, and the mainstream pretty much expects git compatibility.

I was the same way when I first picked up Squeak/Pharo. Only when I really went deep into it did I start to appreciate Monticello for its simplicity.

I hope that the Pharo/Squeak people keep both options around for a long time.


Git makes sense for Smalltalk in one sense: if your purpose is to create the perfect Smalltalk "image", or a set of them one after another, then it makes sense to have a single set of sources and a single repository for it. That is, if you view the whole image as a single application, the way you might view Linux as the single application whose development your VCS was created to support.

But if you view the image as a "workspace" where you are developing multiple projects and multiple components which ideally could be shared among multiple developers working on different applications, then the trad. Smalltalk change-management makes more sense. You want to track the versions of individual components/classes/modules, not just the version of the whole image. (Maybe you could call this the Cathedral vs. Bazaar?)

Saving the whole image --which Smalltalk developers do many times a day-- is similar to creating a version of your whole repository.

Individual developers do "branch" by having their own copy of the initial image. At some point they need to merge (or rather "combine" see below) some components from their images with components produced by other developers.

The difference is that in Smalltalk the merge happens by combining specific components/classes/modules, not by trying to automatically merge the whole "image".

Because of the component-based model of Smalltalk it is easy to divide the work so that each developer works just on their own classes or even on just their own methods of a class. Then "merging" is mostly a non-issue, you just combine the latest versions of different classes and methods from different developers.

Why is traditional "merging" needed in the first place? Because to accomplish something you may need to modify some existing source-file in some way. In Smalltalk instead you could subclass an existing class and make your fix there, and use the new subclass instead of modifying existing source. (Ideally).


> With no dependency tracking it was just a manually-maintained bag of class-and-method definitions

True. A simple improvement would be to make the change-list also contain the (single) previous version of every changed method, as well as the class-definition that was in effect when the method was changed. That would not have been too much added code to carry, I think.

Then maybe augment the importer so it warns if the previous versions don't match the current versions and the new versions don't match either.

In the end a tool is just a tool, to make programming easier and to find errors more easily. No tool solves all the problems. That is why I think Smalltalk was so great even if it didn't have compile-time type-checking, it was still easy and fast to spot errors in your code.


Makes sense. Coming from a Smalltalk background I had some difficulty getting used to the idea of git "commits" which are basically snapshots of the whole repository. Commit-ids are like version numbers but they apply to the whole repository, not to individual artifacts like files.

Whereas if we think in terms of patches/change-sets it is easy to think of applying a set of changes to a given artifact, and "patches" being artifacts too. Also we could have branches of patches, not just branches of the whole repository. Does Pijul work something like that, "branching patches"?


The unit of change in Pijul is also a file, which is a finite sequence of lines, generalized to a finite directed graph of lines to deal with conflicts. The idea of a patch-based VCS is as old as the idea of a VCS. Pijul's theory of patches, and the techniques it uses to keep things speedy, are pretty new.
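
A rough sketch of what "a directed graph of lines" might look like (purely illustrative; Pijul's real representation differs): a conflict shows up as two parallel paths between the same nodes, and each path through the graph is one possible flattening of the file.

```python
# Toy model: a file as a directed graph of lines. The two successors
# of "a" represent a conflict; enumerating BEGIN->END paths yields
# the possible flattenings of the conflicted file.

graph = {
    "BEGIN": ["a"],
    "a": ["left", "right"],   # two sides changed the line after "a"
    "left": ["c"],
    "right": ["c"],
    "c": ["END"],
}

def flattenings(node="BEGIN"):
    if node == "END":
        return [[]]
    out = []
    for nxt in graph[node]:
        for rest in flattenings(nxt):
            out.append(([] if node == "BEGIN" else [node]) + rest)
    return out

print(flattenings())
# Each inner list is one way to linearize the conflicted file.
```

The non-conflicted case is just a graph where every node has one successor, i.e. an ordinary sequence of lines.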


Not sure the need to mention git in the title


git is the de facto VCS, so comparisons to it provide a reasonable baseline for comparison for possible users.


I realize this is an old project and that it's not a big deal and that this may seem childish and many other disclaimers.

But.

Pija is a specific synonym for penis in many Spanish dialects, so this tool's name reads like Dickul in English, which is something I think the owners should be aware of if they're not already.


I think you'll struggle to find a human-pronounceable 5 (or fewer) letter word that doesn't mean something nasty in some language.


But Pija is not part of the name; Pij is.

Italians use Fica for the female counterpart, and the national automobile company recently formed FCA. People joked about it and then everyone moved on.


See also Coq, the theorem prover.


Git also isn't a very nice word. https://www.merriam-webster.com/dictionary/git


Git doesn't have sexual connotations. Any term with sexual connotations is a hot potato nowadays in the software biz/collaboration.


Everything is a hot potato when your basic needs are met and you’re bored and, more importantly, you desperately feel the need to be “right” about something, perhaps to be listened to.


It’s not against you; I just think that people are getting more and more offended about a lot of things. They should be aware that in most cases these names aren’t intentionally chosen, and choosing a name that offends no one might be work that’s not worth the effort, so people should just get over it, have a laugh and move on.


The word sounds odd in many slavic dialects too.


I can't find many talks or examples of it; any good resources besides the official page?


The authors are obviously very proud of their clever algorithms, but as a user my main concern is not the implementation details. My main concern is what can this tool do for me?

What are the new and amazing features I can use? Or which operations go faster?


Is Pijul worth looking at if I am also interested in versioning workflows that involve binary files? For example, if I am an electronics engineer doing PCB design work? Or is it best suited to software projects with text-based files?


It depends: the main issue is the diff algorithm. Pijul's diff is not line-based, but rather "byte block"-based (which at the moment happen to be lines).

This means that we could totally imagine more clever diff algorithms for specific use cases. I don't know anything about PCB design files, but an example application of this is to treat indentation and spaces as their own blocks.
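
A toy sketch of what treating indentation as its own block could mean (an illustration of the idea only, not Pijul's diff):

```python
# Split each line into an indentation block and a content block, so a
# pure reindent changes only the first block and leaves the content
# block (and thus its history) untouched.

def split_blocks(line: str):
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    return [indent, stripped]

old = split_blocks("    return x")    # indented with four spaces
new = split_blocks("\t\treturn x")    # reindented with tabs
assert old[1] == new[1]               # content block unchanged
assert old[0] != new[0]               # only the indent block differs
```

A line-based diff would see the whole line as changed; a block-based one can attribute the change to the indent block alone.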


Not an expert, but I think it is only suitable for text files, since most diff algorithms assume the files are made of not-very-long lines.


I'm going to switch to pijul for personal work but would miss magit.


Yeah, for better or worse, Magit, Ediff, and Smerge solved pretty much all of my problems with Git. Now if only I could convince my co-workers to use Git "correctly" with the help of these tools...

I think that's a place where things like Pijul have a chance to shine. Some people (perhaps rightly) are just not interested in learning the tooling and extra work around Git that makes Git even more powerful. They're relatively happy following the out-of-the-box defaults, even if that doesn't lead to a great experience. Something like Pijul might ship with a better default experience.


Magit is great, but it's 'just' an interface. It makes things easier but it doesn't affect your git work flow. How would you benefit from other people using it?


It's a complete and sensible enough interface that whatever I want to do during normal work, Magit can present in a very intuitive way, and it's easy to do by default.

With less complete interfaces, other people have to fall back to the git CLI interface, which is unintuitive enough to discourage fairly routine operations of branch manipulation.


There are a lot of good VCS systems, but it seems like nothing can compete with Git in popularity.


Well, it's hard to beat the combination of very simple data model and lots of convenient commands.

The convenient commands mean everyone has their own workflow. Someone might want to create "ideal commits" instantly. Someone else might interactive-rebase all the time. Yet another person can cherry-pick. Or soft reset and re-commit. Keep few branches or many branches. Or do all the work on one personal branch and pick the relevant commits later.

The simple data model means you don't have to understand what the other person did in order to work with their changes. You can work your way, they can work their way, but the commits are all the same. It is a tree of files with a name and parents attached to it.


This looks kinda cool, looks like FreeBSD already supports this directly.


Just in that there's a package already?



