Git, Graphs and Software Engineering

sametmax · on Sept 30, 2017

Except most hard but interesting operations with Git are not very visible in histories.

They are:

- getting a clear status of you repo so you know where you are and what you do;

- figure out which of the various methods of undo you need for this specific screw-up;

- find a practical way to compare this particular version of a snippet with another particular version of a snippet, and work on the result;

- merge the terrible mess you coworker rush-pushed to avoid to be the one to have to merge;

- rediscover the arcanes required to save only what you want. Maybe I should branch and commit. Or stash. Or reset and add. Or do an interactive thingy ?

- survive a rebase with squashing;

- use the history, trying to look for clues of what the heck is going on with the current version of your code.

- setting up those damn editors, viewers, differs, etc.

So I LOVE your idea, because a set-in-stone subset of Git for most projects, one that gives you one, and only one, obvious way to do X, is really needed.

But I don't think stats will help you. You need a LOT more than that. You need several people who have tried it all and can agree on a solid, versatile yet definitive workflow that will work in the Pareto case. And then a UX designer who can sculpt that into something.

And right now, I have tried every single git UI under the sun, command line and not, and they all fall short outside of the basic use cases.

Some manage to prevent the user from screwing up too much (GitHub client), some have a nice overview window (GitKraken), some have defined a clear subset of operations (GUM), some are productive (SmartGit), some are well integrated with the file explorer (TortoiseGit) or the IDE (VS Code)...

But basically, every time I go back to give a Git training session, I have to start with the classic cmd client. Because that's the only one I can trust to do the job perfectly. But I also have to provide such a long cheat sheet it's not even funny. And it's useless before each student perfectly understands what's going on anyway.

Git is merciless with understanding and doesn't allow high level thinking. We have yet to come up with a decent abstraction for it.

jancsika · on Oct 1, 2017

> - rediscover the arcanes required to save only what you want. Maybe I should branch and commit. Or stash. Or reset and add. Or do an interactive thingy ?

Here's something I'm trying:

1. Always branch and commit.

2. Push branch as a work in progress.

3. When branch seems ready to merge, make a new branch from that branch.

4. In the new branch, go crazy. Squash, reorder, whatever the heck.

5. If you screw something up royally in #4, delete that new branch and go back to step #3.

6. If CI succeeded for new branch, merge new branch.

7. Delete work-in-progress branch.
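The steps above can be sketched end to end in a throwaway repository. This is only an illustration under assumptions: the branch names `wip` and `final` are made up, there is no remote (so the push in step 2 is only a comment), and the "go crazy" step is reduced to a single squash done with `git reset --soft`.

```shell
#!/bin/sh
# Sketch of the WIP-branch / clean-branch flow in a scratch repo.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "base"

# Steps 1-2: always branch and commit.
# (With a remote you would also run: git push -u origin wip)
git checkout -q -b wip
echo one >> file.txt; git commit -q -am "wip: first try"
echo two >> file.txt; git commit -q -am "wip: fix typo"

# Step 3: make a new branch from the WIP branch.
git checkout -q -b final wip

# Step 4: rewrite freely; here both WIP commits are squashed into one.
git reset -q --soft main
git commit -q -m "Add one and two"

# Step 5 (only if step 4 went wrong): git checkout wip && git branch -D final

# Step 6: merge the cleaned-up branch (assume CI passed).
git checkout -q main
git merge -q --ff-only final

# Step 7: delete the WIP branch (-D, since its original commits were never merged).
git branch -q -D wip
git log --oneline
```

The WIP commits are never rewritten in place, so a botched step 4 costs nothing more than deleting `final`.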

What kind of situations aren't covered by this approach?

viraptor · on Oct 1, 2017

Once you're comfortable with the destructive commands, you could even drop the temporary branch and rely just on reflog for the undo. But I understand how this could be easier at the beginning.
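A minimal sketch of what relying on the reflog looks like, assuming a scratch repository and using `git reset --hard HEAD~1` as the stand-in destructive command:

```shell
#!/bin/sh
# Undo a destructive reset by asking the reflog where HEAD used to be.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "one"
echo two >> file.txt
git commit -q -am "two"
before=$(git rev-parse HEAD)

# The destructive command: throw away the last commit.
git reset -q --hard HEAD~1

# The reflog still records the previous position of HEAD.
git reflog -3

# HEAD@{1} means "where HEAD was one move ago".
git reset -q --hard 'HEAD@{1}'
git log --oneline
```

The reflog is local to a clone and entries eventually expire, so this is a safety net for recent mistakes rather than a backup.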

mcny · on Oct 1, 2017

> Once you're comfortable with the destructive commands, you could even drop the temporary branch and rely just on reflog for the undo. But I understand how this could be easier at the beginning.

Ideally, we'd make each of these "feature" branches as short and straightforward as possible but then we are out of git territory and into some kind of project management territory?

The one thing I don't like about destructive commands is that once you publish to a branch that others are working on, you should rarely* use them.

* I'd say never but I usually don't like talking in absolutes.

jancsika · on Oct 1, 2017

The point is not to get comfortable with the destructive commands. You have one WIP branch that is essentially just cloud backup. Then when you're ready to open your branch up to the world you create a new branch that you've squashed and pruned to your liking-- on Gitlab, this would be the branch for which you submit a merge request.

That new branch is the one that receives comments and revisions from all developers-- ideally none of that gets squashed or rebased.

The danger is that you silently drop data when trying to make the new branch pretty. But even in that case you still have the old WIP branch as backup, at least until you do a release. And hopefully by that time you've tested the code and had other people look at it and use it.

oddlyaromatic · on Oct 1, 2017

I really like this. I get overwhelmed by the options and want to be prudent yet have freedom to screw up.

jancsika · on Oct 1, 2017

The other benefit is that this flow encourages you to push the WIP branch to remote as early as possible. Otherwise I find I'll work too long exclusively on the local repo getting things "just right", and creating an unnecessary risk of data loss.

paulddraper · on Oct 1, 2017

Specifically, for step 5:

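    git reset .            # unstage whatever is staged
    git clean -df          # delete untracked files and directories
    git checkout -f wip    # switch back to the WIP branch, discarding local edits
    git branch -D final    # force-delete the botched branch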

noway421 · on Oct 1, 2017

Work in progress branch can't be named the same as final branch. That's inconvenient.

Resetting back based on the reflog is more straightforward in terms of preparation.

sametmax · on Oct 1, 2017

Read the comment again, it's not about what can or can't be done.

couchand · on Oct 1, 2017

To answer just one of your questions, many commands have a patch mode as -p. At this point I rarely just git add, it's always git add -p. A side benefit is that you get a chance to review your own code, make sure you cleaned everything up, and can't accidentally commit something.

And you may be interested in the Gitless papers, they are a very interesting approach. I personally think I've got a good enough sense of Git to generally use it pretty effectively (and I'd say with somewhat high-level thinking, though you may disagree), though I'll admit that's after reading the Git Book a few times.

Honestly, my opinion is that most devs just rush their work all the time, and having a different model of version control can't stop them from doing so. Careful, measured practice is the solution, but that's often a hard argument to make in the current industry.

noway421 · on Oct 1, 2017

>-p

thank you!!

nextos · on Sept 30, 2017

> Git is merciless with understanding and doesn't allow high level thinking. We have yet to come up with a decent abstraction for it.

I toyed around with Pijul and found it simpler conceptually. It still has a long way to get ready for production, though.

sheriff · on Sept 30, 2017

I came here to post something similar. I'll add that many of the hard things we do with Git seem to do with re-ordering or re-combining of the underlying changes. If we want to make it easier to reason about changes to a set of changes, then I think we really want those changes to have some properties which they don't currently have.

It's powerful for Git to treat changes as line-by-line text diffs, because it allows us to manage changes to any textual data. But what if, instead, we borrowed an idea from distributed databases, and implemented all changes as commutative operations on a Conflict-free Replicated Data Type (CRDT)?

I think almost every example of difficult rebasing would get significantly easier, but at what cost? We'd have to completely rethink how we write programs, because this would drastically limit the types of changes to a program that were valid. I wouldn't be surprised if this would require us to develop in entirely new languages.

There might be some meat to this idea, but again, I don't think we'd get there by mining existing Git graphs.

steventhedev · on Oct 1, 2017

Git doesn't operate on diffs. It stores full content, using delta compression. It's a subtle difference, but it means Git can create "ours" reverse merges that have no diff of their own yet radically change the content of the repo.
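One reading of the "ours" remark, sketched in a scratch repository: a merge made with the `ours` strategy records the other branch as merged while keeping the current tree untouched, so the merge commit has no diff against its first parent. The branch and file names are assumptions for the example.

```shell
#!/bin/sh
# A merge commit whose snapshot ignores the merged branch entirely.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo "main content" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b other
echo "other content" > file.txt
git commit -q -am "change on other"

git checkout -q main
echo extra > extra.txt
git add extra.txt
git commit -q -m "work on main"

# Record "other" as merged, but keep our snapshot exactly as it is.
git merge -q -s ours -m "merge other, keep ours" other

# Empty diff against the first parent; the change from "other" is gone.
git diff --stat 'HEAD^1' HEAD
cat file.txt
```

No patch describes what happened here; the merge commit is simply a snapshot with two parents.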

What you're talking about is patch theory, which is used by Darcs and Pijul. Pijul does a better job of explaining the theory.

At the end of the day, the point of version control is to keep a universally consistent snapshot of a sequence of bytes. Patch theory only tells you how to resolve conflicts. TreeDoc, etc. simply resolve the conflicts differently based on consistency of ordering, as patches may be applied out of order for it to be a CRDT.

hyperpape · on Oct 1, 2017

Curious, have you compared this idea to what Darcs does? (I don't know Darcs well enough to do justice to it, but it sounds related.)

sheriff · on Oct 1, 2017

An example of what I'm thinking about, which I don't think Darcs can do (I'd love to be wrong):

Alice and Bob both branch off of master at the same point. In Alice's branch, she moves function `foo` into a different module/file. In Bob's branch, he changes `foo` to handle a new condition. Both wish to merge into master.

Whoever merges later is going to have a merge conflict, and have to resolve it manually, using their human understanding of the semantics of both changes. It's clear to me how that conflict should likely be resolved, but as long as those changes are presented as text diffs, I don't expect my VCS to be smart enough to figure that out on its own.
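The scenario can be reproduced in a scratch repository. This sketch assumes two small files, `a.py` and `b.py`, with invented contents; Alice's branch is merged first and Bob's second.

```shell
#!/bin/sh
# Alice moves foo to another file, Bob edits foo in place: the second merge conflicts.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
printf 'def foo(x):\n    return x + 1\n\ndef bar(x):\n    return x * 2\n' > a.py
printf 'def baz(x):\n    return x - 1\n' > b.py
git add a.py b.py
git commit -q -m "base"

# Alice moves foo from a.py into b.py.
git checkout -q -b alice
printf 'def bar(x):\n    return x * 2\n' > a.py
printf 'def baz(x):\n    return x - 1\n\ndef foo(x):\n    return x + 1\n' > b.py
git commit -q -am "move foo to b.py"

# Bob, starting from the same point, changes foo where it used to live.
git checkout -q -b bob main
printf 'def foo(x):\n    if x is None:\n        return 0\n    return x + 1\n\ndef bar(x):\n    return x * 2\n' > a.py
git commit -q -am "foo handles None"

# Alice merges first (fast-forward); Bob's merge then needs a human.
git checkout -q main
git merge -q alice
if git merge -q bob >/dev/null 2>&1; then merged=yes; else merged=no; fi
echo "clean merge: $merged"
git status --short
```

Git sees Bob's edit landing inside lines that Alice deleted, and it has no notion that those lines reappeared in `b.py`.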

It would be interesting to explore other ways of representing changes, such that a computer would understand how to compose them in more situations like this.

You can quickly come up with examples of changes which conflict in a way that should probably always require human intervention: Say Alice and Bob each wish to assign the same constant to different values.

So, I don't expect that you could completely remove the need for developers to manually resolve tricky conflicts. At least, not without completely changing how we express changes to programs, which may well be a non-starter for practical purposes.

lozenge · on Oct 1, 2017

There is a product called semanticmerge that does this.

sheriff · on Oct 1, 2017

neat! thanks

sheriff · on Oct 1, 2017

I'm unfamiliar with Darcs, but thanks for calling it to my attention. Based on a quick look, it appears Darcs uses text diffs, so it's not quite what I'm talking about, but it's definitely interesting.

mappu · on Oct 1, 2017

>I have tried every single git UI under the sun, command line and not, and they all fall short outside of the basic use cases

I haven't found any git UI that matches TortoiseHg, for new and experienced users alike - my best picks were "Git Extensions" (Windows-only) and gitk + git-gui elsewhere.

I had high hopes for QGit but it seems to be missing a lot of features.

dozzie · on Sept 30, 2017

> - getting a clear status of you repo so you know where you are and what you do;

Me repository?

What's missing from `git status' and `git show-branch'?

> - figure out which of the various methods of undo you need for this specific screw-up;

It's not hard. The hard part is to acknowledge that you actually need to understand git's internal data structure. All the rest is very easy, up to and including `git reset --hard'.

> - find a practical way to compare this particular version of a snippet with another particular version of a snippet, and work on the result;

I didn't understand what you mean.

> - merge the terrible mess you coworker rush-pushed to avoid to be the one to have to merge;

Me coworker?

And I again didn't quite get what you mean. You'll have to merge anyway. And it's not specific to git, it was the case for any and all version control systems.

> - rediscover the arcanes required to save only what you want. Maybe I should branch and commit. Or stash. Or reset and add. Or do an interactive thingy ?

What arcanes? It's quite a simple thing, especially if you learned how git works.

> - survive a rebase with squashing;

`git tag my-old-head' and work until you're satisfied? What's difficult in that? (Apart from the fact that you need to understand how git works)
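A sketch of the tag-first approach in a scratch repository. To keep it non-interactive, the squash is done with `git reset --soft` rather than `git rebase -i`; that substitution is an assumption of the example, and the tag works the same way in front of a real interactive rebase.

```shell
#!/bin/sh
# Bookmark the old head, rewrite history, and jump back if the result is bad.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
for n in 1 2 3; do
    echo "$n" >> file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Bookmark the current head before touching history.
git tag my-old-head

# Stand-in for the squashing rebase: fold the last two commits into one.
git reset -q --soft HEAD~2
git commit -q -m "commits 2 and 3, squashed"
git log --oneline

# Not satisfied? Return to the bookmark and try again.
git reset -q --hard my-old-head
git log --oneline
```

Once the rewritten history looks right, `git tag -d my-old-head` removes the bookmark.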

> - use the history, trying to look for clues of what the heck is going on with the current version of your code.

How is it specific to git? It was the same for any and all version control systems.

> - setting up those damn editors, viewers, differs, etc.

I didn't set anything up, apart from colours used. Or maybe you're talking about $EDITOR/$VISUAL, which you need for many other console tools anyway?

> [...] every time I go back to give a Git training session, [...] I also have to provide such a long cheat sheet it's not even funny. And it's useless before each student perfectly understands what's going on anyway.

Maybe because you're teaching them the wrong thing? Git's data structure is a quite simple thing, it should take half an hour to teach. And with understanding this structure, it's impossible to get yourself into such a mess that you couldn't dig yourself out.
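For readers who have not looked at that structure, here is a small sketch that walks it by hand in a scratch repository: a commit points to a tree, and the tree points to blobs. The file name is an assumption for the example.

```shell
#!/bin/sh
# Walk commit -> tree -> blob with git cat-file.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "first"

git cat-file -t HEAD             # prints "commit"
git cat-file -p HEAD             # the commit: a tree id, author, committer, message
git cat-file -p 'HEAD^{tree}'    # the tree: one entry mapping greeting.txt to a blob id

blob=$(git rev-parse HEAD:greeting.txt)
git cat-file -p "$blob"          # the blob: the file content, "hello"
```

Branches and tags are just names pointing at commits in this graph, which is why commands such as `git reset` amount to moving a pointer.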

TrispusAttucks · on Oct 1, 2017

I think you're missing OP's point. A git repo's history shows changes to the source code but not necessarily underlying git usage patterns since many git operations don't live in the history.

fovc · on Sept 30, 2017

I think Tarsius already did this and called the result Magit. For bonus points he made it perfectly interoperable with git

Edit: Kickstarter campaign going on with 23h left https://www.kickstarter.com/projects/1681258897/its-magit-th...

Tenobrus · on Oct 1, 2017

For clarity, Tarsius is not the original creator of Magit, that would be Marius Vollmer (https://github.com/mvollmer). Tarsius is "just" the current maintainer and most prolific contributor. I'd strongly recommend contributing to the kickstarter if you use Magit, however. I donated $100 and felt that was not near what it's actually been worth to me. Don't think that just because it's fully funded you shouldn't bother, there's good reason to keep contributing: https://www.reddit.com/r/emacs/comments/71viq3/the_magit_kic...

tarsius · on Oct 1, 2017

Indeed I would not have created Magit from scratch. From the campaign description:

"I would like to thank [Marius] and the maintainers who came later, for I would not even have known that I needed something like Magit had they not laid the groundwork in the early days of Git."

Not trying to undersell my own work - elsewhere I also write:

"Despite all this, it would be wrong to assume that Magit started out with a predefined set of interface concepts and abstractions, and all that was left to be done was incrementally filling in the gaps."

Oh, and thanks for your contribution.

yeukhon · on Oct 1, 2017

Personally... this looks confusing... perhaps just an Emacs thing, but I couldn't follow what was going on in the video. Or perhaps the yellow background really threw me off.

tarsius · on Oct 1, 2017

The video is indeed... not so great. I would go as far as to say that this might be the Kickstarter campaign with the crappiest video that ever succeeded.

But at some point I just had to launch instead of continuing to hope I would eventually get around to launching the perfect campaign. I did however make an effort to give non-Emacs users a better glimpse into how Magit works using a few articles listed at [1]. You might also want to check out some of the screencasts that were created by users [2].

[1] https://emacsair.me/2017/09/01/campaign-articles/#start

[2] https://www.youtube.com/watch?v=rzQEIRRJ2T0 https://magit.vc/screencasts

skrebbel · on Oct 1, 2017

Not quite:

> Magit fully embraces Git. It does not limit itself to a subset of simple features.

(from the Kickstarter you linked to)

luckydude · on Oct 1, 2017

So Git is what Linus thinks is good. What I think is good, and this was Git's inspiration, is at http://bitkeeper.org

It's very different from Git. Git versions the tree, BK versions the tree and versions files. The difference is profound. In BK, you debug by looking at the file history and then zooming out to the commit history. Git can't do that. Or it fakes it and gets it wrong. Same with merges, renames, creates, deletes, Git gets all that wrong.

I know it won and there is no hope that things can get better but if you want to see what better looks like, check out BK.

mappu · on Oct 1, 2017

Hg user here. One interesting hg/git difference is that Hg allows file history across renames by explicitly tracking the rename operation, whereas git just does a fuzzy content match.
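The Git side of that difference can be seen in a scratch repository. In this sketch the file is renamed with plain `mv`, with no `git mv` involved, and Git still reports a rename because it compares content after the fact. The file names are assumptions for the example.

```shell
#!/bin/sh
# Git infers renames from content; nothing about the rename is stored.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
seq 1 50 > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Rename by hand; Git is never told this is a rename.
mv old.txt new.txt
git add -A
git commit -q -m "rename by hand"

# With rename detection: one rename entry with a similarity score.
git diff -M --name-status HEAD~1 HEAD

# Without it: the same commit is a delete plus an add.
git diff --no-renames --name-status HEAD~1 HEAD

# --follow relies on the same detection to cross the rename.
git log --follow --oneline -- new.txt
```

If the file is also heavily edited in the renaming commit, the similarity score drops and the detection can miss the rename.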

corndoge · on Oct 1, 2017

you said git is wrong but don't explain why bk is right

luckydude · on Oct 1, 2017

It's all about what is recorded, git records less information.

Git has no versioned file object, it has one graph for the whole repository, there are no per file graphs. Which means there are no create/delete/rename events, git guesses at that information.

BK has a graph per file in addition to the repository graph. The pathname is an attribute of the file, just like the contents.

The per file graph means the GCA you use when you merge is the correct one for this file, not the one for the repository. The two might be miles apart, so we make merging easier.
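For comparison, a sketch of the Git side only: `git merge-base` returns one common ancestor for the whole repository, whichever files each branch touched. The branch and file names are assumptions, and nothing here shows BK's behaviour.

```shell
#!/bin/sh
# Git computes a single repository-wide merge base.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

git checkout -q -b left
echo a2 >> a.txt
git commit -q -am "left edits a.txt"

git checkout -q -b right main
echo b2 >> b.txt
git commit -q -am "right edits b.txt"

# One answer for the whole tree, not one per file.
gca=$(git merge-base left right)
echo "merge base: $gca"
```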

One more: it's easy to write (and we did) a bk fast-export and have it work deterministically so that incremental exports work correctly even when done in parallel in trees not at the same place.

We wrote a bk fast-import but we can't get it to work correctly incrementally, git doesn't record enough information.

At this point, I really wish Linus had just copied our tech, the world would have improved. Git is a step, a big step, backwards and we are stuck with it.

kazinator · on Oct 1, 2017

git repos with non-linear history quickly turn to shit.

The only way to use git right is never to allow a commit to have two parents.

Never merge anything: always cherry pick or rebase.
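A sketch of that rule in a scratch repository: the feature branch is replayed onto `main` with `git rebase` and then fast-forwarded, so no commit ends up with two parents. Branch and file names are assumptions for the example.

```shell
#!/bin/sh
# Keep history linear: rebase, then fast-forward.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example"
git config user.email "example@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo f > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q main
echo m > main.txt
git add main.txt
git commit -q -m "main work"

# Replay the feature commit on top of main instead of merging.
git checkout -q feature
git rebase -q main

# main can now simply move forward; no merge commit is created.
git checkout -q main
git merge -q --ff-only feature
git log --oneline --graph
```

The trade-off is that the rebased commit gets a new id, which is the rewriting of published history that other comments in this thread warn about.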

skrebbel · on Oct 1, 2017

I feel like most comments in this thread focus on the process mining, whereas IMO they are missing the core idea of this article:

> Build a tool that provides those, and only those, to users.

The author probably assumes that it is obvious that this means that the UI of such a tool (be it a CLI or a UI) can be simpler. So much simpler, that it in turn can be made much more powerful. So much more powerful, that maybe, just maybe, one day there will be a time where programmers won't have to spend serious effort on learning and doing source control anymore.

reificator · on Oct 1, 2017

I don't think you'll ever not need to learn version control, because it helps you by adding to the list of things you can accomplish.

I do think there's a lot of room for UX improvement such that the learning goes (significantly) faster, but I don't think it will go away.

Also most of what you need to know with git can be taught in an afternoon. If you end up with an ugly history, guess what? History has always been ugly.

nichochar · on Oct 1, 2017

> (To increase the likelihood of your proposal being funded, say that you’re using machine learning rather than statistics.)

Aha, so true, so true...

tasuki · on Oct 1, 2017

Git is very simple and elegant. It's only its interface that sucks. I found people who try to use git without understanding its internals often fail.

You can grok git internals quite easily from: https://www.chromium.org/developers/fast-intro-to-git-intern...