Better Git diff output for Ruby, Python, Elixir, Go

duckerude · on Oct 19, 2020

I tried to figure out why the extensions aren't configured by default.

I found a 2011 patch/proposal to make it the default, which appears to have stalled: https://lore.kernel.org/git/20110825204047.GA9948@sigill.int...

With some discussion by the patch's author of possible downsides (https://lore.kernel.org/git/20110826025913.GC17625@sigill.in...):

> I think it could be a problem in the future if the builtin userdiff drivers started growing more invasive options, like automatically claiming to be non-binary (i.e., setting diff.cpp.binary = false by default). In other words, I think we have two options:

> 1. Builtin drivers like "cpp" can stay minimal, only setting funcname and color-words headers that aren't going to produce terrible results if we are wrong about detecting by extension.

> 2. We force the user to identify file types manually, so we can't be wrong. The "cpp" diff driver means "you are a text C file", and if a user mis-marks a binary file with that diff driver, they are the one who is wrong.

> So if it's an either/or situation, we should decide not only that extension auto-detection is a good feature, but that it trumps adding more advanced features to the builtin drivers in the future.

> Or we could decide that the extensions really are good enough, and if you really do have binary files named "foo.c", it's your problem to override the defaults with "*.c -diff".

There might be more recent discussion that I didn't find.
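
The `*.c -diff` override from the last quoted paragraph can be tried in a scratch repository. This is only a sketch: it uses a repo-level `.gitattributes` rather than a global file, and `foo.c` / `bar.c` are hypothetical file names.

```shell
# Scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" && git init -q

# Later lines win: every *.c file gets the builtin "cpp" driver,
# except foo.c, which is explicitly marked as not diffable text.
cat > .gitattributes <<'EOF'
*.c   diff=cpp
foo.c -diff
EOF

# check-attr reports the value git resolved for each path
# (the files do not need to exist for this query).
attrs=$(git check-attr diff -- foo.c bar.c)
printf '%s\n' "$attrs"
```

`foo.c` should be reported with `diff: unset` and `bar.c` with `diff: cpp`, which is the per-file escape hatch the quote describes.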

hsbauauvhabzb · on Oct 19, 2020

I rarely see files with incorrect file extensions outside of collisions and user error?

I don’t see any downside apart from extensionless files, such as shell scripts with the executable flag set and a shebang, and that case could also be solved.

Edit: and if comprehensive correctness was truly desired, file magic could make a better guess, but that would be overkill imo.

ori_b · on Oct 20, 2020

What's a '.m' file? Matlab, Objective-C, or Mathematica?

hsbauauvhabzb · on Oct 20, 2020

Whichever is most common. Alternatively, use file magic to determine type in cases of collision, or do nothing (75+% of devs wouldn’t use these).

staticautomatic · on Oct 20, 2020

It’s an archive hiding in plain sight.

myroon5 · on Oct 19, 2020

The idea of everyone configuring this individually instead of just being the default behavior in git reminds me a lot of this comment:

https://news.ycombinator.com/item?id=24136477

nikeee · on Oct 19, 2020

I use delta as a diff tool:

https://github.com/dandavison/delta

It also offers contextual information as well as side-by-side diffs. For syntax highlighting, it uses the same highlighting as bat (the cat clone).

hak8or · on Oct 19, 2020

This looks like a great command line alternative to git diff or diff, but I have to admit that I greatly prefer meld when I need a quick diff against non version controlled files, or sublime merge when using revision controlled files.

arthurcolle · on Oct 19, 2020

still using Sublime Merge? I never used it, wondering if you still like it

FridgeSeal · on Oct 19, 2020

I use Sublime Merge pretty much every day, and it's nothing short of fantastic.

It's properly fast, the diff/stage/commit/etc UX is excellent. I particularly enjoy how straightforward it makes it to hand-pick lines from multiple files really quickly. The interface abstracts over enough of the Git ux/api to make it get out of your way, but in a way that doesn't make what it's doing a mystery or too out of your control - hovering over a button will often yield the equivalent git cli commands for example.

Personal preference here, but it's also not an electron app, so it's very resource light.

CGamesPlay · on Oct 20, 2020

Worth pointing out that this is not mutually exclusive with the article. You can use both, the article (when combined with this pager) will affect what appears inside the chunk header.

vikiomega9 · on Oct 19, 2020

Insane that I never tried looking for different diff tools, thanks for the recommendation!

brigandish · on Oct 19, 2020

I'd say 95% of the new tools I try come from HN recommendations, I hope we don't start seeing interlopers trying to game it.

gurjeet · on Oct 20, 2020

FWIW, it looked great on paper/Github, so I installed and tried it. The colors just overwhelmed me, so I had to uninstall it.

Thinking "perhaps, it's just the default color scheme", I installed it again and tried the `delta --show-syntax-themes` command to see the themes in action. Didn't like any of them. So uninstalled it again.

I like my current setup, which is:

[core] pager = less -FMRiXx4

Myrmornis · on Oct 20, 2020

(Delta author here.) You can disable syntax highlighting, either by customizing the {plus,minus}-*-style options to not use 'syntax' as the foreground color, or by selecting the diff-highlight or diff-so-fancy emulation modes. That would allow you to still have some of the other features if they are attractive to you, such as side-by-side view, line numbers, restructuring and streamlining of the default diff format output, copyable code (no +/- characters), etc.

nxpnsv · on Oct 19, 2020

Thanks, I just tried this, it is awesome!

joncp · on Oct 19, 2020

From the title I was hoping that it was going to do syntax-aware alignment of diffs, but alas no.

I've been using kdiff3 even though it hasn't been updated in a very long time because it has one killer feature: manual diff alignment. I can select a token in both (or all three) files and then force the diff to align at that point. It makes merges a whole lot easier. Sometimes, a single realign is all that's needed for a file merge to sort itself out. Even when that's not the case, it's easier to reason about the changes.

Did I miss something? That is, is that feature available in other diff/merge tools and I just haven't seen it?

mmebane · on Oct 19, 2020

Beyond Compare [0] has manual alignment. It works within files as well as within directories, allowing you to match up files with different names when comparing directories.

I don't think it has syntax-aware alignment, though.

[0]: https://www.scootersoftware.com/

IshKebab · on Oct 19, 2020

Yeah me too. I actually had to go back and read the text to figure out what improvement had been made because it looked identical.

I was expecting something like MergeResolver which diffs Javascript ASTs:

https://mergeresolver.github.io/

secondcoming · on Oct 19, 2020

P4merge is my go-to tool for diffs and merges.

Iwan-Zotow · on Oct 19, 2020

> Did I miss something?

Sure

https://download.kde.org/stable/kdiff3/

joncp · on Oct 19, 2020

Aww, you got my hopes up but that looks like a KDE-only fork of the original, which lives here: http://kdiff3.sourceforge.net/

vlovich123 · on Oct 19, 2020

https://github.com/KDE/kdiff3

In theory there are newer builds of kdiff3 but I've not had any luck running the Windows versions & I don't think there have been significant improvements to the manual alignment code, which still requires a lot of battling to get the right outcome.

nyanpasu64 · on Oct 20, 2020

Kinda disappointed that the old Qt4 KDiff3 setup for Windows is 11 MB, but the KDE Qt5 KDiff3 setup is 51 MB.

usr1106 · on Oct 19, 2020

meld has manual alignment

rjmorris · on Oct 19, 2020

No need for:

  git config --global core.attributesfile ~/.gitattributes

Git looks for your personal attributes file by default in ~/.config/git/attributes (well, $XDG_CONFIG_HOME/git/attributes to be precise), so if you put it there you don't have to set the config option.
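
A minimal sketch of that setup, assuming the four languages from the article title and Git's builtin driver names `ruby`, `python`, `elixir` and `golang` (older Git versions may not ship every driver). The scratch `XDG_CONFIG_HOME` and the file names `app.rb` / `main.go` are assumptions made for the demo.

```shell
# Demo only: point XDG_CONFIG_HOME at a scratch directory so the real
# config is untouched. Drop this line to write to your actual location.
export XDG_CONFIG_HOME="$(mktemp -d)"

attr_dir="${XDG_CONFIG_HOME:-$HOME/.config}/git"
mkdir -p "$attr_dir"

# Append rather than overwrite, in case the file already exists.
cat >> "$attr_dir/attributes" <<'EOF'
*.rb  diff=ruby
*.py  diff=python
*.ex  diff=elixir
*.exs diff=elixir
*.go  diff=golang
EOF

# Verify from inside a repository; no core.attributesfile setting is involved.
repo=$(mktemp -d)
cd "$repo" && git init -q
resolved=$(git check-attr diff -- app.rb main.go)
printf '%s\n' "$resolved"
```

If the default location is picked up, `app.rb` resolves to `diff: ruby` and `main.go` to `diff: golang`.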

alkonaut · on Oct 19, 2020

Diffs are still pretty bad at preserving context, especially the “greedy” diff algorithm that assumes that files are identical for as long as possible instead of preserving as much context around the change as possible (if the diff is equally large). This has the familiar effect where you add a method under an existing method and they both have the same documentation header prefix line such as /: the diff will make a mess of it and not see your added method as a single chunk of text.

Instead of a regex per file type perhaps a better diff tool could actually diff the parsed code (if possible) and produce a more sensible diff and context?
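
Short of parsing the code, Git does ship alternatives to its default algorithm, selectable per invocation with `--diff-algorithm` (or persistently through the `diff.algorithm` config key). Whether any of them fixes the duplicated-header case depends on the input, so this sketch only makes them easy to compare. `compare_algorithms` is a hypothetical helper name.

```shell
# Print the same diff under each of git's built-in algorithms so the
# hunk boundaries can be compared by eye.
# Usage: compare_algorithms [git diff arguments...]
compare_algorithms() {
  for algo in myers minimal patience histogram; do
    printf '=== %s ===\n' "$algo"
    git --no-pager diff --diff-algorithm="$algo" "$@"
  done
}
```

Run it inside a repository with uncommitted changes, for example `compare_algorithms -- path/to/file`.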

teabee89 · on Oct 19, 2020

Agreed that AST diff is better, but there's also just better diff algorithms like the lesser-known Tichy diff: https://www.researchgate.net/publication/220439403_The_Strin... that I learned from http://bryanpendleton.blogspot.com/2010/04/more-study-of-dif... ; it seems to preserve more context and is better suited for big code refactors.

hsbauauvhabzb · on Oct 20, 2020

Wouldn’t the optimal solution be both?

Though, being able to refactor code and assert that it is logically and functionally equivalent to the previous check-in would be super useful, even if there are some gaps (string value or comparison changes could not be asserted, for example).

unholiness · on Oct 20, 2020

A much bigger set of improvements to the git diff output is also easy to add: https://github.com/so-fancy/diff-so-fancy

This makes your git diff highlight the actual different characters between two lines, makes file names easy to see (and easy to copy, due to no a/ b/), makes renames and blank lines clear and readable.

It is so nice to get all this without a separate GUI diff visualiser when doing anything git-related from the command line.

jzelinskie · on Oct 20, 2020

Wow this is insanity. I thought these would all be the default. I recently introduced .gitattributes into our work codebase, but that was to improve GitHub's ability to hide generated files from PRs:

     # package lockfiles
     poetry.lock -diff linguist-generated=true
     go.sum -diff linguist-generated=true
     yarn.lock -diff linguist-generated=true
     Cargo.lock -diff linguist-generated=true

     # generated protobufs
     *pb2.py -diff linguist-generated=true
     *pb2.pyi -diff linguist-generated=true
     *pb2_grpc.py -diff linguist-generated=true
     *pb2_grpc.pyi -diff linguist-generated=true
     *.pb.go -diff linguist-generated=true

narrationbox · on Oct 19, 2020

A bigger problem is tree structures. It is particularly egregious for HTML and JSON, where the output tends to be a complete mess of red and green. I think a good solution would be for git to call out to the respective language servers (the VS Language Server Protocol) and render changes based on AST and syntax-specific diffs.

heavenlyblue · on Oct 20, 2020

If you keep each entity on a new line in both HTML and JSON it's actually quite correct.

mr_toad · on Oct 20, 2020

Which HTML parsers make an AST available? I can only think of Pandoc.

ite07 · on Oct 19, 2020

https://github.com/homeport/dyff yet another pretty nice option.

crazydoggers · on Oct 19, 2020

This is hugely helpful. I always have trouble scanning the git diffs because the first line of each hunk is often the same, causing me to take longer to parse the context.

This should speed up my adds, and cut down on accidental stagings.

tvon · on Oct 20, 2020

I’ve been using git since a year or two after it was released and I’ve never paid any attention to that part of the diff. I could see it preventing some editing of the wrong method, but that is something testing should reveal.

I’m curious if others see some value in it, either before or with changes from this post?

duhi88 · on Oct 19, 2020

Am I missing something? Those diffs look identical.

Edit: got it. Thanks!

duckerude · on Oct 19, 2020

The header that indicates context. Default:

  @@ -24,7 +24,7 @@ class TicketPdf

Configured:

  @@ -24,7 +24,7 @@ def tickets_as_html
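
The difference can be reproduced in a scratch repository. This is a sketch under assumptions: the file name and method body are made up, and only the class and method names are borrowed from the headers above.

```shell
# Scratch repository with one Ruby file (contents invented for the demo).
repo=$(mktemp -d)
cd "$repo" && git init -q
cat > ticket_pdf.rb <<'EOF'
class TicketPdf
  def tickets_as_html
    a = 1
    b = 2
    c = 3
    d = 4
    e = 5
  end
end
EOF
git add ticket_pdf.rb
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial'

# Change a line deep enough in the method that the "def" line falls
# outside the three lines of context.
sed 's/e = 5/e = 6/' ticket_pdf.rb > ticket_pdf.rb.new &&
  mv ticket_pdf.rb.new ticket_pdf.rb

# Hunk header with git's default heuristic.
before=$(git --no-pager diff | grep '^@@')

# Hunk header once the builtin ruby driver is selected.
echo '*.rb diff=ruby' > .gitattributes
after=$(git --no-pager diff | grep '^@@')

printf 'default:    %s\nconfigured: %s\n' "$before" "$after"
```

The first header should end in `class TicketPdf` and the second in `def tickets_as_html`, provided no global attributes file already maps `*.rb` to a driver.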

duhi88 · on Oct 19, 2020

Ah, thanks. I scanned those so many times but I never caught it. I think I'm used to not reading the header because, as this points out, it isn't always useful.

ndesaulniers · on Oct 19, 2020

I use these a lot when someone sends a patch, I'll use `vim -t <function>` to jump to the definition of <function> to better understand the change. (Where <function> is from the header of the hunk.)

renox · on Oct 19, 2020

In the first diff after the @@ there's a reference to the enclosing class, in the second diff it's a reference to the enclosing method. Which could help perhaps if the methods are moved?

Lammy · on Oct 19, 2020

The second one shows the changed lines' context as a method definition "def tickets_as_html" instead of the mostly-useless "class TicketPdf".

swrobel · on Oct 19, 2020

What this really needed was a diff for their own example

ngcazz · on Oct 19, 2020

The second diff identifies the target of the changes more precisely - method vs class

rzzzt · on Oct 19, 2020

Is there a good utility for diff-ing JSON content? I am calling jq to sort keys first and then let the default textual diff highlight the changes, but that's neither convenient nor complete.
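
For reference, the sort-then-diff approach described here might look like the sketch below, assuming `jq` is installed. `json_diff` is a hypothetical helper name. `jq -S` sorts object keys but leaves array order alone, so the result is still a textual diff.

```shell
# Normalize both files with sorted keys, then run a plain unified diff.
# Exit status follows diff: 0 when equal, 1 when different, and
# non-zero if jq cannot parse an input.
json_diff() {
  jd_left=$(mktemp) || return 2
  jd_right=$(mktemp) || { rm -f "$jd_left"; return 2; }
  jq -S . "$1" > "$jd_left" &&
    jq -S . "$2" > "$jd_right" &&
    diff -u "$jd_left" "$jd_right"
  jd_status=$?
  rm -f "$jd_left" "$jd_right"
  return "$jd_status"
}
```

Usage: `json_diff old.json new.json`. The file labels in the output are the temporary file names, not the originals.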

stormy · on Oct 19, 2020

Graphtage is a command line utility and underlying library for semantically comparing and merging tree-like structures such as JSON, JSON5, XML, HTML, YAML, and TOML files.

https://blog.trailofbits.com/2020/08/28/graphtage/

jamietanna · on Oct 19, 2020

I've been doing similarly and have had pretty good success with that (https://www.jvt.me/posts/2020/08/24/pretty-print-json-diff/)

cratermoon · on Oct 20, 2020

Speaking of git tricks: `git grep` can take a file pattern and will only search in files matching that pattern, and it's way faster than grep -R, about on par with `ag`, the Silver Searcher.
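
A small sketch of the file-pattern form, using a scratch repository with made-up file names and contents:

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q
echo 'needle = 1' > app.py
echo 'needle = 2' > notes.txt
git add app.py notes.txt   # git grep searches tracked files by default

# Everything after "--" is a pathspec; quote it so the shell does not
# expand the glob before git sees it.
hits=$(git --no-pager grep -n 'needle' -- '*.py')
printf '%s\n' "$hits"
```

Only `app.py` is searched here, so the match in `notes.txt` is not reported.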

nailer · on Oct 20, 2020

This could (and should) be written as a proper AST diff, rather than a presentation based diff with some mild smarts:

      assigns: { 
        tickets: tickets, 
        ++ event_name: event_name 
      }

jedberg · on Oct 20, 2020

I've been using git forever, and never even noticed that hunks get a class identifier. I pretty much have only been diffing Python. I guess it's so useless I never even noticed!