Better Git diff output for Ruby, Python, Elixir, Go (tekin.co.uk)
302 points by Lammy on Oct 19, 2020 | 51 comments



I tried to figure out why the extensions aren't configured by default.

I found a 2011 patch/proposal to make it the default, which appears to have stalled: https://lore.kernel.org/git/20110825204047.GA9948@sigill.int...

With some discussion by the patch's author of possible downsides (https://lore.kernel.org/git/20110826025913.GC17625@sigill.in...):

> I think it could be a problem in the future if the builtin userdiff drivers started growing more invasive options, like automatically claiming to be non-binary (i.e., setting diff.cpp.binary = false by default). In other words, I think we have two options:

> 1. Builtin drivers like "cpp" can stay minimal, only setting funcname and color-words headers that aren't going to produce terrible results if we are wrong about detecting by extension.

> 2. We force the user to identify file types manually, so we can't be wrong. The "cpp" diff driver means "you are a text C file", and if a user mis-marks a binary file with that diff driver, they are the one who is wrong.

> So if it's an either/or situation, we should decide not only that extension auto-detection is a good feature, but that it trumps adding more advanced features to the builtin drivers in the future.

> Or we could decide that the extensions really are good enough, and if you really do have binary files named "foo.c", it's your problem to override the defaults with "*.c -diff".

There might be more recent discussion that I didn't find.


I rarely see files with incorrect file extensions outside of collisions and user error?

I don’t see any downside apart from extensionless files, such as shell scripts that have the executable flag set and a shebang but no file extension, and that could also be solved.

Edit: and if comprehensive correctness was truly desired, file magic could make a better guess, but that would be overkill imo.
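
As a rough illustration of the shebang idea above, here is a minimal Python sketch that guesses a diff driver for an extensionless script from its first line. The function name `guess_driver_from_shebang` and the interpreter-to-driver table are hypothetical, made up for this example; as far as I know Git does nothing like this itself.

```python
# Hypothetical lookup from interpreter name to a diff driver name.
# The table is an assumption for illustration, not something Git ships.
INTERPRETER_TO_DRIVER = {
    "python": "python",
    "python3": "python",
    "ruby": "ruby",
    "perl": "perl",
    "sh": "bash",
    "bash": "bash",
}


def guess_driver_from_shebang(first_line):
    """Return a diff driver name for a shebang line, or None if unknown."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Take the basename of the interpreter path, e.g. "/bin/sh" -> "sh".
    program = parts[0].rsplit("/", 1)[-1]
    # Handle the common "#!/usr/bin/env python3" form.
    if program == "env" and len(parts) > 1:
        program = parts[1].rsplit("/", 1)[-1]
    return INTERPRETER_TO_DRIVER.get(program)
```

Anything the table does not recognise returns None, which would mean falling back to the default behaviour rather than guessing.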


What's a '.m' file? Matlab, Objective-C, or Mathematica?


Whichever is most common. Alternatively, use file magic to determine type in cases of collision, or do nothing (75+% of devs wouldn’t use these).


It’s an archive hiding in plain sight.


The idea of everyone configuring this individually instead of just being the default behavior in git reminds me a lot of this comment:

https://news.ycombinator.com/item?id=24136477


I use delta as a diff tool:

https://github.com/dandavison/delta

It also offers contextual information as well as side-by-side diffs. For syntax highlighting, it uses the same as bat (the cat clone).


This looks like a great command line alternative to git diff or diff, but I have to admit that I greatly prefer meld when I need a quick diff against non version controlled files, or sublime merge when using revision controlled files.


Still using Sublime Merge? I never used it, and I'm wondering if you still like it.


I use Sublime Merge pretty much every day, and it's nothing short of fantastic.

It's properly fast, the diff/stage/commit/etc UX is excellent. I particularly enjoy how straightforward it makes it to hand-pick lines from multiple files really quickly. The interface abstracts over enough of the Git ux/api to make it get out of your way, but in a way that doesn't make what it's doing a mystery or too out of your control - hovering over a button will often yield the equivalent git cli commands for example.

Personal preference here, but it's also not an electron app, so it's very resource light.


Worth pointing out that this is not mutually exclusive with the article. You can use both: the article's configuration (when combined with this pager) will affect what appears inside the hunk header.


Insane that I never tried looking for different diff tools, thanks for the recommendation!


I'd say 95% of the new tools I try come from HN recommendations, I hope we don't start seeing interlopers trying to game it.


FWIW, it looked great on paper/Github, so I installed and tried it. The colors just overwhelmed me, so I had to uninstall it.

Thinking "perhaps, it's just the default color scheme", I installed it again and tried the `delta --show-syntax-themes` command to see the themes in action. Didn't like any of them. So uninstalled it again.

I like my current setup, which is:

  [core]
      pager = less -FMRiXx4


(Delta author here.) You can disable syntax highlighting, either by customizing the {plus,minus}-*-style options to not use 'syntax' as the foreground color, or by selecting the diff-highlight or diff-so-fancy emulation modes. That would allow you to still have some of the other features if they are attractive to you, such as side-by-side view, line numbers, restructuring and streamlining of the default diff format output, copyable code (no +/- characters), etc.


Thanks, I just tried this, it is awesome!


From the title I was hoping that it was going to do syntax-aware alignment of diffs, but alas no.

I've been using kdiff3 even though it hasn't been updated in a very long time because it has one killer feature: manual diff alignment. I can select a token in both (or all three) files and then force the diff to align at that point. It makes merges a whole lot easier. Sometimes, a single realign is all that's needed for a file merge to sort itself out. Even when that's not the case, it's easier to reason about the changes.

Did I miss something? That is, is that feature available in other diff/merge tools and I just haven't seen it?


Beyond Compare [0] has manual alignment. It works within files as well as within directories, allowing you to match up files with different names when comparing directories.

I don't think it has syntax-aware alignment, though.

[0]: https://www.scootersoftware.com/


Yeah me too. I actually had to go back and read the text to figure out what improvement had been made because it looked identical.

I was expecting something like MergeResolver which diffs Javascript ASTs:

https://mergeresolver.github.io/


P4merge is my go-to tool for diffs and merges.


> Did I miss something?

Sure

https://download.kde.org/stable/kdiff3/


Aww, you got my hopes up but that looks like KDE-only fork of the original, which lives here: http://kdiff3.sourceforge.net/


https://github.com/KDE/kdiff3

In theory there are newer builds of kdiff3, but I've not had any luck running the Windows versions, and I don't think there have been significant improvements to the manual alignment code, which still requires a lot of battling to get the right outcome.


Kinda disappointed that the old Qt4 KDiff3 setup for Windows is 11 MB, but the KDE Qt5 KDiff3 setup is 51 MB.


meld has manual alignment


No need for:

  git config --global core.attributesfile ~/.gitattributes

Git looks for your personal attributes file by default in ~/.config/git/attributes (well, $XDG_CONFIG_HOME/git/attributes to be precise), so if you put it there you don't have to set the config option.
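
To make the lookup rule concrete, here is a small Python sketch of how that default path resolves. The helper `personal_attributes_path` is hypothetical. The fallback to ~/.config when XDG_CONFIG_HOME is unset or empty follows my reading of the core.attributesFile documentation.

```python
import os


def personal_attributes_path(env=None):
    """Return the default location of the per-user Git attributes file."""
    env = os.environ if env is None else env
    home = env.get("HOME") or os.path.expanduser("~")
    # An unset or empty XDG_CONFIG_HOME falls back to $HOME/.config.
    base = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return os.path.join(base, "git", "attributes")
```

Passing a plain dict instead of the real environment makes it easy to check both branches without touching your shell.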


Diffs are still pretty bad at preserving context, especially the “greedy” diff algorithm, which assumes that files are identical for as long as possible instead of preserving as much context around the change as possible (if the diff is equally large). This has the familiar effect where you add a method under an existing method and both have the same documentation header prefix line, such as /: the diff makes a mess of it and does not see your added method as a single chunk of text.

Instead of a regex per file type perhaps a better diff tool could actually diff the parsed code (if possible) and produce a more sensible diff and context?
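
A minimal sketch of what diffing the parsed code could look like, using Python's standard `ast` module on Python sources. The names `definitions` and `ast_diff` are hypothetical, and this is only an illustration of the idea, not how any existing diff tool works.

```python
import ast


def definitions(source):
    """Map each function or class name to a position-independent dump of its AST."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump omits line numbers by default, so moving code around
            # or changing comments does not register as a change.
            found[node.name] = ast.dump(node)
    return found


def ast_diff(old_source, new_source):
    """Report which definitions were added, removed, or changed."""
    old, new = definitions(old_source), definitions(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in old.keys() & new.keys() if old[name] != new[name]),
    }
```

With this approach a newly added method shows up as one added unit regardless of which header lines it shares with its neighbours. A known gap in the sketch: definitions are keyed by bare name, so two methods with the same name in different classes would collide.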


Agreed that AST diff is better, but there's also just better diff algorithms like the lesser-known Tichy diff: https://www.researchgate.net/publication/220439403_The_Strin... that I learned from http://bryanpendleton.blogspot.com/2010/04/more-study-of-dif... ; it seems to preserve more context and is better suited for big code refactors.


Wouldn’t the optimal solution be both?

Though being able to refactor code and assert that it is logically and functionally equivalent to the previous check-in would be super useful, even if there are some gaps (string value or comparison changes could not be asserted, for example).


A much bigger set of improvements to the git diff output is also easy to add: https://github.com/so-fancy/diff-so-fancy

This makes your git diff highlight the actual different characters between two lines, makes file names easy to see (and easy to copy, due to no a/ b/), makes renames and blank lines clear and readable.
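
For anyone curious how character-level highlighting between two lines can work in principle, here is a minimal Python sketch built on the standard library's `difflib.SequenceMatcher`. It is not diff-so-fancy's implementation. The function name `mark_changes` and the `[- -]` / `{+ +}` markers are choices made for this example.

```python
from difflib import SequenceMatcher


def mark_changes(old_line, new_line):
    """Return both lines with the differing character runs wrapped in markers."""
    old_out, new_out = [], []
    matcher = SequenceMatcher(None, old_line, new_line, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            old_out.append(old_line[i1:i2])
            new_out.append(new_line[j1:j2])
            continue
        # "replace", "delete" and "insert" all land here; mark whichever
        # side actually has characters in this run.
        if i2 > i1:
            old_out.append("[-" + old_line[i1:i2] + "-]")
        if j2 > j1:
            new_out.append("{+" + new_line[j1:j2] + "+}")
    return "".join(old_out), "".join(new_out)
```

A real pager would emit terminal colour codes instead of text markers, but the matching step is the same idea.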

It is so nice to get all this without a separate GUI diff visualiser when doing anything git-related from the command line.


Wow this is insanity. I thought these would all be the default. I recently introduced .gitattributes into our work codebase, but that was to improve GitHub's ability to hide generated files from PRs:

     # package lockfiles
     poetry.lock -diff linguist-generated=true
     go.sum -diff linguist-generated=true
     yarn.lock -diff linguist-generated=true
     Cargo.lock -diff linguist-generated=true

     # generated protobufs
     *pb2.py -diff linguist-generated=true
     *pb2.pyi -diff linguist-generated=true
     *pb2_grpc.py -diff linguist-generated=true
     *pb2_grpc.pyi -diff linguist-generated=true
     *.pb.go -diff linguist-generated=true


A bigger problem is tree structures. It is particularly egregious for HTML and JSON, where the output tends to be a complete mess of red and green. I think a good solution would be for git to call out to the respective language servers (the VS Language Server Protocol) and render changes based on AST and syntax-specific diffs.
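
To show what a tree-based comparison buys you for JSON, here is a minimal Python sketch that walks two parsed documents and reports differences by path instead of by line. The names `tree_diff` and `MISSING` are hypothetical, and this has nothing to do with language servers; it only illustrates the structural idea.

```python
# Marker for a key that exists on only one side (an assumption of this sketch).
MISSING = "<missing>"


def tree_diff(old, new, path="$"):
    """Yield (path, old_value, new_value) for each difference between two JSON trees."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}"
            if key not in old:
                yield (child, MISSING, new[key])
            elif key not in new:
                yield (child, old[key], MISSING)
            else:
                yield from tree_diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        # Same-length lists are compared element by element.
        for index, (a, b) in enumerate(zip(old, new)):
            yield from tree_diff(a, b, f"{path}[{index}]")
    elif old != new:
        # Scalars, type changes, and lists of different length are reported whole.
        yield (path, old, new)
```

Key order and whitespace never show up as changes here, which is exactly the noise that makes line-based diffs of JSON hard to read. Lists of different length are reported as a single change, which a serious tool would need to handle better.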


If you keep each entity on a new line in both HTML and JSON it's actually quite correct.


Which HTML parsers make an AST available? I can only think of Pandoc.


https://github.com/homeport/dyff yet another pretty nice option.


This is hugely helpful. I always have trouble scanning the git diffs because the first line of each hunk is often the same, causing me to take longer to parse the context.

This should speed up my adds, and cut down on accidental stagings.


I’ve been using git since a year or two after it was released and I’ve never paid any attention to that part of the diff. I could see it preventing some editing of the wrong method, but that is something testing should reveal.

I’m curious if others see some value in it, either before or with changes from this post?


Am I missing something? Those diffs look identical.

Edit: got it. Thanks!


The header that indicates context. Default:

  @@ -24,7 +24,7 @@ class TicketPdf

Configured:

  @@ -24,7 +24,7 @@ def tickets_as_html


Ah, thanks. I scanned those so many times but I never caught it. I think I'm used to not reading the header because, as this points out, it isn't always useful.


I use these a lot when someone sends a patch, I'll use `vim -t <function>` to jump to the definition of <function> to better understand the change. (Where <function> is from the header of the hunk.)


In the first diff after the @@ there's a reference to the enclosing class, in the second diff it's a reference to the enclosing method. Which could help perhaps if the methods are moved?


The second one shows the changed lines' context as a method definition "def tickets_as_html" instead of the mostly-useless "class TicketPdf".
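
A simplified Python sketch of why the two headers differ: the text after @@ is the nearest line before the hunk that matches a "funcname" pattern. The default pattern below follows my understanding of the gitattributes documentation (a line beginning with a letter, an underscore, or a dollar sign). The Ruby pattern is a cut-down stand-in, not Git's exact regex, and the name `hunk_header` is hypothetical.

```python
import re

# Default rule as I understand it: an unindented line starting with a
# letter, underscore, or dollar sign.
DEFAULT_FUNCNAME = re.compile(r"^[A-Za-z_$]")
# Simplified stand-in for a Ruby driver pattern (an assumption).
RUBY_FUNCNAME = re.compile(r"^[ \t]*(class|module|def)[ \t]")


def hunk_header(lines, hunk_start, pattern):
    """Return the text shown after @@ for a hunk whose first line is lines[hunk_start]."""
    # Walk backwards from the line just above the hunk.
    for line in reversed(lines[:hunk_start]):
        if pattern.search(line):
            return line.strip()
    return ""
```

Because `def tickets_as_html` is indented inside the class, the default pattern skips it and keeps walking up to `class TicketPdf`, while the Ruby-aware pattern stops at the method.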


What this really needed was a diff for their own example


The second diff identifies the target of the changes more precisely - method vs class


Is there a good utility for diff-ing JSON content? I am calling jq to sort keys first and then let the default textual diff highlight the changes, but that's neither convenient nor complete.
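
For reference, the sort-keys-then-text-diff approach described above can be sketched in a few lines of Python with only the standard library. The function names `normalized_lines` and `json_diff` are made up for this example.

```python
import difflib
import json


def normalized_lines(text):
    """Parse JSON and re-serialize it with sorted keys, one value per line."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def json_diff(old_text, new_text):
    """Return unified diff lines between two JSON documents after normalization."""
    return list(difflib.unified_diff(
        normalized_lines(old_text),
        normalized_lines(new_text),
        fromfile="old",
        tofile="new",
        lineterm="",
    ))
```

It shares the limitation mentioned in the question: only object keys are reordered, so two arrays holding the same items in a different order still show up as changed.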


Graphtage is a command line utility and underlying library for semantically comparing and merging tree-like structures such as JSON, JSON5, XML, HTML, YAML, and TOML files.

https://blog.trailofbits.com/2020/08/28/graphtage/


I've been doing similarly and have had pretty good success with that (https://www.jvt.me/posts/2020/08/24/pretty-print-json-diff/)


Speaking of git tricks: `git grep` can take a file pattern and will only search in files matching that pattern, and it's way faster than grep -R, about on par with `ag`, The Silver Searcher.


This could (and should) be written as a proper AST diff, rather than a presentation based diff with some mild smarts:

      assigns: { 
        tickets: tickets, 
        ++ event_name: event_name 
      }


I've been using git forever, and never even noticed that hunks get a class identifier. I pretty much have only been diffing Python. I guess it's so useless I never even noticed!



