Graphtage: A New Semantic Diffing Tool

nerdponx · on Aug 29, 2020

Hideous screenshots aside, Graphtage itself looks very useful. Can it generate Git-compatible diffs for use as a Git difftool?

Also, for "standalone" tools like this that are written in Python, I highly recommend pipx for installing them: https://pipxproject.github.io/pipx/. It installs each tool into a separate self-contained virtual environment and symlinks the executable itself to a "bin" directory, which prevents tools with different dependencies from conflicting.

ivan_ah · on Aug 29, 2020

This is very interesting and a much needed tool. I have been searching for a tool like this for a long time. There are so many tree-like structures that I'm sure there will be interesting use cases...

I was recently working on a similar tool[1] but specific to the domain of "content trees" that consist of content nodes organized into a hierarchical structure. In my case each tree node has a persistent `content_id` associated with the underlying content file and independent of its position within the tree, which allows me to detect "move" operations[2] (a node with the same `content_id` appearing in a different place in the tree).
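
To make the "move" idea concrete, here is a minimal sketch and not treediffer's actual implementation. It assumes each tree is a nested dict with a `content_id` key and an optional `children` list, and that each `content_id` appears at most once per tree. The helper names `parent_index` and `detect_moves` are hypothetical.

```python
def parent_index(tree, parent_id=None):
    """Map each node's content_id to the content_id of its parent."""
    index = {tree["content_id"]: parent_id}
    for child in tree.get("children", []):
        index.update(parent_index(child, tree["content_id"]))
    return index


def detect_moves(old_tree, new_tree):
    """Return {content_id: (old_parent_id, new_parent_id)} for moved nodes.

    A node counts as moved when it exists in both trees but hangs under a
    different parent. Comparing parents rather than full paths means the
    descendants of a moved folder are not reported as moves themselves.
    """
    old_parents = parent_index(old_tree)
    new_parents = parent_index(new_tree)
    return {
        cid: (old_parents[cid], new_parents[cid])
        for cid in old_parents.keys() & new_parents.keys()
        if old_parents[cid] != new_parents[cid]
    }


# Hypothetical example data: node "x" moves from folder "a" to folder "b".
old_tree = {"content_id": "root", "children": [
    {"content_id": "a", "children": [{"content_id": "x"}]},
    {"content_id": "b"},
]}
new_tree = {"content_id": "root", "children": [
    {"content_id": "a"},
    {"content_id": "b", "children": [{"content_id": "x"}]},
]}

print(detect_moves(old_tree, new_tree))  # {'x': ('a', 'b')}
```

Nodes whose `content_id` appears in only one tree would be reported as additions or deletions by a separate pass, which this sketch leaves out.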

The use case is for educational content: Kolibri channels[3] are these huge trees that consist of thousands of nodes and it's difficult to know what has changed when we create new versions of the channels. I tried all kinds of general-purpose diffing tools and failed miserably so I started working on treediffer. It's almost done; I hope to finish it later this fall, and will look at graphtage to see how it works.

[1] https://github.com/learningequality/treediffer
[2] https://treediffer.readthedocs.io/en/latest/diff_formats.htm...
[3] https://kolibri-demo.learningequality.org/en/learn/#/topics

lewisjoe · on Aug 29, 2020

Has anybody gone through React's HTML diffing algorithm? If this one's good, we could write a JS version and use it for HTML diffing in browsers.

brunoqc · on Aug 29, 2020

Graphtage could be compiled to wasm and used in a browser.

hinkley · on Aug 29, 2020

I was staring at a diff today and longing for better semantic diffing.

I’d changed a shell script, with a chain of commands. I added a second call to the same command with different args and the diff was just... bad.

    something && fizz foo && another

    something && fizz bar && fizz foo && another

It decided that “bar && fizz” was my edit, and I just stared at it (it was already a tough day). Even if they had just weighted punctuation characters differently, it would have gotten the right answer, as it would with adding new functions or array entries, which it always gets wrong too.

Sort it out please.
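
As a rough illustration of the "treat punctuation as structure" idea from the comment above, the sketch below splits each command chain on `&&` and diffs the resulting commands with Python's standard `difflib`, instead of diffing characters or words. The function name `diff_command_chain` is hypothetical, and the naive `split` is an assumption that ignores quoting; a real tool would need a proper shell tokenizer.

```python
import difflib


def diff_command_chain(old, new, separator="&&"):
    """Diff two command chains command-by-command rather than char-by-char."""
    # Naive split: assumes the separator never appears inside quotes.
    old_parts = [part.strip() for part in old.split(separator)]
    new_parts = [part.strip() for part in new.split(separator)]
    matcher = difflib.SequenceMatcher(a=old_parts, b=new_parts, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_parts[i1:i2], new_parts[j1:j2]))
    return changes


changes = diff_command_chain(
    "something && fizz foo && another",
    "something && fizz bar && fizz foo && another",
)
print(changes)  # [('insert', [], ['fizz bar'])]
```

With whole commands as the unit of comparison, the edit is reported as a single inserted `fizz bar` rather than an inserted `bar && fizz`.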

tingletech · on Aug 29, 2020

interesting "This tool was partially developed with funding from the Defense Advanced Research Projects Agency (DARPA) on the SafeDocs project."

I like the idea that it can do semantic diffs across different formats.

setpatchaddress · on Aug 29, 2020

I would recommend deleting the screenshots, though. I looked at them and thought "so what? that's been done many times before" until I read the text more carefully.

hinkley · on Aug 29, 2020

I would recommend reshooting the screenshots. Navy blue on a jet black background? Removing new lines in the initial example but not in the diffs? Fixing those would get the point across better.

Also, turn the saturation down. That’s the greenest green and the reddest red next to the darkest blue. My eyes.

throwaway_pdp09 · on Aug 29, 2020

I can't see a problem - there's no pic. I guess they need JS to show images.

Back on point, I see so much of this grey-on-grey type of thing. Just a little common sense would suggest it's very poor practice, but it keeps happening.

hinkley · on Aug 29, 2020

Exhibit A:

https://i1.wp.com/blog.trailofbits.com/wp-content/uploads/20...

throwaway_pdp09 · on Aug 29, 2020

First thought was you'd given me a nethack screenshot by accident, but thanks! Interesting project.

sendbits · on Aug 29, 2020

Super cool. I have worked on related problems independently (tree-based file compression and arbitrary graph-based file comparison) and am currently searching for a better way to compare web scrapes over time.

kudos for putting the two concepts together / will give it a go

anotheryou · on Aug 29, 2020

I want one that can also find non-perfectly matching moved lines :)

looks cool already though, got to try it some time.

idubrov · on Aug 29, 2020

At my previous job I built a tool that was capable of doing that (we were merging XMLs with form definitions). The main idea was an interactive mode.

Initially, the tool would merge based on a series of heuristics, and then the user would manually adjust the "matching" nodes (the user could say "actually, this A on the left and B on the right are the same, it's just that it was heavily modified").
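
A minimal sketch of that two-phase workflow, under stated assumptions: nodes are represented as plain strings keyed by an ID, the heuristic is a simple text-similarity score from `difflib`, and the names `heuristic_matches` and `apply_overrides` are hypothetical. The original tool's heuristics and data model are not described in the comment, so this only illustrates the shape of the idea.

```python
import difflib


def heuristic_matches(left, right, threshold=0.6):
    """Pair left and right node IDs by text similarity, best pairs first."""
    scored = []
    for lkey, ltext in left.items():
        for rkey, rtext in right.items():
            score = difflib.SequenceMatcher(
                None, ltext, rtext, autojunk=False
            ).ratio()
            if score >= threshold:
                scored.append((score, lkey, rkey))
    matches = {}
    taken = set()
    # Greedy assignment: the highest-scoring pairs claim their nodes first.
    for score, lkey, rkey in sorted(scored, reverse=True):
        if lkey not in matches and rkey not in taken:
            matches[lkey] = rkey
            taken.add(rkey)
    return matches


def apply_overrides(matches, overrides):
    """Return the matches with the user's manual pairs taking precedence."""
    claimed = set(overrides.values())
    result = {
        lkey: rkey
        for lkey, rkey in matches.items()
        if lkey not in overrides and rkey not in claimed
    }
    result.update(overrides)
    return result


# Hypothetical form definitions: "A" was heavily modified into "B".
left = {"A": "<text label='Name'/>", "C": "<date label='Birthday'/>"}
right = {
    "B": "<group><text label='Full name' max='80'/></group>",
    "D": "<date label='Birthday'/>",
}

automatic = heuristic_matches(left, right)
# The user then says: "actually, A on the left and B on the right are the same".
final = apply_overrides(automatic, {"A": "B"})
print(final)
```

Keeping the overrides as a separate mapping means the heuristics can be rerun or tuned without losing the user's manual decisions.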

hinkley · on Aug 29, 2020

It seems like if the editor produced hints this would work better, but your target audience also shrinks.