Visualizing binaries with space-filling curves (2011)

cortesi · on Oct 18, 2020

Oh, hey - that's my post. I have a blazingly fast re-implementation of these algorithms in Rust, which will soon make an appearance as a set of command-line tools. This will be a companion to the interactive in-browser implementation at https://binvis.io.

If this sort of thing floats your boat, please get in touch. The code is not public yet because it's not ready yet, but I'd welcome like-minded collaborators.

marcodiego · on Oct 18, 2020

The only thing I'd ask is to include the other visualizations available in Battelle's CantorDust, so we can use them without needing to load Ghidra.

kummappp · on Oct 18, 2020

Another thing that works is mapping the bits to a -1/+1 signal and then taking its autocorrelation.
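A minimal sketch of that idea, assuming bits are read most-significant first and that a 1 bit maps to +1 and a 0 bit to -1 (both conventions are assumptions, as are the function names):

```python
def bits_to_signal(data: bytes) -> list[int]:
    """Map each bit (MSB first) to +1 for a 1 bit and -1 for a 0 bit."""
    return [1 if (byte >> shift) & 1 else -1
            for byte in data
            for shift in range(7, -1, -1)]


def autocorrelation(signal: list[int], max_lag: int) -> list[float]:
    """Normalized autocorrelation for lags 0..max_lag (max_lag < len(signal))."""
    n = len(signal)
    result = []
    for lag in range(max_lag + 1):
        overlap = n - lag  # number of sample pairs available at this lag
        total = sum(signal[i] * signal[i + lag] for i in range(overlap))
        result.append(total / overlap)
    return result


if __name__ == "__main__":
    # 0xAA is 10101010, so the signal alternates and the correlation flips sign per lag.
    print(autocorrelation(bits_to_signal(b"\xaa" * 4), 4))
```

Peaks at particular lags hint at periodic structure, such as fixed-width records or character encodings. This direct sum is quadratic in the worst case; an FFT-based version would be the usual choice for large files.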

brzozowski · on Oct 18, 2020

Space-filling curves have some super important applications for indexing and information retrieval. I recently stumbled on a fascinating library called Uzaygezen for multidimensional Hilbert space-filling curves. Extremely high quality code and documentation:

https://github.com/aioaneid/uzaygezen

https://opensource.googleblog.com/2008/08/uzaygezen-multi-di...

kummappp · on Oct 18, 2020

What a nice way to map 1d -> 2d. I wrote my MSc thesis on data visualization, and the thing I found useful was excess entropy: how much better you can predict the next bit if you take one more bit into the sliding window you use to predict the next bit/byte. That usually depends heavily on the sliding window size. Imagine what happens with text written in 8-bit characters. With that trick one could make a 3d visualization.
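One possible reading of that measure, sketched with plain block entropies. The function names are hypothetical, and this is an interpretation of the description above, not the thesis's actual method:

```python
import math
from collections import Counter


def block_entropy(symbols, k):
    """Shannon entropy in bits of the length-k sliding-window blocks."""
    if k == 0:
        return 0.0
    blocks = Counter(tuple(symbols[i:i + k])
                     for i in range(len(symbols) - k + 1))
    total = sum(blocks.values())
    return -sum(c / total * math.log2(c / total) for c in blocks.values())


def conditional_entropy(symbols, k):
    """Estimated uncertainty of the next symbol given the previous k symbols."""
    return block_entropy(symbols, k + 1) - block_entropy(symbols, k)


def prediction_gain(symbols, k):
    """Drop in uncertainty when the window grows from k-1 to k symbols."""
    return conditional_entropy(symbols, k - 1) - conditional_entropy(symbols, k)


if __name__ == "__main__":
    alternating = [0, 1] * 50
    for k in range(1, 4):
        print(k, round(prediction_gain(alternating, k), 4))
```

For a strictly alternating sequence, knowing one previous symbol removes about one bit of uncertainty, and larger windows add almost nothing. These are finite-sample estimates, so small negative values can appear.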

abnry · on Oct 18, 2020

I love this! Years ago I toyed with the idea of trying to map arbitrary files to some sort of 2d "hash" thumbnail image, as people are visually oriented and remember things visually. I wanted there to be some sort of continuity between small changes in the file that shows up immediately visually. This seems to solve those problems!

peteretep · on Oct 18, 2020

I feel like the author goes to some length justifying the use of space-filling curves when the zig-zag version seems clearest.

cortesi · on Oct 18, 2020

That's quite possible! Though after staring at thousands of these across many application domains, I do think the space-filling curves perform legitimately better.

Another factor is that the large discontinuities in the zig-zag curve mean that a contiguous area in the data is not always contiguous in the visualisation, which makes things like the region selection I do for https://binvis.io impossible.
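A rough way to see the difference, assuming "zig-zag" here means a plain row-by-row scan that jumps back to the start of each row (that reading is an assumption, and the function names are made up for illustration). The Hilbert mapping is the standard index-to-coordinate conversion:

```python
def hilbert_d2xy(order, d):
    """Convert index d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the quadrant built so far
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def raster_d2xy(order, d):
    """Row-by-row scan: fill a row left to right, then jump to the next row."""
    side = 1 << order
    return d % side, d // side


def max_step(curve, order):
    """Largest Manhattan distance between consecutive points on the curve."""
    pts = [curve(order, d) for d in range(1 << (2 * order))]
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    print("hilbert:", max_step(hilbert_d2xy, 4))
    print("raster: ", max_step(raster_d2xy, 4))
```

On a 16x16 grid, consecutive Hilbert indices are always neighbouring cells, while the row scan has a jump the width of the image at the end of every row.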

ogogmad · on Oct 18, 2020

Why are there two non-contiguous blue pieces in the Z-order curve that are contiguous in the Hilbert curve? Surely, both curves are continuous.

Maybe it's that the "slash" part of the Z isn't drawn, undermining the continuity. Or maybe the thin slash is overwritten because the curve is not injective.

EdSchouten · on Oct 18, 2020

Exactly. The slash part isn’t drawn, which is why a Z order curve is not considered continuous.

ogogmad · on Oct 18, 2020

> a Z order curve is not considered continuous

Not true. A curve is by definition continuous, and a true Z-order curve is no exception. I was asking whether the Z-order curve displayed there was a true one.

The definition of a space-filling curve is a continuous surjection from [0,1] to [0,1]^2.
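For the discrete, grid-sampled version (which is what gets rendered as pixels, as opposed to the limiting curve), a small sketch shows where the gaps come from. It assumes the usual bit-interleaving convention with even bits going to x and odd bits to y; the function names are hypothetical:

```python
def morton_d2xy(d, bits=16):
    """De-interleave index d: even-numbered bits form x, odd-numbered bits form y."""
    x = y = 0
    for i in range(bits):
        x |= ((d >> (2 * i)) & 1) << i
        y |= ((d >> (2 * i + 1)) & 1) << i
    return x, y


def non_adjacent_steps(count):
    """Return (d, point, next_point) for consecutive indices that are not grid neighbours."""
    jumps = []
    for d in range(count - 1):
        x0, y0 = morton_d2xy(d)
        x1, y1 = morton_d2xy(d + 1)
        if abs(x1 - x0) + abs(y1 - y0) != 1:
            jumps.append((d, (x0, y0), (x1, y1)))
    return jumps


if __name__ == "__main__":
    for jump in non_adjacent_steps(16):
        print(jump)
```

Every odd index is followed by a step to a non-neighbouring cell: those are the "slash" strokes. If the renderer only colours the cells themselves and not the strokes between them, a contiguous run of data can show up as separate pieces.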

Pseudomanifold · on Oct 18, 2020

Back in the day (2014), I did something similar, albeit for network packet visualisation based on `libpcap`: https://github.com/Pseudomanifold/EtherCurve.

pmoriarty · on Oct 18, 2020

Is this process reversible?

That is, given an image generated this way, can you get the original binary back?

If so, it could be useful in glitching audio in interesting ways, using image editing tools.

It would also be interesting to hear how particular executable binaries sound when converted to audio from this representation. Would differences between types of binaries be distinguishable by human audio pattern recognition?

Also useful for this would be maximally permissive image representation requirements for the trip back to binary. Of course, this would be difficult for binaries meant to be executed as code, as arbitrary binaries are unlikely to be executable, but for transformation of images to audio it should be much simpler to ensure that any image makes a valid, playable audio file.
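On the reversibility question above: the curve itself is a one-to-one mapping between positions and cells, so the layout can be undone as long as the image keeps every byte value (for example, one grayscale pixel per byte). A colour scheme that merges several byte values into one colour could not be inverted. A minimal sketch under that assumption, with hypothetical helper names and the original length carried alongside the image:

```python
def hilbert_d2xy(order, d):
    """Convert index d along a Hilbert curve to (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the quadrant built so far
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bytes_to_grid(data, order):
    """Lay bytes along the curve; unused cells are padded with 0."""
    side = 1 << order
    if len(data) > side * side:
        raise ValueError("data does not fit in the grid")
    grid = [[0] * side for _ in range(side)]
    for d, byte in enumerate(data):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = byte
    return grid


def grid_to_bytes(grid, order, length):
    """Walk the same curve and read the first `length` cells back out."""
    out = bytearray()
    for d in range(length):
        x, y = hilbert_d2xy(order, d)
        out.append(grid[y][x])
    return bytes(out)


if __name__ == "__main__":
    payload = bytes(range(200))
    grid = bytes_to_grid(payload, 4)
    print(grid_to_bytes(grid, 4, len(payload)) == payload)
```

Edits made to the grid in between (the "glitching" idea) would then land back in the byte stream at the positions the curve assigns them.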

ninjabiker · on Oct 18, 2020

How does this help you compare binaries? I am trying to understand what questions the visual helps you answer that you don’t get from summary statistics like entropy. Also, how does this work for binaries with very different sizes? This looks neat, and I have seen similar concepts applied to other data sets: https://xkcd.com/195/

rckoepke · on Oct 19, 2020

This demo does a fairly decent job of explaining how you'd use these kinds of visualizations:

https://www.youtube.com/watch?v=4bM3Gut1hIk&list=PLUyyOw61zx...

Battelle's CantorDust combines the visualization concept with a convenient UX for selecting blocks of code graphically and zooming in on the corresponding hex, or vice versa. The devil is in the details with respect to the UX for these kinds of tools. The visualization or 2D image by itself is somewhat less useful without being able to snap to the corresponding part of the hex or IDA/Ghidra disassembly.

I do think adding a 3rd dimension to the visualization probably adds somewhat more utility as well. The recently released open source package for CantorDust seems to omit the 3D visualizations which were shown in the demo linked here.