Hacker News
Hexyl: A command-line hex viewer with colorized output (github.com/sharkdp)
212 points by hundredwatt on June 16, 2023 | 89 comments



Sharkdp is also responsible for some fantastic tools like hyperfine for command-line benchmarking, fd for finding files, bat (a colorized cat replacement) and many more such utilities. https://github.com/sharkdp is a real treasure trove of user-friendly CLI tools.


burntsushi and sharkdp are the GOATs.


And both are members of HN!


Here's mine, from 1986 or so. I still use it all the time:

    #include <stdio.h>
    #define O1O printf
    #define OlO putchar
    #define O10 exit
    #define Ol0 strlen
    #define QLQ fopen
    #define OlQ fgetc
    #define O1Q abs
    #define QO0 for
    typedef char lOL;

    lOL*QI[] = {"Use:\012\011dump file\012","Unable to open file '\x25s'\012",
     "\012","   ",""};

    main(I,Il)
    lOL*Il[];
    {   FILE *L;
            unsigned lO;
            int Q,OL[' '^'0'],llO = EOF,

            O=1,l=0,lll=O+O+O+l,OQ=056;
            lOL*llL="%2x ";
            (I != 1<<1&&(O1O(QI[0]),O10(1011-1010))),
            ((L = QLQ(Il[O],"r"))==0&&(O1O(QI[O],Il[O]),O10(O)));
            lO = I-(O<<l<<O);
            while (L-l,1)
            {   QO0(Q = 0L;((Q &~(0x10-O))== l);
                            OL[Q++] = OlQ(L));
                    if (OL[0]==llO) break;
                    O1O("\0454x: ",lO);
                    if (I == (1<<1))
                    {   QO0(Q=Ol0(QI[O<<O<<1]);Q<Ol0(QI[0]);
                            Q++)O1O((OL[Q]!=llO)?llL:QI[lll],OL[Q]);/*"
                            O10(QI[1O])*/
                            O1O(QI[lll]);{}
                    }
                    QO0 (Q=0L;Q<1<<1<<1<<1<<1;Q+=Q<0100)
                    {   (OL[Q]!=llO)? /* 0010 10lOQ 000LQL */
                            ((D(OL[Q])==0&&(*(OL+O1Q(Q-l))=OQ)),
                            OlO(OL[Q])):
                            OlO(1<<(1<<1<<1)<<1);
                    }
                    O1O(QI[01^10^9]);
                    lO+=Q+0+l;}
            }
            D(l) { return l>=' '&&l<='\~';
    }
I do like the colorized output of hexyl, though.


Was this for an obfuscated C challenge?


Here's how Claude+[1] explained it:

This program appears to be a hexadecimal dump utility. It does the following:

- It takes a filename as a command line argument and opens that file for reading

- It reads the file byte by byte until EOF

- For each byte, it prints the hexadecimal value of the byte, in the format "%2x " (i.e. 2 hex digits, a space)

- After every 16 bytes, it prints the ASCII representation of those bytes, replacing non-printable characters with "."

- It also has some obfuscated logic with bitwise operations, likely attempting to confuse the reader.

So if you ran it like this:

  ./program myfile.txt
It would output something like:

  54 65 78 74 20 66 69 
  6c 65 2e 0a 54 68 69 
  73 20 69 73 20 61 20 
  74 65 78 74 20 66 69 
  6c 65 2e 0a 54 68 65 
  20 71 75 69 63 6b 20 
  62 72 6f 77 6e 20 66  
  6f 78 0a 6a 75 6d 70 
  73 20 6f 76 65 72 20
  20 74 68 65 20 6c 61
  7a 79 20 64 6f 67 0a 
  2e 2e 2e
Which is the hexadecimal dump of the ASCII contents of myfile.txt.

The #defines are used to obfuscate the code and make it harder to read, replacing printf with O1O, putchar with OlO, etc. The D() function is used to check if a byte is a printable ASCII character.

So in summary, this program opens a file, reads it byte by byte, prints the hex values, and prints the ASCII for printable characters, as a hexadecimal dump utility.

[1] - https://poe.com/Claude%2B


I'm well aware of how the code works. I was mostly interested in why it was written like that. nwiswell's comment[1] gave me the hint I needed to find that yes, this was part of an obfuscated C challenge[2].

I know you mean well but LLMs are the very last resource I'd turn to for help. Those things make crap up all the time.

[1]: https://news.ycombinator.com/item?id=36364283

[2]: https://www.ioccc.org/1986/bright/bright.c


What a silly reply. It's very impressive that the other person's LLM was able to figure this out, even if you were clever enough to know it already.


> It also has some obfuscated logic with bitwise operations, likely attempting to confuse the reader.

Bitwise operations are commonplace to improve efficiency.


After seeing an LLM do something like this, I've got to ask people who think that LLMs are just "stochastic parrots", "just predicting the next word", or are merely "a blurry jpeg of the web" to think about what's really going on here.


This exact code appears online as part of the IOCCC 1986 (it was a submission), so it's likely that this was indeed part of the training set for this LLM and that there is a significant corpus of text discussing this particular program and other obfuscated programs like it.

I'm not ruling out that this LLM output is "partially organic" rather than "fully regurgitated", but I'd be much more interested to see this LLM explain an obfuscated program that hasn't been floating around the Internet for 35 years.


Even if it's part of the training data and the LLM is just a better search engine, how would I have figured out what the code does without an LLM? I certainly can't paste this into Google.

I mostly agree with the stochastic parrot interpretation, but that doesn't undermine the usefulness or impressiveness. Even if it's just a highly compressed search index, that level of compression is amazing.


> how would I have figured out what the code does without an LLM

Start by find-and-replacing those #defines. You can iteratively deobfuscate things by hand. It's a PITA and takes time, but it's doable.

If you hit a roadblock, run it in a VM.


It's easy to test it with something unpublished.


My experience is that ChatGPT does a very poor job writing Brainfuck programs for me, even simple programs like "add two and two" aren't correct. Maybe it would do better if I asked it to explain one instead.


In my experience, LLMs are poor at working with unpopular languages -- probably because their training data does not contain a lot of examples of programs written in those languages, or explanations of them.

They do much better with popular languages.


> They do much better with popular languages.

So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?

The more popular the language the more likely the training corpus includes both very similar code samples and explanation of those code samples, and also the more likely those two converge on a “reasonable” explanation.

Ask it something it’s likely to have seen an answer for and it’s likely to spit out that answer… interesting? Sure, impressive? Maybe… but still pretty well captured by “a fuzzy jpeg of the web”.


"So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?"

Or exactly like you'd expect a human to perform.

Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.


> Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.

Ahh, but ask a human a question in a language they don’t understand and they’ll look at you with bewilderment, not confidently make up a stream of hallucinatory nonsense that only vaguely looks statistically right.

> Or exactly like you’d expect a human to perform.

Not exactly, no… but with just enough of the uncanny valley to make me think the more interesting thought: are we really not much more than stochastic parrots? Or, in other words, are we naturally just slightly more interesting than today’s state of the artificially stupid?


It didn't even answer the question though...


Funny thing is, GPT-4's explanation says twice that the code is "highly" or "heavily" obfuscated. Poe's does not.

https://sharegpt.com/c/G37AW91


> likely attempting to confuse the reader

It's just ordinary C code.


Very obviously not.

Nobody substitutes random three letter strings for keywords in ordinary C code unless they intend on some trivial obfuscation.

    #define O1O printf
    #define OlO putchar
    #define O10 exit
    #define Ol0 strlen
    #define QLQ fopen
    #define OlQ fgetc
    #define O1Q abs
    #define QO0 for
    typedef char lOL;


I've seen code like this (three-letter macros for every one and a half(!) syntax constructs, all macros starting with Q, seemingly random indentation). I don't understand why the developer did that or why the company let him. All I know is that in the end he didn't understand his own code anymore and couldn't fix some issues.


Not random, but permutations of easily confusable characters (0OQ, 1l).


In many fonts, they are nearly indistinguishable. For example, in PuTTY's default font { and ( look identical to me, which led to many syntax errors in my code.


Did you write the program without the intention of being obfuscatory, and then _later_ submit it to IOCCC? Or did somebody else submit your code?

It's right here, and your name is on it. https://www.ioccc.org/1986/bright/bright.c

It even won an award! https://www.ioccc.org/1986/bright/hint.html


> Did you write the program without the intention of being obfuscatory

I cribbed it from system .h files.


What have you done..!?


If you want a true hex editor (or better, a hex IDE), I strongly suggest taking a look at ImHex [1].

[1]: https://github.com/WerWolv/ImHex


ImHex indeed looks very featureful and has a pleasant layout, but it's hard not to notice the perverseness of the slogan "A Hex Editor" for those "who value their retinas when working at 3 AM" being applied to an app that is all about putting text on the screen, yet insists on tiny font sizes and no anti-aliasing...


It looks amazing, but I couldn't get it to run. Binary was linked to a lib that's not on my system, and when compiled from source it just crashes with a bad cast.


Was looking at hexdump alternatives a week ago and between huxdemp (hxd), xxd, hx, hd (not the alias for hexdump -C) and hexdump this was my favourite.

hx wasn't usable for parsing data streams, and its name clashes with Helix (hx).

On systems where I can't install anything, I would just use hexdump -C or xxd.


I've created an xxd clone called xd: https://github.com/denysvitali/hexutils

It just uses the same visualization but highlights the ASCII chars


Hexyl's lovely and I copied its style for a hex editor.

I don't understand why there are so many terminal hex viewers that don't edit.


What's your hex editor?


Not OP, but I like hexcurse[1].

https://github.com/LonnyGomes/hexcurse


Sounds good, but screenshot's broken for me.


You should have been pointed to https://github.com/prso/hexcurse who forked it on the grounds that the original has broken screenshots and doesn't compile.


I'd also like to know.


It's a weird one: https://github.com/jbaber/edhex (or "cargo install edhex").


Hexyl is what I love to use for quickly looking at a file. For slightly more serious introspection I use Emacs' hexl-mode, which also has basic editing capabilities.


Why would you need a separate binary viewer / editor?

This reminds me of the absurd but somehow common situation where Java programmers in my office used IntelliJ IDEA to write Java, but Notepad++ to view logs, edit INI files, etc.


Because people sometimes write parsers for binary file formats? In my case, it is indeed for a programming task, but not for debugging running/compiled code.

A “raw bytes” view can be useful in many situations.


Because nothing is more awesome than wanting to quickly view a file and vscode decides it needs to not only update itself but the remote helper app as well, so your file takes 30 seconds to load.


I do this. Big heavy IDE with project and debugger going full blast. I might want to just pop open a log quickly and cba to drag it into the IDE: np++ context menu, done. More often, though, it's reading logs on a non-main dev machine/VM/test/field.


You might like lnav on the far end, especially for colorization.


Your life is pain...

I don't know if you know the feeling: it's like watching inexperienced chess players play -- you want to scream when you see them making dumb moves, but then the opponent also makes a dumb move, which makes it more funny than sad.

When I look over the shoulder of people who work like you describe your workflow, just like with chess, I feel this kind of mix of embarrassment and frustration -- it was so easy to do it right, yet you decided to do something silly instead.


This is not helpful.

Why not suggest a more appropriate workflow without the chastising or denigration? C'mon, you can be supportive.


shrug, I'm drastically faster than all my work mates (not that that's a good benchmark). I'm not bothered and have stuff that works quite well for me. What I outlined above is the worst-case scenario with some huge IDEs like VS or a huge IDEA project. Some projects I do entirely in vim; for others I have to use 2 remote debuggers at once with a VM in the mix. I'm doubtful their advice would apply.


Yeah, that's kind of how it happens. Your skills suck just a tiny bit less than those of the people around you, and you come to the unwarranted conclusion that you are doing well.

Unfortunately, our industry has only a few small pockets where you can still find competent programmers every now and then, but you wouldn't see them using IDEA or VS Code. And then there's an ocean of... well, not even mediocrity, it's just hands-down awful. Most programmers will never see a competent programmer throughout their career, and if they stay around, they will be promoted into management, where they will lose the very modest skill they had. And the cycle will continue.


I am not sure what kinda beef you have with people being able to handle using multiple editors at once but it sure is a doozy


I don't want to support people who don't want to support themselves. Also, the commenter wasn't asking for support. They proudly believe that what they do is a legitimately good way to use their computer.

I've seen way too many people like that, and have an idea how that kind of belief is formed. I don't want to help those people. I want them gone.


> I don't want to help those people. I want them gone.

I'm sorry you have this attitude towards people. I wish you the best.


Separate from what?


If you are a programmer, you are already using an editor. Who besides programmers needs to see hex dump of a file?


Sysadmins? Programmers who don't have their editor open and just need to quickly look at the contents of a file?


There aren't sysadmins today who aren't programmers.

> Programmers who don't have their editor open

Well... these aren't programmers. Maybe "aspiring programmers", as in someone who still has to learn how to use their computer. It's not anyone who would be considered a competent user.


What are you talking about? I use cat and less all the time when I'm working in the console. No need to switch tools when I want to take a quick look at the file.


Related:

Hexyl: A command-line hex viewer - https://news.ycombinator.com/item?id=18865264 - Jan 2019 (113 comments)


I really like this tool for quick, easy debugging on binary files. Okteta used to be my go-to, but I rarely need editing. I usually just take a quick look at the contents.


I've used the XVI32 hex editor for Windows in the past.

Written in Delphi.

Split pane.

Useful app. Free.


Get it here:

http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.h...

Just looked at it again.

Has a lot of good features and positive comments.

Including, "even used by Microsoft".


fyi, your link is broken


Proper link here: http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.h...

I also use XVI32. It's honestly probably not the best hex editor out there anymore, but I've been using it for well over a decade now and I'm used to its quirks.


Thanks for sharing the right link.

Yes. I've not used it a lot myself, because my work then did not involve a lot of hex editing, just occassional.

Good to know.


occasional


Sorry, mobile phone copy-paste error.


Cool idea with the color!

But for my needs the hex viewer that I wrote works best. I need those number displays at the bottom and an easy way to go to relative and absolute offsets. See the screenshot at the bottom: https://github.com/panzi/rust-hox



I love hexyl, but the colorized output does get in the way sometimes when I'm copying out bytes.


Just out of curiosity (as I haven't needed to use a hex viewer/editor in web dev) - what use-cases are you using this for?


Reverse engineering!


That sounds fun ;)

Was there a first project you did that got you hooked?


Game hacking for sure, and it's still a blast. Playing around with variables in cheat engine or messing with save files is a fantastic way to get started as long as you stick to single player games (multiplayer cheating is way harder and usually unethical).

https://cheatengine.org/


Can you explain more? Copying with a mouse selection, or by piping into something? You shouldn't get ANSI escapes either way.


When I was coming up, we didn’t have no dashed lines dividing the displays into 8-byte segments. And color? Everything was orange text on black. If you wanted green on black, you used a coworker’s terminal.


Although it is worth remembering that hex editors with colour and dividing lines are at least 35 years old: Central Point's PC Tools for DOS had one.


That's nice.


Strongly related: is there a hex viewer where I can set record delimiters, preferably on the fly or with a command-line switch, so that it colors the records?


how many different record separators do you have?

ImHex may be useful to you if you only have a few. You can write a pattern file which parses the whole file and displays everything, if you want.


My, that is very useful.

I have been out of the bit land for a very, very long time -- the last of my assembly programs are old enough to drink -- but I am increasingly finding myself running hexdump again and it's not a great experience. I expected better in 2023 and here it is.

I will explore, but this (https://github.com/WerWolv/ImHex/blob/c2e023f567d4e838ea69e6...) makes me super-duper hopeful. A UTF-8-capable hex viewer, that'd be handy.


$ od -cx file

or

$ other_command_or_pipeline | od -cx

is from early versions of Unix. With -c it prints each byte of the input as a character (or a backslash escape such as \n), and with -x it prints the input as hexadecimal two-byte words.

$ man od


This isn't nearly as pretty and usable as hexyl's output IMO, but always awesome to have the more readily available method!


Thanks. Yes, od's ubiquity on Unix was the reason why I mentioned it.

And right, its output has issues for some uses.


Does it handle files larger than memory? I've been using dhex for that purpose.


When you're only dumping a file memory use shouldn't be an issue. It is the hex editors that run into memory problems.


Ah, sorry, I didn't understand what this is for.


Makes me fondly remember XTree...



