
I used ChatGPT to decode proprietary binary files from some industrial machinery. It was amazing how it can decipher shit and find patterns. It first looked for ASCII characters, then byte sequences acting as delimiters, then it started looking at which bytes could be a length field, which 4-byte groups could be floating-point coordinates, and which endianness was more logical for coordinates, etc. Crazy stuff.
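For anyone who wants to try the first couple of those steps by hand, here is a minimal Python sketch of the ASCII scan and the "which endianness looks sane" check. It is not taken from the transcript; the function names and the sample bytes are made up for illustration.

```python
import re
import struct

def ascii_runs(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def float_candidates(data: bytes, offset: int):
    """Read 4 bytes at offset as a float32 in both byte orders: (little, big)."""
    chunk = data[offset:offset + 4]
    little = struct.unpack("<f", chunk)[0]
    big = struct.unpack(">f", chunk)[0]
    return little, big

# Hypothetical sample: a short ASCII label followed by a little-endian float32.
sample = b"\x00\x01PART1\x00" + struct.pack("<f", 12.5)

runs = ascii_runs(sample)
little, big = float_candidates(sample, 8)
print(runs)
print(little, big)
```

The idea is that only one byte order usually gives a value in a plausible range for a coordinate; the other tends to come out as a denormal or an absurdly large number.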



That sounds amazing. Shame it's proprietary, I'd love to read that chat transcript. Do you just paste binary data in and ask it to decipher it? Or do you ask it leading questions? Or...?


Lots of follow-ups. Here is the transcript (warning, too much bla bla). I was feeding the file to GPT-4 slowly because I was hitting its input limits:

https://chat.openai.com/share/23db424d-7307-46da-913f-d45cdc...


This is cool, though it did make a mistake while converting a hex number to decimal (0x132004 = 1253380, not 1249284). Proofreading this can be a big pain. It can detect those patterns in a long string like it's nothing, yet it fails at basic conversion, which is really beyond me.
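The conversion is easy to check locally. A quick Python sketch (variable names are mine):

```python
# Check the hex-to-decimal conversion quoted above.
correct = int("132004", 16)

# See which hex value the model's wrong answer actually corresponds to.
wrong_as_hex = hex(1249284)

print(correct)
print(wrong_as_hex)
```

The wrong answer is exactly 0x1000 below the right one, so it looks like a single-digit slip (0x131004 instead of 0x132004) rather than a random number.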


Have you tried ChatGPT Code Interpreter aka Advanced Data Analysis mode?

That's the thing that can write and then execute Python code against files you upload to it.

I've had great results using it to decipher weird binary formats, since it can try things out and iterate on them.


Yes, I tried it for this bin file and it didn't go as deep as stock GPT-4. It wrote some Python code to parse the file, but it was hard to have a long conversation with it regarding the data. It was always jumping into writing Python before the brainstorming finished (could be a feature not a bug) ;)


Oh that is super cool!


Check out Ciphey, I have used it several times before and overall it’s great. https://github.com/Ciphey/Ciphey


I'm looking to reverse engineer a file format in order to implement an editor for it (proprietary file format, undocumented but AFAIK not encrypted). Would it be possible to use that program for that purpose? Is there another free tool for that?


That’s a very generic question, hard to tell without extra details, but I find it useful for decoding hashes or at least giving clues on how to decode them.


Oh looks cool, I will check it out! Thanks!


I don't buy this. LLMs are basically just fancy text completion based on training data. "Binary data from a proprietary industrial machine" sounds like the furthest possible thing that could have been in the training data. How can you possibly trust its output if it's not something it's ever seen before?


You could try this with a hex dump of an executable binary


you have the wrong conceptual model of how LLMs do the thing they do


The only reason I say this is because I have tried. I asked an LLM to decode a variety of base64 strings, and every single time, it said the decoded ASCII was "Hello, world!"

This doesn't come as a surprise to me. Unless it was trained on a dataset that included a mapping of every base64-encoded character, it's just going to pattern-complete on sequences of base64-encoded-like characters and assume it translates to "Hello, world!" from some programming tutorial it was trained on.


Which model did you use? GPT-4 can encode and decode Base64, at least for short strings. I was pretty surprised when I first saw that. Proof:

https://chat.openai.com/share/9382be94-d59a-4a2a-b03b-43dba3...

https://chat.openai.com/share/421cc39e-ea9c-4ff6-9e45-1aa151...


3.5 can't, just tried and got this: https://chat.openai.com/share/31e7038e-d594-4c6f-8f6e-27e920... They probably specifically added a bunch of examples.
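Whichever model you test, it's cheap to verify what it tells you. A minimal sketch using Python's standard `base64` module; the sample text is just the "Hello, world!" string mentioned upthread:

```python
import base64

def encode_b64(text: str) -> str:
    """Encode UTF-8 text as a Base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_b64(encoded: str) -> str:
    """Decode a Base64 string back to UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")

encoded = encode_b64("Hello, world!")
roundtrip = decode_b64(encoded)
print(encoded)
print(roundtrip)
```

If the model's claimed plaintext doesn't re-encode to the string you gave it, it was guessing.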


Can it figure this one out without any hints? Not base64. Use case [1]

    ONXW2ZLUNBUW4Z2AONXW2ZLXNBSXEZJOORWGI===
[1] - https://ohblog.net/about/


> The string you've provided appears to be encoded in Base32. Decoding this string from Base32, it results in:

> "This is a test. This is only a test."

So, it got the base32 part right, but the decoding wrong. I would have been extremely surprised if it got the decoding right, though.
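For comparison, a local decode of the same string with Python's standard library (a sketch, helper name is mine):

```python
import base64

ENCODED = "ONXW2ZLUNBUW4Z2AONXW2ZLXNBSXEZJOORWGI==="

def decode_base32(text: str) -> str:
    """Decode a padded Base32 string to ASCII text."""
    return base64.b32decode(text).decode("ascii")

decoded = decode_base32(ENCODED)
print(decoded)
print(len(decoded))
```

Even without decoding, the length gives it away: 37 Base32 characters carry at most 23 bytes, so a 36-character sentence can't be the answer.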


That's still kinda cool. Now I'm curious if it can decode all the figlet fonts too. Size can be controlled with HTML as some are easier to read visually by a human if smaller

[Edit] - This might make one's eyes bleed, but I am curious if it can read this [1]. If you install figlet, type showfigfonts to see examples of all the installed fonts. More can be installed [2] in /usr/share/figlet/fonts/

[1] - https://ohblog.net/chatgpt_test/

[2] - https://github.com/xero/figlet-fonts


That kind of decoding is a bit different though. For one, the tokenization process makes encodings difficult to handle (unless it’s trained on a lot of pairs).

This would be more akin to asking ChatGPT to help build a black box parser for base64, not asking it to decode it itself.


GPT4 can absolutely decode base64. Early jailbreaks were to base64 a python-based jailbreak to get it to output whatever you wanted and later OpenAI added a patch to filter base64 outputs to follow their rules.


how are you sure it wasn't bullshitting? were you feeding it a known binary?


Some of the input data was known, yes, because this software has a GUI and it outputs a binary file based on user data (PCB bill of materials) plus internal machine settings. So I knew there were some coordinates and ASCII data in there, and GPT helped find the delimiters, etc. Some things I was also able to figure out with Ghidra and lots of trial and error.
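The delimiter hunt can also be roughed out without an LLM. A sketch in Python, assuming the delimiter is a fixed 2-byte sequence that repeats more often than anything else; the function names and sample bytes are hypothetical, not from the actual file:

```python
from collections import Counter

def delimiter_candidates(data: bytes, width: int = 2, top: int = 3):
    """Count every width-byte sequence and return the most frequent ones."""
    counts = Counter(data[i:i + width] for i in range(len(data) - width + 1))
    return counts.most_common(top)

def split_records(data: bytes, delimiter: bytes):
    """Split on the chosen delimiter and drop empty records."""
    return [record for record in data.split(delimiter) if record]

# Hypothetical sample: three part designators separated by 0xFF 0xFE.
sample = b"R1\xff\xfeC22\xff\xfeU3\xff\xfe"

best = delimiter_candidates(sample)[0]
records = split_records(sample, best[0])
print(best)
print(records)
```

Feeding it files generated from known GUI input, as described above, is what makes the guess checkable: the records should line up with the values you entered.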



