I used ChatGPT to decode proprietary binary files from some industrial machinery. It was amazing how it could decipher stuff and find patterns. It first looked for ASCII characters, then for byte sequences acting as delimiters, then it started looking at which bytes could be a length field, which 4-byte groups could be floating-point coordinates, which endianness made more sense for the coordinates, etc. etc. Crazy stuff.
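The heuristics described above (ASCII runs, delimiter bytes, plausible floats in either endianness) are easy to sketch by hand too. This is just an illustrative toy, not the actual file format; the sample `blob` and the "plausible magnitude" range are made up:

```python
import struct

def ascii_runs(data, min_len=4):
    """Find runs of printable ASCII: a first hint at embedded strings."""
    runs, start = [], None
    for i, b in enumerate(data):
        if 0x20 <= b < 0x7F:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode('ascii')))
            start = None
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode('ascii')))
    return runs

def candidate_floats(data, offset):
    """Unpack 4 bytes at offset both ways; keep 'plausible' magnitudes."""
    le = struct.unpack_from('<f', data, offset)[0]
    be = struct.unpack_from('>f', data, offset)[0]
    plausible = lambda x: x == x and 1e-3 < abs(x) < 1e6  # NaN check + range
    return {'little': le if plausible(le) else None,
            'big': be if plausible(be) else None}

# Toy record: a tag string, a delimiter byte, two little-endian coordinates
blob = b'PART01\x00\xFF' + struct.pack('<2f', 12.5, -3.25) + b'\xFF'
print(ascii_runs(blob))           # [(0, 'PART01')]
print(candidate_floats(blob, 8))  # {'little': 12.5, 'big': None}
```

The little/big comparison is the useful part: garbage endianness usually produces denormals or absurd magnitudes, so the "sensible" interpretation tends to stand out.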
That sounds amazing. Shame it's proprietary, I'd love to read that chat transcript. Do you just paste binary data in and ask it to decipher it? Or do you ask it leading questions? Or...?
This is cool, though it did make a mistake while converting a hex number to decimal (0x132004 = 1253380, not 1249284). Proofreading this can be a big pain. It can detect those patterns in a long string like nothing, yet it fails at basic conversion, which is really beyond me.
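For the record, that conversion is trivial to double-check, which is probably the right habit whenever a model does arithmetic for you:

```python
# 0x132004 = 1*16^5 + 3*16^4 + 2*16^3 + 4 = 1048576 + 196608 + 8192 + 4
print(int('132004', 16))  # 1253380
print(hex(1253380))       # 0x132004
```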
Yes, I tried it on this bin file and it didn't go as deep as stock GPT-4. It wrote some Python code to parse the file, but it was hard to have a long conversation with it about the data. It was always jumping into writing Python before the brainstorming was finished (could be a feature, not a bug) ;)
I'm looking to reverse engineer a file format in order to implement an editor for it (a proprietary file format, undocumented but AFAIK not encrypted). Would it be possible to use that approach for this purpose? Is there another free tool for it?
That’s a very generic question, hard to tell without extra details, but I find it useful for decoding hashes, or at least for giving clues on how to decode them.
I don't buy this. LLMs are basically just fancy text completion based on training data. "Binary data from a proprietary industrial machine" sounds like the furthest possible thing from what could have been in the training data. How can you possibly trust its output on something it's never seen before?
The only reason I say this is because I have tried. I asked an LLM to decode a variety of base64 strings, and every single time it said the decoded ASCII was "Hello, world!"
This doesn't come as a surprise to me. Unless it was trained on a dataset that included a mapping for every base64-encoded sequence, it's just going to pattern-complete on base64-looking strings of characters and assume they translate to "Hello, world!" from some programming tutorial it was trained on.
That's still kinda cool. Now I'm curious whether it can decode all the figlet fonts too. Size can be controlled with HTML, as some are easier for a human to read when smaller.
[Edit] - This might make one's eyes bleed, but I am curious if it can read this [1]. If you install figlet, type showfigfonts to see examples of all the installed fonts. More can be installed [2] in /usr/share/figlet/fonts/
That kind of decoding is a bit different, though. For one, the tokenization process makes character-level encodings difficult to handle (unless the model is trained on a lot of encoded/decoded pairs).
This would be more akin to asking ChatGPT to help build a black box parser for base64, not asking it to decode it itself.
GPT-4 can absolutely decode base64. Early jailbreaks involved base64-encoding a Python-based jailbreak to get it to output whatever you wanted; later, OpenAI added a patch that filters base64 outputs so they follow their rules.
Some of the input data was known, yes, because this software has a GUI and outputs a binary file based on user data (a PCB bill of materials) plus internal machine settings. So I knew there were some coordinates and ASCII data in there, and GPT helped find the delimiters, etc. Some things I was also able to figure out with Ghidra and lots of trial and error.
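Known plaintext like that helps a lot: if the GUI showed you a coordinate, you can search the file for its 4-byte float encoding in both endiannesses and locate the coordinate table directly. A rough sketch (the coordinate value and the `blob` layout here are invented for illustration):

```python
import struct

def find_float(data, value):
    """Locate a known float32 value in a blob, trying both endiannesses."""
    hits = []
    for fmt, name in (('<f', 'little'), ('>f', 'big')):
        needle = struct.pack(fmt, value)
        start = 0
        while (i := data.find(needle, start)) != -1:
            hits.append((i, name))
            start = i + 1
    return hits

# Fake "machine file" with one known pick-and-place X coordinate at offset 7
blob = b'\x00' * 7 + struct.pack('<f', 45.72) + b'\xFF' * 5
print(find_float(blob, 45.72))  # [(7, 'little')]
```

One caveat: exact byte matching only works if the value round-trips through float32 exactly as the GUI stored it; otherwise you'd scan every offset and compare with a tolerance instead.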