That’s a script for the reverse-engineering tool Ghidra that uses GPT-3 to de-compile machine code and to write plain English explanations of what a piece of code does.
There was also another HN story about what at first sight looks like an alternative implementation of the same idea: “GptHidra – Ghidra plugin that asks OpenAI Chat GPT to explain functions”
Incredible. I had this exact idea rolling around in my head. Could something be trained to decompile binaries to source in a readable way? We have vast source code available and can build the binaries.
Ghidra may be pretty good for some binaries (or for some audience?), but my experience trying to get it to reverse both golang and rust-lang binaries has been abysmal. It fails to correctly identify string literals (which is my #1 go-to for finding "points of interest") and the decompilation output is ... well, maybe it's helpful to someone but not to me. I regret that I let my Binary Ninja license lapse in order to see what it would have to say about the same binaries, and I've never had an IDA license to know what that's like
As a point of comparison, I fed 10.2.2 a copy of gojq 0.12.11 that I had lying around and this is pretty representative of its output
I somehow thought that IDA Free was missing the decompiler, but I just downloaded 8.2.221216 macOS x86_64 and while it did a much better job at identifying the symbols in the rust binary, regrettably it then consumed 100% of the CPU and effectively locked up. So ... better, I guess? :-/
Myeah, that sound like a bug in this variant. File a bug report with them, probably they'll release a better free one then you can properly test your theory. Good luck
They are "pretty good" in the sense that they define the sequence of assembly instructions according to some C type code that may have produced them. This is a tool that gives you essentially, "this function might be doing MD5".
I haven't tried it yet, but I can see it being useful if it's somewhat accurate (that's a big if), and quite different from what Ghidra gives you in pseudo code.
Wow, I wouldn't have expected Tenable to shell out to curl, especially when the curl only adds two headers and they omitted the "--fail" that would cause non-200 responses to return a non-zero exit code :-(
Fair point. It was a quick and dirty workaround, in the absence of `requests` in Ghidra's Jython distribution, but it turns out that `httplib` is available, and the latest commit uses that to do the HTTP request instead.
You can actually try adding "and indicate what security vulnerabilities are present in the code, if any" or something to that effect to the prompt, by tweaking the `EXTRA` global variable defined near the head of the script. My experience with this so far is that it tends to spew out infosec truisms that aren't closely connected with the code, and that most interesting vulnerabilities require a bit more contextual awareness to notice than this tool has available to it, but ymmv, and it's definitely worth taking a bit of time to see if you can massage the prompt to finagle useful bughunting output from the tool.
That’s a script for the reverse-engineering tool Ghidra that uses GPT-3 to de-compile machine code and to write plain English explanations of what a piece of code does.
The article is quite detailed and describes both its capabilities and its limitations. That G-3PO script is open source, MIT license: https://github.com/tenable/ghidra_tools/tree/main/g3po
There was also another HN story about what at first sight looks like an alternative implementation of the same idea: “GptHidra – Ghidra plugin that asks OpenAI Chat GPT to explain functions”
https://news.ycombinator.com/item?id=34165291
This one is more recent and lacks that good write-up mentioned above. The script is smaller and it seems to have fewer features.
I suggest checking both of them.