
A few days ago this went mostly ignored (https://news.ycombinator.com/item?id=34161642) and I was asked to re-submit it (https://news.ycombinator.com/item?id=34250150) so that it gets a second chance.

That’s a script for the reverse-engineering tool Ghidra that uses GPT-3 to de-compile machine code and to write plain English explanations of what a piece of code does.

The article is quite detailed and describes both its capabilities and its limitations. That G-3PO script is open source, MIT license: https://github.com/tenable/ghidra_tools/tree/main/g3po

There was also another HN story about what at first sight looks like an alternative implementation of the same idea: “GptHidra – Ghidra plugin that asks OpenAI Chat GPT to explain functions”

https://news.ycombinator.com/item?id=34165291

This one is more recent and lacks that good write-up mentioned above. The script is smaller and it seems to have fewer features.

I suggest checking both of them.




Incredible. I had this exact idea rolling around in my head. Could something be trained to decompile binaries to source in a readable way? We have vast source code available and can build the binaries.


Ghidra and IDA Pro pseudocode are already pretty good.


Ghidra may be pretty good for some binaries (or for some audience?), but my experience trying to get it to reverse both golang and rust-lang binaries has been abysmal. It fails to correctly identify string literals (which is my #1 go-to for finding "points of interest") and the decompilation output is ... well, maybe it's helpful to someone but not to me. I regret letting my Binary Ninja license lapse, because I can't see what it would have to say about the same binaries, and I've never had an IDA license to know what that's like.

As a point of comparison, I fed Ghidra 10.2.2 a copy of gojq 0.12.11 that I had lying around, and this is pretty representative of its output:

  if (DAT_0075c6c8 == (code *)0x0) {
    ppuStack_38 = (undefined **)0x45ecd7;
    FUN_00462ee0(&DAT_0075da28,local_10,iVar3,iVar4);
    *(undefined8 *)(in_FS_OFFSET + -8) = 0x123;
    if (DAT_0075da28 != 0x123) {
      ppuStack_38 = (undefined **)0x45ecf8;
      FUN_00460dc0();
    }
  }

For further comparison, I fed it actual jq, and it did much better with the string literals:

    if ((((((iVar6 == 0) || (DAT_00108018 = DAT_00108018 | 1, local_58 != 0)) &&
          ((iVar6 = FUN_001045a0(pFVar20,0x72,"raw-output",pcVar13), iVar6 == 0 ||
           (DAT_00108018 = DAT_00108018 | 8, local_58 != 0)))) &&
         ((iVar6 = FUN_001045a0(pFVar20,99,"compact-output",pcVar13), iVar6 == 0 ||
          (local_5c = local_5c & 0xfffff8be, local_58 != 0)))) &&
        ((iVar6 = FUN_001045a0(pFVar20,0x43,"color-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x40, local_58 != 0)))) &&
       (((iVar6 = FUN_001045a0(pFVar20,0x4d,"monochrome-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x80, local_58 != 0)) &&
        ((iVar6 = FUN_001045a0(pFVar20,0x61,"ascii-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x20, local_58 != 0)))))) {
      iVar6 = FUN_001045a0(pFVar20,0,"unbuffered",pcVar13);
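
One possible contributor to the gap between the two outputs above (an assumption, not something established in this thread): C string literals are NUL-terminated, while Go string values are a pointer plus a length with no terminator, so Go literals can sit back to back in the data section. A minimal sketch of a `strings`-style scan shows how that changes what a naive scan recovers. The function name `printable_runs` and the sample byte blobs are hypothetical and hand-made, not taken from jq or gojq.

```python
import re

def printable_runs(blob, min_len=4):
    """Return runs of at least min_len printable ASCII bytes, similar in spirit to `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]

# C-style layout: each literal ends with a NUL byte, so the scan splits them cleanly.
c_style = b"raw-output\x00compact-output\x00"

# Go-style layout (hand-made illustration): no terminators, literals packed together.
go_style = b"raw-outputcompact-output"

print(printable_runs(c_style))
print(printable_runs(go_style))
```

On the packed blob the scan returns one merged run, which is why a tool needs the length stored alongside each pointer to cut the literals apart.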


IDA has a free variant. Have you tried it?


I somehow thought that IDA Free was missing the decompiler, but I just downloaded 8.2.221216 macOS x86_64 and while it did a much better job at identifying the symbols in the rust binary, regrettably it then consumed 100% of the CPU and effectively locked up. So ... better, I guess? :-/


Myeah, that sounds like a bug in this variant. File a bug report with them; they'll probably release a fixed free version, and then you can properly test your theory. Good luck.


They are "pretty good" in the sense that they map a sequence of assembly instructions to some C-like code that may have produced it. This tool, by contrast, essentially gives you "this function might be doing MD5".

I haven't tried it yet, but I can see it being useful if it's somewhat accurate (that's a big if), and quite different from what Ghidra gives you in pseudo code.


> to write plain English explanations of what a piece of code does.

I could use this for a regular project for which I have the source.


Wow, I wouldn't have expected Tenable to shell out to curl, especially when the curl only adds two headers and they omitted the "--fail" that would cause non-200 responses to return a non-zero exit code :-(

https://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....
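
For illustration, here is a minimal sketch of shelling out to curl with `--fail` so that HTTP error responses surface as a non-zero exit code. It is not the code from the linked script: the helper names `build_curl_command` and `run_curl`, the URL, and the choice of the two headers are assumptions made for the example.

```python
import subprocess

def build_curl_command(url, api_key, payload_json):
    """Build a curl argv list; every argument value here is a placeholder."""
    return [
        "curl",
        "--silent", "--show-error",
        "--fail",  # exit non-zero on HTTP error responses (400 and above)
        "-H", "Content-Type: application/json",
        "-H", "Authorization: Bearer " + api_key,
        "-d", payload_json,
        url,
    ]

def run_curl(cmd):
    """Run curl; check=True raises CalledProcessError when curl exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

cmd = build_curl_command("https://example.invalid/v1/completions", "KEY", "{}")
print(cmd)
```

Without `--fail`, curl exits 0 even when the server answers with an error status, so the caller has to inspect the response body to notice the failure.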


Fair point. It was a quick and dirty workaround, in the absence of `requests` in Ghidra's Jython distribution, but it turns out that `httplib` is available, and the latest commit uses that to do the HTTP request instead.
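
As a rough sketch of that approach (not the actual commit), the request can be made with the standard library alone. The module is called `httplib` in Python 2 and Jython, and `http.client` in Python 3, which is what this example uses. The host, path, model name, and payload fields are assumptions for illustration, as are the helper names `build_request` and `send_request`.

```python
import json
import http.client  # `httplib` under Python 2 / Jython

def build_request(api_key, prompt, model="text-davinci-003", max_tokens=256):
    """Return (headers, body) for a JSON POST; the field names are assumed."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return headers, body

def send_request(host, path, headers, body, timeout=30):
    """POST the body and raise on any non-200 status instead of failing silently."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("POST", path, body=body, headers=headers)
        resp = conn.getresponse()
        data = resp.read().decode("utf-8")
        if resp.status != 200:
            raise RuntimeError("HTTP %d: %s" % (resp.status, data[:200]))
        return json.loads(data)
    finally:
        conn.close()

headers, body = build_request("KEY", "Explain this function.")
# send_request("api.openai.com", "/v1/completions", headers, body)  # assumed endpoint
```

Checking `resp.status` explicitly covers the case that the missing `--fail` flag left open in the curl version.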


This is big. I wonder how many subprojects will spin off from it.


Very nice. And it makes me wonder what would be the result if the GPT was asked to point out security problems in the code.


You can actually try adding "and indicate what security vulnerabilities are present in the code, if any" or something to that effect to the prompt, by tweaking the `EXTRA` global variable defined near the head of the script. My experience with this so far is that it tends to spew out infosec truisms that aren't closely connected with the code, and that most interesting vulnerabilities require a bit more contextual awareness to notice than this tool has available to it, but ymmv, and it's definitely worth taking a bit of time to see if you can massage the prompt to finagle useful bughunting output from the tool.
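
To make the idea concrete, here is a hypothetical sketch of how an extra instruction could be appended to the prompt. Only the name `EXTRA` comes from the script; the `build_prompt` helper, the base prompt wording, and the way the pieces are joined are assumptions and may not match how G-3PO assembles its prompt.

```python
# Stand-in for the script's `EXTRA` global; its real default is not shown in this thread.
EXTRA = ""

def build_prompt(decompiled_c, extra=EXTRA):
    """Join an assumed base question, an optional extra instruction, and the code."""
    prompt = "Explain in plain English what the following decompiled C function does"
    if extra:
        prompt += ", " + extra.strip()
    return prompt + ".\n\n" + decompiled_c

p = build_prompt(
    "int f(void) { return 0; }",
    extra="and indicate what security vulnerabilities are present in the code, if any",
)
print(p)
```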


Thanks for boosting this!



