A few days ago this went mostly ignored (https://news.ycombinator.com/item?id=34...

2OEH8eoCRo0 · on Jan 5, 2023

Incredible. I had this exact idea rolling around in my head. Could something be trained to decompile binaries to source in a readable way? We have vast source code available and can build the binaries.

ta988 · on Jan 5, 2023

Ghidra and IDA Pro pseudocode are already pretty good.

mdaniel · on Jan 5, 2023

Ghidra may be pretty good for some binaries (or for some audience?), but my experience trying to get it to reverse both golang and rust-lang binaries has been abysmal. It fails to correctly identify string literals (which is my #1 go-to for finding "points of interest") and the decompilation output is ... well, maybe it's helpful to someone but not to me. I regret letting my Binary Ninja license lapse, because I would like to see what it has to say about the same binaries, and I've never had an IDA license, so I can't say what that's like.

As a point of comparison, I fed Ghidra 10.2.2 a copy of gojq 0.12.11 that I had lying around, and this is pretty representative of its output:

  if (DAT_0075c6c8 == (code *)0x0) {
    ppuStack_38 = (undefined **)0x45ecd7;
    FUN_00462ee0(&DAT_0075da28,local_10,iVar3,iVar4);
    *(undefined8 *)(in_FS_OFFSET + -8) = 0x123;
    if (DAT_0075da28 != 0x123) {
      ppuStack_38 = (undefined **)0x45ecf8;
      FUN_00460dc0();
    }
  }

For further comparison, I fed it the actual jq, and it did much better with the string literals:

    if ((((((iVar6 == 0) || (DAT_00108018 = DAT_00108018 | 1, local_58 != 0)) &&
          ((iVar6 = FUN_001045a0(pFVar20,0x72,"raw-output",pcVar13), iVar6 == 0 ||
           (DAT_00108018 = DAT_00108018 | 8, local_58 != 0)))) &&
         ((iVar6 = FUN_001045a0(pFVar20,99,"compact-output",pcVar13), iVar6 == 0 ||
          (local_5c = local_5c & 0xfffff8be, local_58 != 0)))) &&
        ((iVar6 = FUN_001045a0(pFVar20,0x43,"color-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x40, local_58 != 0)))) &&
       (((iVar6 = FUN_001045a0(pFVar20,0x4d,"monochrome-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x80, local_58 != 0)) &&
        ((iVar6 = FUN_001045a0(pFVar20,0x61,"ascii-output",pcVar13), iVar6 == 0 ||
         (DAT_00108018 = DAT_00108018 | 0x20, local_58 != 0)))))) {
      iVar6 = FUN_001045a0(pFVar20,0,"unbuffered",pcVar13);
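
For anyone who wants to try the "string literals as points of interest" approach outside a decompiler, here is a minimal sketch of a `strings`-style scan in Python. The function name `printable_runs` and the sample bytes are made up for illustration; this is not how Ghidra identifies strings. Go strings are stored as a pointer plus a length with no NUL terminator, so on a Go binary a scan like this tends to return long runs of literals glued together rather than clean individual strings.

```python
import re


def printable_runs(data: bytes, min_len: int = 6):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # \x20-\x7e covers space through tilde; anything else ends the run.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    # Hypothetical sample bytes standing in for a slice of a binary's data section.
    blob = b"\x00\x01raw-output\x00\x7f\x02compact-output\x00abc\x00"
    for offset, text in printable_runs(blob):
        print(hex(offset), text)
```

The offsets it reports can then be cross-referenced in the disassembler to find the code that uses each string.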

unnouinceput · on Jan 5, 2023

IDA has a free variant. Did you try it?

mdaniel · on Jan 5, 2023

I somehow thought that IDA Free was missing the decompiler, but I just downloaded 8.2.221216 macOS x86_64 and while it did a much better job at identifying the symbols in the rust binary, regrettably it then consumed 100% of the CPU and effectively locked up. So ... better, I guess? :-/

unnouinceput · on Jan 5, 2023

Myeah, that sounds like a bug in this variant. File a bug report with them; they'll probably release a better free one, and then you can properly test your theory. Good luck.

jki275 · on Jan 5, 2023

They are "pretty good" in the sense that they express the sequence of assembly instructions as C-like code that may have produced them. This is a tool that essentially tells you, "this function might be doing MD5".

I haven't tried it yet, but I can see it being useful if it's somewhat accurate (that's a big if), and quite different from what Ghidra gives you in pseudo code.

brnt · on Jan 5, 2023

> to write plain English explanations of what a piece of code does.

I could use this for a regular project for which I have the source.

mdaniel · on Jan 4, 2023

Wow, I wouldn't have expected Tenable to shell out to curl, especially when the curl call only adds two headers and they omitted the "--fail" that would cause HTTP error responses (400 and above) to return a non-zero exit code :-(

https://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....
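
A minimal sketch of what the `--fail` point looks like in practice, assuming a script that shells out to curl via `subprocess`. The function names, URL handling, and headers are hypothetical and not taken from the linked script, and the code targets CPython 3 rather than Ghidra's Jython. With `--fail`, curl exits non-zero (exit code 22) on HTTP responses of 400 and above, so the caller can detect the failure instead of parsing an error body as if it were a result.

```python
import subprocess


def build_curl_command(url, headers, payload):
    """Build a curl argv list; --fail makes curl exit non-zero on HTTP errors."""
    cmd = ["curl", "--silent", "--show-error", "--fail", "-X", "POST", url]
    for name, value in headers.items():
        cmd += ["-H", f"{name}: {value}"]
    cmd += ["--data", payload]
    return cmd


def post_with_curl(url, headers, payload):
    """Run curl and raise if it exits non-zero instead of silently continuing."""
    result = subprocess.run(
        build_curl_command(url, headers, payload),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"curl failed ({result.returncode}): {result.stderr.strip()}")
    return result.stdout
```

Passing the arguments as a list, rather than one shell string, also avoids quoting problems when the payload contains quotes or spaces.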

obliviasimplex · on Jan 6, 2023

Fair point. It was a quick and dirty workaround, in the absence of `requests` in Ghidra's Jython distribution, but it turns out that `httplib` is available, and the latest commit uses that to do the HTTP request instead.
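
For readers following along, a rough sketch of the same idea without curl. This is not the script's actual code: it uses Python 3's `http.client` (the renamed successor of Python 2's `httplib`), and the function names, host, path, and headers are placeholders. The point is only that the status code is checked explicitly, which is what the missing `--fail` would have done in the curl version.

```python
import http.client
import json


def check_status(status, body):
    """Return the body for 2xx responses; raise for anything else."""
    if not 200 <= status < 300:
        raise RuntimeError(f"HTTP {status}: {body[:200]}")
    return body


def post_json(host, path, payload, headers, timeout=30):
    """POST a JSON payload over HTTPS and return the response body as text."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("POST", path, body=json.dumps(payload), headers=headers)
        response = conn.getresponse()
        body = response.read().decode("utf-8")
        return check_status(response.status, body)
    finally:
        # Always release the socket, even if the request or the check raises.
        conn.close()
```

Under Jython the import would be `httplib` and the f-string would need to become `%` formatting, but the shape of the call is the same.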

m3affan · on Jan 4, 2023

This is big. I wonder how many subprojects will spin off from it.

denzil · on Jan 5, 2023

Very nice. And it makes me wonder what the result would be if GPT were asked to point out security problems in the code.

obliviasimplex · on Jan 6, 2023

You can actually try adding "and indicate what security vulnerabilities are present in the code, if any" or something to that effect to the prompt, by tweaking the `EXTRA` global variable defined near the head of the script. My experience with this so far is that it tends to spew out infosec truisms that aren't closely connected with the code, and that most interesting vulnerabilities require a bit more contextual awareness to notice than this tool has available to it, but ymmv, and it's definitely worth taking a bit of time to see if you can massage the prompt to finagle useful bughunting output from the tool.
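
A hedged sketch of how an `EXTRA`-style suffix might be folded into a prompt. Only the name `EXTRA` and the suggested clause come from the comment above; the base prompt wording and the `build_prompt` helper are assumptions made for illustration and are not copied from the script.

```python
# Hypothetical stand-in for the script's `EXTRA` global; leave it empty to
# get the plain explanation prompt.
EXTRA = "and indicate what security vulnerabilities are present in the code, if any"


def build_prompt(decompiled_code, extra=EXTRA):
    """Combine an assumed base instruction, the optional extra clause, and the code."""
    # Assumed wording; the real script's prompt text may differ.
    base = "Below is some decompiled C code. Explain in plain English what it does"
    instruction = f"{base}, {extra}." if extra else f"{base}."
    return f"{instruction}\n\n{decompiled_code}"


if __name__ == "__main__":
    print(build_prompt("int f(char *s) { char buf[8]; strcpy(buf, s); return 0; }"))
```

Keeping the extra clause separate from the base instruction makes it easy to try several phrasings and compare the output on the same function.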

obliviasimplex · on Jan 6, 2023

Thanks for boosting this!