McSema: A native code to LLVM IR translation framework (trailofbits.com)
85 points by wyc on Aug 7, 2014 | 12 comments



Coupled with emscripten, which compiles LLVM IR to asm.js, this could be huge. Questions of IP and legality aside for the moment, imagine being able to port everything from old games to VST audio plugins to the browser... without needing access to the source code!

Others have already had similar ideas: see https://lobste.rs/s/m39toj/a_preview_of_mcsema_a_framework_f...

Of course nothing other than trivial examples will work at this early stage... but we can only dream!


Other LLVM decompilation projects:

Dagger, http://dagger.repzret.org (x86)

Fracture, https://github.com/draperlaboratory/fracture (x86, ARM, PPC)

libbeauty, https://github.com/jcdutton/libbeauty (x86)


I've actually used Dagger to transform a very simple library into IR. I had trouble with the other two projects and was ultimately unable to do the same with them. Unfortunately, when I contacted the Dagger authors for tips on fixing some deficiencies I found with a more complex binary, I received no response. =(

Granted, they are students and thus always busy.

I can't wait to try out McSema. =)


I have been waiting for something like this to be solved by those more capable than myself. My dream is to combine this with ZeroVM for cloud execution and LLVM Polly for automatic parallelism (and other static analysis), and to have a runtime that seemingly magically takes a simple scientific app and runs it on a huge virtual machine in the cloud.


The talk at REcon that introduced this tool is also excellent: http://recon.cx/2014/video/recon2014-10-artem-dinaburg-andre...


Pardon my ignorance (I don't mean to be dense here), but what the heck is this doing? It sounds like recompiling machine code into intermediate code. Is that a fair (if short and oversimplified) description?


The announcement links to an earlier blog post (pardon my formatting):

http://blog.trailofbits.com/2014/06/23/a-preview-of-mcsema/

"McSema translates x86 machine code into LLVM bitcode.

"Why would we do such a crazy thing?

"Because we wanted to analyze existing binary applications, and reasoning about LLVM bitcode is much easier than reasoning about x86 instructions.

"Not only is it easier to reason about LLVM bitcode, but it is easier to manipulate and re-target bitcode to a different architecture."


Thank you. I missed that post.


Yes, it’s a disassembler: https://en.wikipedia.org/wiki/Disassembler


It's more than a disassembler. Disassemblers are easy to write. Stateful transformation is much harder.
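
To make the distinction concrete, here is a toy sketch in Python. It is only an illustration and says nothing about how McSema is actually built: the three-byte encoding, the register table, and the names `disassemble`, `lift`, and `run` are all made up for this example. The disassembler only maps bytes to mnemonics, while the lifter has to spell out every effect on machine state, including the flag updates the mnemonic leaves implicit.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical toy encoding (not real x86): opcode byte, then two register-index bytes.
OPCODES = {0x01: "add", 0x29: "sub"}
REGS = ["eax", "ecx", "edx", "ebx"]


def disassemble(code):
    """Decode bytes into (mnemonic, dst, src) tuples: a purely syntactic step."""
    return [
        (OPCODES[code[i]], REGS[code[i + 1]], REGS[code[i + 2]])
        for i in range(0, len(code), 3)
    ]


def lift(insns):
    """Rewrite each instruction as explicit operations on machine state.

    The flag updates that are implicit in the mnemonic become separate steps.
    """
    ops = []
    for mnemonic, dst, src in insns:
        ops.append(("tmp", mnemonic, dst, src))  # full-width result
        ops.append(("flag", "cf", mnemonic))     # carry or borrow out of 32 bits
        ops.append(("store", dst))               # truncate and write back
        ops.append(("flag", "zf", None))         # zero flag from the stored value
    return ops


def run(ops, state):
    """Interpret the lifted operations against a dict of registers and flags."""
    state = dict(state)  # work on a copy so the caller's dict is untouched
    tmp = 0
    last = 0
    for op in ops:
        if op[0] == "tmp":
            _, mnemonic, dst, src = op
            a, b = state[dst], state[src]
            tmp = a + b if mnemonic == "add" else a - b
        elif op[0] == "store":
            last = tmp & MASK32
            state[op[1]] = last
        elif op[1] == "cf":
            state["cf"] = int(tmp > MASK32 or tmp < 0)
        else:  # zf
            state["zf"] = int(last == 0)
    return state


if __name__ == "__main__":
    insns = disassemble(bytes([0x01, 0, 3]))
    print(insns)
    print(run(lift(insns), {"eax": 0xFFFFFFFF, "ecx": 0, "edx": 0, "ebx": 1}))
```

Even in this toy, one decoded instruction turns into four state operations. A real lifter also has to deal with memory, the stack, calling conventions, and control-flow recovery, which is presumably where the hard part lies.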


Would this allow one to recompile x86 binaries to another architecture?


What's the performance overhead though? And size overhead? I'd imagine it'd be quite large.



