It is not easy to follow. And there are many things worth discussing. I understand there are complications with MIPS and C++ just to name a few.
But let me stick with some basics. So, I can write and compile an x86 test.c program. Then, I use your extension and undo the linking. Then, I use the results to link again into a new executable? Are the executables identical? When does it break?
How much of a task is it to make it a standalone program? What about x64 support?
> But let me stick with some basics. So, I can write and compile an x86 test.c program. Then, I use your extension and undo the linking. Then, I use the results to link again into a new executable?
The README of my Ghidra extension repository links to blog posts that explain these use cases in depth, but as a summary:
- You can delink the program as a whole and relink it. This can port a program from one file format to another (a.out -> ELF) and change its base address.
- You can delink parts of a program and relink them into a program. This can accomplish a number of things, like transforming a statically-linked program into a dynamically-linked one, swapping the statically linked C standard library for another one, making a port of the program to a foreign system, creating binary patches by swapping out functions or data with new implementations...
- You can delink parts of a program and turn them into a library. For example, I've ripped out the archive code from a PlayStation game built by a COFF toolchain, turned it into a Linux MIPS ELF object file and made an asset extractor that leverages it, without actually figuring out the archive file format or even how this archive code works.
You can probably do even crazier stuff than these examples. This basically turns programs into Lego blocks. As long as you can mend them together, you can do pretty much anything you want. You can also probably work on object files and dynamically-linked libraries too, but I haven't tried it myself.
> Are the executables identical?
Probably not byte-identical, but you can make executables that have the same observable behavior if you don't swap out anything in a manner that impacts it. The interesting stuff happens when you start mixing things up.
> When does it break?
Whenever the object file produced is incorrect or when you don't properly mend together incompatible ABIs. The first case happens mostly when the resynthesized relocations are missing or incorrect, corrupting section bytes in various ways. The second case can happen if you start moving object files across operating systems, file formats, toolchains or platforms.
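To make the first failure mode concrete, here is a minimal sketch of what un-applying and re-applying relocations looks like for the two most common i386 relocation types. This is not the extension's actual code; the helper names, addresses and symbol names (`data`, `func`) are made up for illustration, and it assumes REL-style relocations where the addend lives in the patched field itself.

```python
import struct

R_386_32 = 1    # absolute reference: S + A
R_386_PC32 = 2  # PC-relative reference: S + A - P

def apply_reloc(section, base, offset, rtype, sym_addr):
    """Link step: turn the addend stored at `offset` into a final value."""
    (addend,) = struct.unpack_from("<i", section, offset)
    place = base + offset
    value = sym_addr + addend if rtype == R_386_32 else sym_addr + addend - place
    struct.pack_into("<I", section, offset, value & 0xFFFFFFFF)

def unapply_reloc(section, base, offset, rtype, sym_addr):
    """Delink step: recover the addend from an already-linked value."""
    (value,) = struct.unpack_from("<I", section, offset)
    place = base + offset
    addend = value - sym_addr if rtype == R_386_32 else value - sym_addr + place
    struct.pack_into("<I", section, offset, addend & 0xFFFFFFFF)

def delink_and_relink(linked, relocs, old_base, old_syms, new_base, new_syms):
    """Un-apply every known relocation, then re-apply at the new addresses."""
    section = bytearray(linked)
    for offset, rtype, name in relocs:
        unapply_reloc(section, old_base, offset, rtype, old_syms[name])
    for offset, rtype, name in relocs:
        apply_reloc(section, new_base, offset, rtype, new_syms[name])
    return bytes(section)

# Hypothetical 8-byte section: a pointer to data+4, then a call displacement.
relocs = [(0, R_386_32, "data"), (4, R_386_PC32, "func")]
old_base, old_syms = 0x08048000, {"data": 0x0804A000, "func": 0x08048100}
new_base, new_syms = 0x00400000, {"data": 0x00402000, "func": 0x00400100}

linked = bytearray(struct.pack("<ii", 4, -4))  # addends as found in an object file
for offset, rtype, name in relocs:
    apply_reloc(linked, old_base, offset, rtype, old_syms[name])

# All relocations resynthesized: the pointer follows the data to its new address.
good = delink_and_relink(linked, relocs, old_base, old_syms, new_base, new_syms)
# Pointer relocation missed: the field still holds the old absolute address.
bad = delink_and_relink(linked, relocs[1:], old_base, old_syms, new_base, new_syms)

print([hex(v) for v in struct.unpack("<II", good)])
print([hex(v) for v in struct.unpack("<II", bad)])
```

In the `bad` case the first field keeps pointing into the old address space, which is the kind of silent corruption a missing relocation causes. The PC-relative field survives either way here only because the caller and callee moved together.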
> How much of a task is it to make it a standalone program?
My analyzers rely on a Ghidra database for symbols, data types, references and disassembly. You can probably port/rewrite that to run on top of another reverse-engineering framework. I don't think turning it into a standalone program would be practical because you'll need to provide either an equivalent database or the analyzers to build it, alongside the UI to fix errors.
> What about x64 support?
Should be fairly straightforward since I already have 32-bit x86 support, so the bulk of the logic is already there.
I encourage you to read my blog if you want to get an idea of how this delinking stuff works in practice. You can also send me an email if you want; Hacker News isn't really set up for long, in-depth technical discussions.
This sounds like it would benefit from modifications to linkers to make decomposition easier. The benefits of code reuse might make it worthwhile, although the security implications of code reuse without having any idea of what's in the code seem formidable.
Some linkers can be instructed to leave the relocation sections in the output (-q/--emit-relocs for gold/mold), but it's extremely unlikely that an artifact you would care about was built with this obscure option.
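If you want to check whether a given ELF artifact happens to carry such sections, a rough heuristic is to look for `SHT_REL`/`SHT_RELA` section headers that are not flagged `SHF_ALLOC` (the allocated ones are normally dynamic relocations, which are a different thing). The sketch below is my own illustration under that assumption; the function name is made up, and it ignores corner cases like extended section numbering.

```python
import struct

SHT_RELA, SHT_REL = 4, 9
SHF_ALLOC = 0x2

def static_reloc_sections(data: bytes) -> int:
    """Count non-allocated SHT_REL/SHT_RELA sections in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big
    if is64:
        (shoff,) = struct.unpack_from(endian + "Q", data, 40)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 58)
    else:
        (shoff,) = struct.unpack_from(endian + "I", data, 32)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 46)
    count = 0
    for i in range(shnum):
        entry = shoff + i * shentsize
        (sh_type,) = struct.unpack_from(endian + "I", data, entry + 4)
        (sh_flags,) = struct.unpack_from(endian + ("Q" if is64 else "I"), data, entry + 8)
        if sh_type in (SHT_REL, SHT_RELA) and not sh_flags & SHF_ALLOC:
            count += 1
    return count
```

You would call it as `static_reloc_sections(open(path, "rb").read())`; a result of zero means the heuristic found no leftover link-time relocation sections, so the relocations have to be resynthesized by analysis.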
I'm mostly using this delinking technique on PlayStation video games, Linux programs from the 90s and my own test programs, so I'm not that worried about security implications in my case. If you're stuffing bits and pieces taken from artifacts with questionable origins into programs and then executing them without due diligence, that's another story.