Thank heavens for this advancement! I imagine those using Doom to manage system processes (https://www.cs.unm.edu/~dlchao/flake/doom/chi/chi.html) will appreciate the security measures implemented in this version. I can only hope these two projects can find common purpose.
Several years back I convinced a coworker to try to install that. He had a Mac and I was running FreeBSD or something and couldn't run it. He'd been running some sort of simulation for several days on that machine.
So we fired it up, and within about 2-3 minutes he managed to shoot PID 1 or something like that, and the machine rebooted. Boss was not happy.
Actually, you're right: he killed the simulation process, now that I'm racking my brain to remember how that actually worked. That then led to us joking about what it'd be like to force a reboot via the killing.
The key detail I'll never forget is his "Oh fuck!" face when it happened.
I've just run # kill -L - there seem to be rather more signals available than I remember from the past. I'm pretty sure I have never even heard of the ones >16, let alone SIGRT{MIN|MAX}{|x}. I've only ever used SIGHUP or SIGKILL in general, and SIGUSR1 for dd progress (which now has status=progress at last).
"The only signals that can be sent to process ID 1, the init process,
are those for which init has explicitly installed signal handlers.
This is done to assure the system is not brought down accidentally."
And SIGKILL cannot be caught (enforced by the kernel):
"The signals SIGKILL and SIGSTOP cannot be caught or ignored."
"This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions."
I must not be understanding Meltdown correctly and I know this is a joke, but I thought it was the malicious code that used speculative execution to (indirectly) read unauthorized memory. So in what sense is Branchless Doom secure against Meltdown?
You don't store your plaintext passwords in a modded version of Doom with a kernel scheduler hack to ensure it mostly gets dispatched on the HyperThread core neighboring the one your browser is getting dispatched onto?
My understanding of how some of this works is that to handle jmps, the program uses nojmp to become its own signal handler, inserts a faulting mov to raise a SIGSEGV, movs addresses around, and continues as though it were a successful jmp.
And it looks like float and int math is handled with lookup tables. Yikes, but impressive!
It's cooler than that, IMO. It uses the SIGSEGV to jump back to the start, once per global iteration. But most of the normal 'jumps' are encoded by doing both sides of the work, using a scratch base address for the storage of the 'not taken' side so that it ends up not contributing to the actual result.
Structuring code like this actually has its purpose. The instruction stream executed by massively parallel SIMD systems (prototypically '80s supercomputers like the Connection Machine, but also GPUs from a few generations back) works in exactly the same way.
I know it's a joke, but this still branches, via an obscure mechanism of faulting, and mov instructions past the faulting instruction will speculatively execute.
That's not quite true... the code (essentially) loops over itself infinitely; the only branch is a loop back to the beginning of the program. Since this branch has only one target, it can be a direct jump (as opposed to an indirect one), meaning it would not use the branch target buffer, and would not cause speculative execution. The author used faulting for branches only to write the program in all mov instructions; if the last 'mov' is replaced with a direct jmp (which, it looks like, can be done with the --no-mov-loop flag to the compiler), no more speculative execution.
Another important thing here is that it's an unconditional direct jump. A conditional direct jump can still cause speculative execution (e.g. could be vulnerable to Spectre variant 1).
That's a pity, I was hoping that this could be converted to a SIMD architecture, allowing you to run hundreds of staggeringly slow Doom instances simultaneously on a GPU.
One of the main ideas in OoO processors is that they are in many ways like dataflow machines, waiting for inputs to become available in order to start (potentially speculative) execution.
So, encoding logic in dependent movs does not inherently address the cache side channel issue.
History-based branch prediction is one difference from explicit branches, of course.
It uses the movfuscator, one of the most painful tools to be faced with when reverse engineering. Also one of the most fun, from the other side of the aisle.
Hopefully this isn't too off topic, but could you explain how to even start approaching this from a RE/malware analysis perspective?
I'm guessing there's no drag-and-drop de-obfuscate tool like there is for some of the common .NET obfuscators.
Do you just rely on behavioral/dynamic methods?
I was just looking to replace a specific function call, because the company weren't using VC, and no longer had a working codebase.
So I was hunting a particular pattern to replace, that I knew probably existed. A simple combination of demovfuscator, regex on the "original" asm and eyeballing it succeeded in the end.
Has it ever been used against reverse engineering in the wild? Its lack of compiler features (it's based on lcc IIRC) and the sheer slowness of any operation makes it quite unsuitable for production usage.
In my case, it's sort of a yes and no. I reverse engineered partly, but I was only looking to replace whole function calls.
The company I came into as a contractor, to sort of repair codebases and retrain staff, had a product they were getting paid buckets to support.
Unfortunately, someone had broken the codebase quite severely, and they weren't using version control. All we had was a movfuscated release (yay for paranoid managers who go to tech conferences), but needed to fix a certain feature ASAP.
So I used demovfuscator, someone's memory of what the code once looked like, and some ASM know-how to tear out references to the old and inject the new. Took a couple months of going nowhere fast, but I was pretty far out of my depth.
The comment you're referring to implies yes, and it's often applied to very specific pieces of code instead of the entire program.
I have also seen a similar obfuscation, although perhaps not using The Movfuscator, many years ago in some shareware. You could probably guess which parts of the code used it; it even looks very distinctive when you open the binary in a text editor for a quick glance.
Apparently someone started writing a decompiler for movfuscator that looked for the patterns of movs generated. Probably possible to obfuscate past that still though with an extra layer of indirection whereby all instructions generate the same basic pattern of movs, with one small difference.
I love the design of the movfuscator. Most obfuscation tools try to add more complexity and branching, in the hope that it will fool the decompiler. But it turns out that having absolutely no branching, and no 'features', makes it so much more of a nightmare to reverse!
This is a more interesting idea than I would have thought at first. I wonder if it could be used to discover whether any frame-perfect exploits still remain unfound in Doom. I know that there are people who have played the game frame-by-frame in order to identify possible advantages, this is an interesting take on the idea. The one concern I have is this: does it change the enemy logic significantly? Will the enemies still react in roughly the same way?
I also wonder if it would be at all possible to eventually speed it up (maybe not to actual play speed, but faster) while remaining branchless and free of speculation. It might be interesting to see if research in Doom could provide faster and more secure solutions in the real world.