Branchless Doom (github.com/xoreaxeaxeax)
454 points by strangecasts on Jan 23, 2018 | 80 comments



Thank heavens for this advancement! I imagine those using Doom to manage system processes (https://www.cs.unm.edu/~dlchao/flake/doom/chi/chi.html) will appreciate the security measures implemented in this version. I can only hope these two projects can find common purpose.


Several years back I convinced a coworker to try to install that. He had a Mac, and I was running FreeBSD or something and couldn't run it myself. He'd been running some sort of simulation on that machine for several days.

So we fired it up, and within about 2-3 minutes he managed to shoot PID 1 or something like that, and the machine rebooted. The boss was not happy.


Hmm, usually PID 1 is unkillable, even by root, AFAIK.


Actually, you're right. Now that I'm racking my brain to remember how it actually worked, it was the simulation process he killed. That then led to us joking about what it'd be like to force a reboot by shooting the right process.

The key detail I'll never forget is his "Oh fuck!" face when it happened.


https://www.freedesktop.org/software/systemd/man/systemd.htm...

Not all signals are blocked by init, if init wants to be signaled :)


Don't know why you got downvoted. This does nothing on a VM running Kali Linux (systemd):

# kill -9 1


https://www.freedesktop.org/software/systemd/man/systemd.htm... - See signals

It's init-dependent (though I think SIGKILL is blocked at the kernel level; I wouldn't swear to it). SIGRTMIN+3 will likely take down your init process :)
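For what it's worth, here's that in C; a minimal sketch assuming a systemd PID 1 (systemd(1) documents SIGRTMIN+3 as the halt request, one of the few signals it installs a handler for):

    #include <signal.h>

    /* Sketch: ask PID 1 (systemd) to halt the machine. Most other
     * signals sent to PID 1 are simply ignored unless init has
     * installed a handler for them. */
    int request_halt(void)
    {
        return kill(1, SIGRTMIN + 3);   /* needs root; -1 on failure */
    }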


Cor, the locals are a bit harsh here.

I've just run # kill -L, and there seem to be rather more signals available than I remember from the past. I'm pretty sure I had never even heard of the ones above 16, let alone SIGRT{MIN|MAX}{|x}. I've only ever used SIGHUP or SIGKILL in general, and SIGUSR1 for dd progress (which now has status=progress at last).

Time to dig out the manual 8)


In Linux, from the manpage:

"The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers. This is done to assure the system is not brought down accidentally."

And SIGKILL can not be captured (enforced by the kernel):

"The signals SIGKILL and SIGSTOP cannot be caught or ignored."


wtf, hahaha


Haha that made me LOL


They’ll certainly have enough time to decide which proc to kill!


"The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience."

Favourite line.


You could play Doom over email with that kind of time


Did a back-of-the-envelope calculation. Assuming you responded promptly, it would take around 504 years to finish the game.


I'd say even over paper mail.


I look forward to this being a benchmark for testing CPU advancement in the years to come.

"It gets 2 frames per hour in Branchless DOOM!"


Compiler FAQ, in its entirety, says it all:

> Q: Why did you make this?
> A: I thought it would be funny.


To the author's credit, it really is. I'm still giggling.


"This is thought to be entirely secure against the Meltdown and Spectre CPU vulnerabilities, which require speculative execution on branch instructions."


I must not be understanding Meltdown correctly and I know this is a joke, but I thought it was the malicious code that used speculative execution to (indirectly) read unauthorized memory. So in what sense is Branchless Doom secure against Meltdown?


You don't store your plaintext passwords in a modded version of Doom with a kernel scheduler hack to ensure it mostly gets dispatched on the HyperThread core neighboring the one your browser is getting dispatched onto?

Amateur.


Meltdown doesn't involve speculative execution across branches. It's out-of-order execution past instructions that will fault.
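The canonical (and since-mitigated) gadget really is branch-free; a minimal C sketch, with `probe` and `kernel_addr` as hypothetical stand-ins:

    #include <stdint.h>

    static uint8_t probe[256 * 4096];   /* hypothetical probe array */

    /* Illustration only: the dereference faults, but on affected CPUs
     * the dependent load below could execute out of order first,
     * leaving a cache footprint indexed by the secret byte. No branch
     * anywhere; the attacker later times accesses to probe[i * 4096]
     * to recover the byte. */
    uint8_t meltdown_gadget(const volatile uint8_t *kernel_addr)
    {
        uint8_t secret = *kernel_addr;   /* raises a fault */
        return probe[secret * 4096];     /* transiently executed */
    }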


My understanding of how some of this works is that to handle jmps, the program uses nojmp to become its own signal handler, inserts a faulting mov to raise a SIGSEGV, moves addresses around, and continues as though it were a successful jmp.

And it looks like float and int math is handled with lookup tables. Yikes, but impressive!
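The lookup-table idea, as a toy C sketch (illustrative only, not the compiler's actual scheme; a real mov-only binary would ship its tables as data rather than initialize them at runtime):

    #include <stdint.h>

    /* Toy sketch: 8-bit addition via pure table loads. Chain the
     * low-sum and carry tables byte-wise to build wider arithmetic. */
    static uint8_t sum_lo[256][256];
    static uint8_t carry[256][256];

    static void init_tables(void)
    {
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++) {
                sum_lo[a][b] = (uint8_t)(a + b);
                carry[a][b]  = (a + b) > 0xff;
            }
    }

    static uint16_t add8(uint8_t a, uint8_t b)
    {
        return sum_lo[a][b] | (uint16_t)carry[a][b] << 8;  /* two loads */
    }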


It's cooler than that, IMO. It uses the SIGSEGV to jump back to the start, once per global iteration. But most of the normal 'jumps' are encoded by doing both sides of the work, using a scratch base address for the stores of the 'not taken' side, which ends up not contributing to the actual result.
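In C terms, the trick looks roughly like this (a sketch of the idea, not the actual codegen):

    #include <stdint.h>

    static uint32_t scratch;   /* the 'not taken' side's stores land here */

    /* Both arms of a branch run unconditionally; the condition only
     * picks which destination address a mov stores through, real
     * state or the scratch word, so no jump is ever taken. */
    static void store_if(int cond, uint32_t *real_dest, uint32_t value)
    {
        uint32_t *dest[2] = { &scratch, real_dest };
        *dest[cond != 0] = value;
    }

So store_if(x > 0, &y, 42) updates y only when the condition holds, with the selection done by address, not control flow.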


So it prevents speculative execution of branches, by speculatively executing them itself? lol


Structuring code like this actually has its purpose. The instruction stream executed by massively parallel SIMD systems (prototypically 80s supercomputers like the Connection Machine, but also GPUs a few generations back) works in exactly the same way.
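A scalar C sketch of that per-lane predication, the same compute-both-sides-then-mask idea:

    #include <stdint.h>

    /* Every "lane" evaluates both arms; a mask commits one result,
     * which is how SIMD machines and GPU warps handle divergence. */
    static int32_t select_masked(int cond, int32_t then_val, int32_t else_val)
    {
        int32_t mask = -(int32_t)(cond != 0);          /* all ones or all zeros */
        return (then_val & mask) | (else_val & ~mask); /* no branch */
    }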


It's come full circle! The irony is that CPUs already do a ton of speculative branching.


I know it's a joke, but this still branches, via an obscure mechanism of faulting, and mov instructions past the faulting instruction will speculatively execute.


That's not quite true... the code (essentially) loops over itself infinitely; the only branch is a loop back to the beginning of the program. Since this branch has only one target, it can be a direct jump (as opposed to an indirect one), meaning it would not use the branch target buffer, and would not cause speculative execution. The author used faulting for branches only so the program could be written entirely in mov instructions; if the last 'mov' is replaced with a direct jmp (which, it looks like, can be done with the --no-mov-loop flag to the compiler), there's no more speculative execution.
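So the execution model ends up looking like this (a sketch; the slot functions are hypothetical stand-ins for runs of movs):

    /* Sketch of the mov-only execution model: the whole program is
     * one straight line of movs, re-entered forever. Every slot
     * executes on every pass; a virtual PC (itself just moved-around
     * data) decides whose stores hit real state instead of scratch. */
    extern void slot_0(void), slot_1(void), slot_last(void);

    void mov_program(void)
    {
        for (;;) {   /* the single, direct, unconditional backward jump */
            slot_0();
            slot_1();
            slot_last();
        }
    }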


> direct jump

Another important thing here is that it's an unconditional direct jump. A conditional direct jump can still cause speculative execution (e.g. could be vulnerable to Spectre variant 1).


That's a pity; I was hoping this could be converted to a SIMD architecture, allowing you to run hundreds of staggeringly slow Doom instances simultaneously on a GPU.


It would be even better if each of the, e.g., 1k instances also ran full networking, so you could have 1024-player splitscreen on the GPU.


But the faulting mov instruction is the last instruction in the program. There are no instructions after it to speculatively execute.


One of the main ideas in OoO processors is that they are in many ways like dataflow machines, waiting for inputs to become available in order to start (potentially speculative) execution.

So, encoding logic in dependent movs does not inherently address the cache side channel issue.

History-based branch prediction is one difference from explicit branches, of course.
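In other words, a secret-dependent load address is all a FLUSH+RELOAD-style probe needs, branches or not; a sketch of the measurement side (x86-specific intrinsics):

    #include <stdint.h>
    #include <x86intrin.h>

    /* Sketch: flush a line, let the victim run, then time the reload.
     * A fast reload means the victim's (branchless) movs touched it. */
    static uint64_t time_access(const volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    static void evict(const void *p)
    {
        _mm_clflush(p);   /* flush the line before the victim runs */
    }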


It uses the movfuscator, one of the most painful tools to be faced with when reverse engineering. Also one of the most fun, from the other side of the aisle.


Hopefully this isn't too off topic, but could you explain how to even start approaching this from a RE/malware analysis perspective? I'm guessing there's no drag-and-drop de-obfuscate tool like there is for some of the common .NET obfuscators. Do you just rely on behavioral/dynamic methods?


For even the nastiest of obfuscators there are often attempts at deobfuscators...

https://github.com/kirschju/demovfuscator


I was just looking to replace a specific function call, because the company weren't using VC, and no longer had a working codebase.

So I was hunting a particular pattern to replace, that I knew probably existed. A simple combination of demovfuscator, regex on the "original" asm and eyeballing it succeeded in the end.


Has it ever been used against reverse engineering in the wild? Its lack of compiler features (it's based on lcc, IIRC) and the sheer slowness of any operation make it quite unsuitable for production usage.


In my case, it's sort of a yes and no. I reverse engineered partly, but I was only looking to replace whole function calls.

The company I came into as a contractor (to, sort of, repair codebases and retrain staff) had a product they were getting paid buckets to support.

Unfortunately, someone had broken the codebase quite severely, and they weren't using version control. All we had was a movfuscated release (yay for paranoid managers who go to tech conferences), but we needed to fix a certain feature ASAP.

So I used demovfuscator, someone's memory of what the code once looked like, and some ASM know-how to tear out references to the old and inject the new. Took a couple months of going nowhere fast, but I was pretty far out of my depth.

They don't use movfuscator anymore.


Every time I see someone say no version control, I do a double take. Hopefully you managed to convince them to start using it?


... No... The best I could convince them of was a regular rsync backup at the end of the day... You win some, you lose some.


To be honest, rsync sounds like a big win in that group... sheesh.

Good work all around.


The comment you're referring to implies yes, and it's often applied to very specific pieces of code instead of the entire program.

I have also seen a similar obfuscation, although perhaps not using The Movfuscator, many years ago in some shareware. You could probably guess which parts of the code used it; it even looks very distinctive when you open the binary in a text editor for a quick glance.


Apparently someone started writing a decompiler for movfuscator that looked for the patterns of movs it generated. It's probably still possible to obfuscate past that, though, with an extra layer of indirection whereby all instructions generate the same basic pattern of movs, with one small difference.


I love the design of the movfuscator. Most obfuscation tools try to add more complexity and branching, in the hope that it will fool the decompiler. But it turns out that having absolutely no branching, and no 'features', makes it so much more of a nightmare to reverse!


I think you can see a similar trend in esoteric languages.

Malbolge is so complicated that nobody can really write a working program in it.

Brainfuck is so simple that nearly anybody can write it, but reverse-engineering it is a massive time-sink.


And Tigress is even more painful...


Finally I can play my favourite game without worrying about how Intel has fucked me!


You could run it on a mid-90's CPU without speculative branching and very likely see better performance.


A 486 or Pentium 90 is what this game was made for.


Yeah, either one of those would run the game better than this project/slideshow.


Or, alternatively, an x86 emulator on a more trustworthy architecture such as RISC-V, or an x86 implementation on an FPGA.


This is a more interesting idea than I would have thought at first. I wonder if it could be used to discover whether any frame-perfect exploits still remain unfound in Doom. I know that there are people who have played the game frame-by-frame in order to identify possible advantages; this is an interesting take on the idea. The one concern I have is this: does it change the enemy logic significantly? Will the enemies still react in roughly the same way?

I also wonder if it would be at all possible to eventually speed it up (maybe not to actual play speed, but faster) while remaining branchless and free of speculation. It might be interesting to see if research in Doom could provide faster and more secure solutions in the real world.



I was disappointed to find out the screenshot is not a GIF.


You may have to wait 7 hours to confirm that.


Or right-click and "save image as..." That works too.


"The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience."

hehe


Well, the compiler works!


This is excellent work.


“The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.”

lol


That reminds me of playing C&C: Red Alert on a 486/66; I thought it was a deliberately slow-paced strategy game.

That was until I installed it on my dad's Pentium 200: the speed increased 3x, and I realized it was a completely different beast.


Would be interesting to make this a sort of Twitch Plays, with 7-hour voting windows.


I think it only registers input every 5 or so frames anyway :O


This needs blockchain GPU mining!


Please don't give anyone any ideas.


Another great use case for quantum computing.


As a veteran of early Origin games, I'm up for that. I ran Ultima VII on a DoubleSpaced 386.


Man, I lol'd when I read that line. That was amazing!


Ridiculous. Why does it become so slow though?




It's neat, and rather funny.

To see an unmodified real-world C program run on a real-world hardware architecture, completely branchless, is pretty damn neat.

The way it takes hours to render a frame of DOOM is hilarious.

I wonder if the compiler will be optimised in future.


> unmodified

You have to patch the source though...


Oops, you're right! I thought it was only the build system that had to be adjusted.

About half of the code changes are just to avoid alloca.


Some ask, "why". Others ask, "why not"...


I always liked that quote. In full:

I hear you say "Why?" Always "Why?" You see things; and you say "Why?" But I dream things that never were; and I say "Why not?"

(https://en.wikiquote.org/wiki/George_Bernard_Shaw)


"Science isn't about 'why', it's about 'why not'! Why don't you marry safe science if you love it so much!" --Cave Johnson


It says why right in the compiler FAQ: "I thought it would be funny".


Just being interesting is enough.




