Hacker News new | past | comments | ask | show | jobs | submit login
Movfuscator: Compile C into only mov instructions (github.com/battelle)
145 points by todsacerdoti on May 18, 2021 | hide | past | favorite | 40 comments




From the top link:

https://news.ycombinator.com/item?id=18992556

>The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience.

https://github.com/xoreaxeaxeax/movfuscator/tree/master/vali...

I have to say, reading through the github for the movfuscator was pretty damn amusing. I honestly burst out laughing at the control flow graphs. But I'm curious, is this actually practical for anything?

Running Doom at 1 Frame per 7 hours is, pretty unreasonable. Would text based software even be usable with this?

Either way, I do enjoy cleverly, overly engineered, possibly useless things created just because someone thought it would be funny.


> Would text based software even be usable with this?

I just wrote a quick C program to find the primes less than 100000 and compiled it with gcc and movcc. I wouldn't say it's, you know, fast, but I'm impressed at how it does finish in an amount of time that I was willing to wait.

  $ time ./primes.gcc  | wc -l
  9592
  
  real 0m0.035s
  user 0m0.031s
  sys 0m0.012s
  $ time ./primes.movcc  | wc -l
  9592
  
  real 0m10.511s
  user 0m8.289s
  sys 0m2.228s
It looks like the reason that it spent so much time in syscalls is that it somehow uses signals for control flow. (I'm not quite sure how that works.)


movccc uses a faulting mov instruction to complete the execution loop. without this a jmp is required for infinite execution.

see https://github.com/xoreaxeaxeax/movfuscator/blob/master/movf... and https://github.com/xoreaxeaxeax/movfuscator/blob/master/movf...


Could that tbeoretically be replaced with a conditional move to recoup?


I laughed at the flow control graph as well, properly hilarious.

I liken the motivations behind things like Movfuscator to mountain climbing: they do it because it's there to be done.


yes, both are related to art – just without the need for an audience. It's play.


Could maybe be useful for malware obfuscation?


Or for license check / DRM (I appreciate some might consider one or both of these to be a type of malware).

As someone else noted, a large block of mov instructions would be quite easy to spot so you'd have to tie it into a bit of core application / algorithm logic. But that doesn't mean the whole program needs to be written that way.


All software which does something that the user (owner of the device) would rather it didn’t do, is malware.


That's yoo generous of a definition. I would prefer if the software I used didn't crash, or have any bugs, but mistakes get made sometimes. By your definition, that's malware. I would say malware is software who's primary focus is to do things the user wouldn't like. A game checking DRM licenses isn't malware as it's not the primary purpose, just an annoying aspect.


You’re right, my definition was too broad; intention should matter. But DRM would still count; it intentionally does something which the user or device owner would rather it did not do.


So every software with a license check is malware? That doesn't necessarily mean it verifies with an external server, it might just locally check a code and use some bits contained therein to print "Registered to XXXX" on the about screen.

What about a one-time screen kindly asking for a donation?


> So every software with a license check is malware?

Yes? Obviously?

> What about a one-time screen kindly asking for a donation?

If it doesn’t have a “don’t show this again” checkbox, then I would have to say “yes”.


As long as you don't need your malware to actually accomplish anything.

Also, a chunk of code that is just a long string of MOV instructions is going to be really easy to spot for an antivirus program.


What do you mean? Movcc can call external functions (with a jmp, tbf)


Mostly that it's terribly slow.


That was about the only thing I could think of.


This paper (`mov` is Turing Complete) [1] and this StackOverflow post [2] about why this is possible is probably more interesting to most people than the actual implementation.

[1]: https://drwho.virtadpt.net/files/mov.pdf

[2]: https://stackoverflow.com/questions/61048788/why-is-mov-turi...


The idea a create a CPU with only one instruction is much older. I recall that Professor Van Der Poel of Deft University did it in the 1960s. A decrement-and-jump-if-non-zero instruction.


There also is the demovfuscator (https://github.com/kirschju/demovfuscator) which does the opposite and is capable of recovering the control-flow of the original program before movfuscation.


(Disclaimer: I know exactly zero things about reverse engineering, so this is probably a very stupid question. Read at your own peril.)

Could this be helpful in reverse engineering binaries by first movfuscating and then demovfuscating them? My hypothesis is that movfuscation (maybe coupled with some other techniques) might “normalize” the program in some way and demovfuscation might recover some more human-understandable structures. Or would demovfuscation just bring back the same original obfuscated mess?


Movfuscating works from source code, not compiled binary.


This is impressive, on the order of building a working rapid transit system entirely out of q-tips.



I think the most interesting thing about this is the "other architectures" section, where it shows how this technique is applicable to any architecture with nothing more than "mov reg, [reg+const]" (where const can be 0) and "mov [reg+const], reg" --- making it usable on all the major "big" ISAs like x86, ARM (32 and 64-bit), MIPS/RISC-V, etc. as well as some microcontrollers like the Z80.


In the overview section the author states:

>"... there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating."

Could someone say what is meant by "transport-triggered calculation"? Also how does "self-modifying code" work in the context of such constraints exactly? I had a look through the "mov is Turing-complete" paper referenced in this README but didn't come across these.


I guess something like the Wireworld computer? (https://www.quinapalus.com/wires10.html – another fascinating topic, by the way…)

That one only uses MOVs, too, but the simulated CPU has a number of special registers implementing a number of basic operations, so instead of having to manually implement them the hard way using only MOVs (which in the particular case of the wireworld computer probably wouldn't even be possible due to its constrained resources and architecture), you can "cheat" by simply MOVing your operands into the respective special registers and then reading the result.


Thanks for the Wireworld link. This is indeed fascinating. Cheers.


As in some examples that I think were posted recently on HN, you can make a simple multi-cycle "CPU" based entirely on lookup tables (and possibly a bit of SRAM for storage.)


The IBM 1620 (early 1960s) did integer arithmetic entirely with lookup tables. Clearing memory without reloading the lookup tables resulted in a machine that was bereft of arithmetic.


Wow! Could it conceivably re-derive them "experimentally" with nested loops or something?


You can create any arbitrary logic using LUTs. That’s how FPGAs work.


This sounds interesting. Does anyone have a link to these posts or similar?


The control flow graph section of the README is my favorite part.

Is there some thing like the lambda calculus that corresponds to this? I.e., the Turing completeness of a single instruction?


There's apparently a whole field about this:

https://en.wikipedia.org/wiki/One-instruction_set_computer

(I don't know what's known about how to tell whether or not a particular instruction will be Turing complete. Maybe, as in many other parts of computer science, you can do it by reductions: FOO is Turing complete if you can implement BAR with it, where BAR is already known to be Turing complete; BAZ is not Turing complete if you can implement it with QUX, where QUX is already known not to be Turing complete?)


I would like to see a performance comparison with normal GCC output.


I think there is a doom comparison, is it 1 frame every 6 hours? I can't remember.


Doom is not a good comparison as it presumably uses a lot of floating point.


Doom uses fixed point arithmetic [1]

Quake was the first Id game to use floating point and why other chips like Cyrix which had poor floating point units, suffered as a result.

https://doomwiki.org/wiki/Fixed_point




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: