I have to say, reading through the github for the movfuscator was pretty damn amusing. I honestly burst out laughing at the control flow graphs. But I'm curious, is this actually practical for anything?
Running Doom at 1 Frame per 7 hours is, pretty unreasonable. Would text based software even be usable with this?
Either way, I do enjoy cleverly, overly engineered, possibly useless things created just because someone thought it would be funny.
> Would text based software even be usable with this?
I just wrote a quick C program to find the primes less than 100000 and compiled it with gcc and movcc. I wouldn't say it's, you know, fast, but I'm impressed at how it does finish in an amount of time that I was willing to wait.
$ time ./primes.gcc | wc -l
9592
real 0m0.035s
user 0m0.031s
sys 0m0.012s
$ time ./primes.movcc | wc -l
9592
real 0m10.511s
user 0m8.289s
sys 0m2.228s
It looks like the reason that it spent so much time in syscalls is that it somehow uses signals for control flow. (I'm not quite sure how that works.)
Or for license check / DRM (I appreciate some might consider one or both of these to be a type of malware).
As someone else noted, a large block of mov instructions would be quite easy to spot so you'd have to tie it into a bit of core application / algorithm logic. But that doesn't mean the whole program needs to be written that way.
That's yoo generous of a definition. I would prefer if the software I used didn't crash, or have any bugs, but mistakes get made sometimes. By your definition, that's malware. I would say malware is software who's primary focus is to do things the user wouldn't like. A game checking DRM licenses isn't malware as it's not the primary purpose, just an annoying aspect.
You’re right, my definition was too broad; intention should matter. But DRM would still count; it intentionally does something which the user or device owner would rather it did not do.
So every software with a license check is malware? That doesn't necessarily mean it verifies with an external server, it might just locally check a code and use some bits contained therein to print "Registered to XXXX" on the about screen.
What about a one-time screen kindly asking for a donation?
This paper (`mov` is Turing Complete) [1] and this StackOverflow post [2] about why this is possible is probably more interesting to most people than the actual implementation.
The idea a create a CPU with only one instruction is much older. I recall that Professor Van Der Poel of Deft University did it in the 1960s. A decrement-and-jump-if-non-zero instruction.
There also is the demovfuscator (https://github.com/kirschju/demovfuscator) which does the opposite and is capable of recovering the control-flow of the original program before movfuscation.
(Disclaimer: I know exactly zero things about reverse engineering, so this is probably a very stupid question. Read at your own peril.)
Could this be helpful in reverse engineering binaries by first movfuscating and then demovfuscating them? My hypothesis is that movfuscation (maybe coupled with some other techniques) might “normalize” the program in some way and demovfuscation might recover some more human-understandable structures. Or would demovfuscation just bring back the same original obfuscated mess?
I think the most interesting thing about this is the "other architectures" section, where it shows how this technique is applicable to any architecture with nothing more than "mov reg, [reg+const]" (where const can be 0) and "mov [reg+const], reg" --- making it usable on all the major "big" ISAs like x86, ARM (32 and 64-bit), MIPS/RISC-V, etc. as well as some microcontrollers like the Z80.
>"... there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating."
Could someone say what is meant by "transport-triggered calculation"? Also how does "self-modifying code" work in the context of such constraints exactly? I had a look through the "mov is Turing-complete" paper referenced in this README but didn't come across these.
That one only uses MOVs, too, but the simulated CPU has a number of special registers implementing a number of basic operations, so instead of having to manually implement them the hard way using only MOVs (which in the particular case of the wireworld computer probably wouldn't even be possible due to its constrained resources and architecture), you can "cheat" by simply MOVing your operands into the respective special registers and then reading the result.
As in some examples that I think were posted recently on HN, you can make a simple multi-cycle "CPU" based entirely on lookup tables (and possibly a bit of SRAM for storage.)
The IBM 1620 (early 1960s) did integer arithmetic entirely with lookup tables. Clearing memory without reloading the lookup tables resulted in a machine that was bereft of arithmetic.
(I don't know what's known about how to tell whether or not a particular instruction will be Turing complete. Maybe, as in many other parts of computer science, you can do it by reductions: FOO is Turing complete if you can implement BAR with it, where BAR is already known to be Turing complete; BAZ is not Turing complete if you can implement it with QUX, where QUX is already known not to be Turing complete?)
Previous submissions with significant discussion:
- https://news.ycombinator.com/item?id=18991404
- https://news.ycombinator.com/item?id=16218872
- https://news.ycombinator.com/item?id=12372242
- https://news.ycombinator.com/item?id=10021259
- https://news.ycombinator.com/item?id=9751312