Hacker News
[dupe] Illegal and undocumented instructions found in every major vendor CPU (koehntopp.info)
130 points by uptown on July 28, 2017 | 26 comments



I would recommend marking as a dupe of https://news.ycombinator.com/item?id=14872418 (feels a bit harsh) or pointing to http://www.pagetable.com/?p=39


Can someone explain why these instructions are "illegal"?


I think it's an auto-antonym, i.e. divide opcodes into three sets:

  1.  Documented
  2.  Undocumented and throws #UD
  3.  Undocumented but does not throw #UD
Some people will use "illegal instruction" to refer exclusively to (2), some to refer exclusively to (3), and some to refer to both (2) and (3).

Edit: Interesting to see https://github.com/xoreaxeaxeax/sandsifter/tree/dff63246#fla... for how the author of Sandsifter describes the --ill option:

> --ill - the inverse of --unk, search for invalid disassemblies (instructions that do not successfully execute but that the disassembler acknowledges)

If we assume "disassembler acknowledges" is approximately equivalent to "documented", then --ill is actually looking for a subset of (1): "Documented but throws #UD"...which also seems fairly natural, at least in this context.
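The sets above, plus the "documented but throws #UD" case, can be written out as a small lookup. This is only a sketch: `classify`, `documented`, and `raises_ud` are names made up for illustration and are not part of Sandsifter, which infers both properties from a disassembler and from actually executing the bytes.

```python
def classify(documented: bool, raises_ud: bool) -> str:
    """Map two observed properties of an opcode onto the sets discussed above."""
    if documented and not raises_ud:
        return "(1) documented"
    if documented and raises_ud:
        # The case --ill targets, assuming "disassembler acknowledges" ~ "documented".
        return "(1) documented but throws #UD"
    if raises_ud:
        return "(2) undocumented and throws #UD"
    return "(3) undocumented but does not throw #UD"


for doc in (True, False):
    for ud in (True, False):
        print(doc, ud, "->", classify(doc, ud))
```

Whichever of these someone calls "illegal" is exactly the ambiguity described above.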


As far as I can tell, "illegal" and "undocumented" instructions are the same thing[0]. Likely they are "illegal" according to the spec of the processor, and that's where the term came from.

[0] At least according to Wikipedia: https://en.wikipedia.org/wiki/Illegal_opcode


The paper only uses that phrase once, as a synonym for a non-existing instruction. It seems to refer to an instruction that the CPU will not execute, and which throws a #UD exception. Despite being illegal/nonexistent, these instructions still have a length, which is how many bytes the CPU decodes before it throws that exception. They can figure this out by placing an illegal instruction sequence at the end of a page, where the next page is marked as non-executable, and sliding it around. If the CPU tries to fetch bytes from the next page, it throws a page fault (#PF). Once you slide the instruction back far enough to get a #UD instead, you know that the illegal instruction is that many bytes long.
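The sliding trick can be illustrated with a toy simulation. This is a sketch, not the paper's code: `simulated_fault` is a hypothetical stand-in for real hardware, and it assumes the fault from the non-executable page is raised whenever the decoder needs a byte beyond the executable page. The default limit of 15 is the maximum length of an x86 instruction.

```python
def simulated_fault(true_length: int, bytes_on_exec_page: int) -> str:
    """Toy stand-in for the CPU: which fault does executing the bytes raise?

    The instruction starts `bytes_on_exec_page` bytes before the end of an
    executable page; the page after it is non-executable.
    """
    if bytes_on_exec_page < true_length:
        # The decoder needs bytes from the next page, so the fetch faults first.
        return "fetch fault"
    # The whole instruction fits on the executable page; the CPU rejects it.
    return "#UD"


def measure_length(true_length: int, max_length: int = 15) -> int:
    """Slide the instruction back one byte at a time until the fetch fault stops."""
    for bytes_on_exec_page in range(1, max_length + 1):
        if simulated_fault(true_length, bytes_on_exec_page) == "#UD":
            return bytes_on_exec_page
    raise ValueError("no length up to max_length decoded without a fetch fault")
```

Here `measure_length(3)` recovers 3 without ever being told the length directly; it only observes which fault occurs at each position.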

This is different from undocumented instructions, which are valid instructions that the CPU can actually execute, but which don't appear in the CPU's documentation. These may be instructions which were deliberately added as part of the design but which didn't make it into the documentation for whatever reason, or they may be an unintended consequence of other aspects of the CPU's design. (The 6502 famously has a lot of undocumented opcodes of the second kind, see http://www.pagetable.com/?p=39 for more info on those.)


I think, based on some of the other comments, you can surmise that illegal doesn't mean unlawful in this instance but merely out of spec.


every instruction set architecture has an encoding format and rules for how to encode every instruction, register, and memory address. it's usually the assembler's job to ensure this is done correctly and to the spec but there's tons of room for undefined behavior by doing it yourself and using a prefix on an instruction that doesn't support it, or just encoding an opcode that doesn't exist, etc.


I'm not familiar with modern CPUs, but for the 6502 it just means that they aren't documented. This means that they could theoretically change in future versions or other implementations of the processor, that emulators may not emulate them, and that they may not be well tested or may have surprising side effects. In the 6502 they are a side effect of how the processor is designed (see the pagetable.com article linked elsewhere in this thread), but they may be able to do more work per clock cycle.


They violate the Bogotá Instruction Set Convention of 1983 :)

In all seriousness, if, let's say a MIPS processor had a NAND instruction, that would be illegal because it does not adhere to the MIPS spec


I thought it was the Brussels-Utrecht Limitations and Legalities of System Hardware Instruction Types agreement?


No, that was the treaty that ended the RISC/CISC wars


You're designing a CPU, and you develop a method for how the CPU will decode and execute instructions. You come up with opcodes for all the instructions you want the CPU to use. As a side effect of your decode/execute method, some of the opcodes you didn't specify do other stuff. Maybe they do the same thing as some of the instructions you planned, or maybe they do weird things that seem meaningless and useless.

Either way, those extra opcodes weren't part of your design, and you don't want people using them. Because maybe a future design of your CPU will use those opcodes for something you actually planned, or maybe you'll figure out a more efficient way to decode and execute opcodes which, as a side effect, will change the way those unplanned opcodes work. So you either just don't document those extra opcodes, or you declare them illegal.
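A toy model of how a decode shortcut produces unplanned opcodes, using three real 6502 zero-page opcodes: LDA is 0xA5, LDX is 0xA6, and the undocumented LAX (load A and X together) is 0xA7. The `control_lines` function is a made-up gross simplification, not the real decode PLA, and it is only meaningful for these three opcodes.

```python
# Real 6502 zero-page opcodes; only the low two bits differ.
LDA_ZP = 0xA5  # ...01: documented, loads A
LDX_ZP = 0xA6  # ...10: documented, loads X
LAX_ZP = 0xA7  # ...11: undocumented, loads both


def control_lines(opcode: int) -> set:
    """Toy decoder: each of the two low opcode bits enables one load line."""
    lines = set()
    if opcode & 0b01:
        lines.add("load A")
    if opcode & 0b10:
        lines.add("load X")
    return lines


for op in (LDA_ZP, LDX_ZP, LAX_ZP):
    print(hex(op), sorted(control_lines(op)))
```

Nobody planned the third opcode in this model; it falls out of decoding the two bits independently.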

Another thing that can happen is that your CPU is buggy, and some of the opcodes don't do what you planned. The Z80 had undocumented (but commonly used) "shift logical left" instructions. These instructions set bit 0 of the operand to 1, which is strange. It's speculated that these were planned to be ordinary shift operations, but the bit 0 to 1 thing was a bug, and so Zilog just decided to leave those instructions undocumented.
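For concreteness, here is a sketch of that behavior on an 8-bit value next to the documented SLA (shift left arithmetic). The function names are just labels for this example, and only the result and carry are modeled, not the other flags.

```python
def sla(value: int):
    """Documented Z80 SLA: shift left, bit 0 becomes 0."""
    carry = (value >> 7) & 1  # bit 7 is shifted out into the carry flag
    return (value << 1) & 0xFF, carry


def sll(value: int):
    """Undocumented Z80 SLL: shift left, but bit 0 is forced to 1."""
    carry = (value >> 7) & 1
    return ((value << 1) | 1) & 0xFF, carry


print(sla(0x01))  # (2, 0)
print(sll(0x01))  # (3, 0)
```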


I think it's used the same way as in "IllegalArgumentException" in Java. It's not like the feds are coming for them or anything


So thanks to Sandsifter we can fuzz test CPUs. Honest question: what's the implication of these illegal and undocumented instructions discovered by hardware fuzzing? Do we have to worry about new security vulnerabilities because of them?


They found a bunch of instructions, but for most of them we have no idea yet what they do. At least one instruction on some system can cause the system to hang. Some instructions might have interesting behavior on hypervisors. But most likely the vast majority is very boring and does nothing new or interesting.


Not only hang, but hang in ring 3, meaning it could potentially freeze a CPU from a VM / Hypervisor


I bet if there are any useful undocumented instructions you'll start seeing (not very serious) compiler modifications to support them at your own risk.


Why, yes, if you read the full research paper it points out that some classes of instructions have security implications.


Nothing has changed since 30 years ago. A huge part of the 8-bit demo scene grew up around exactly these 'illegal and undocumented' CPU instructions, allowing kids (at the time) to come up with some really crazy and awesome stuff, given the on-paper capabilities of hardware like the 6500/6502 in the C64 and other machines (think opening screen borders, think more than 8 sprites, think more than 256 colors in hi-res and many more).


It's interesting to read how the 6502 has a "Decode ROM PLA" that fires off various parts of the instruction it's executing. It's a very primitive microcode. It would be interesting if that were customizable in an FPGA version, or if the JavaScript version allowed that to be changed.


"primitive microcode" is the wrong way of thinking about it. Microcoded CPUs had been around for ages by that point.

It's a highly optimised microcode, designed to produce the correct control signals in the minimum amount of space while not caring about the output for illegal opcodes.


Many demoscene intros use these kinds of instructions (and also undefined side effects) to optimize or shrink the binaries. When you really know your target CPU you can do some nice tricks :)


Call the instruction police.


This post does nothing to add onto the original post about Sandsifter.


Ha, for some reason I was expecting this to be biting satire about immigration.


For a second, I thought this was a post about immigration.



