Breaking the x86 Instruction Set: how to find undocumented x86 instructions

userbinator · on Sept 11, 2017

In case anyone is wondering what the "hidden instructions" he found are, many of them are documented elsewhere:

    0f0d/0-7 were all prefetch instructions, but probably behave like NOPs if not supported
    0f18/0-7 are HINT_NOPs
    0f{1a-1f} are also HINT_NOPs
    0fae is a bunch of assorted instructions (FXSAVE, FXRSTOR, LDMXCSR, etc.)
    dbe0 is FNENI
    dbe1 is FNDISI
    df{c0-c7} are x87 ops
    f1 is ICEBP
    c0,c1,d0,d1,d2,d3 groups have a few aliases (SAL/SHL)
    f6/1 and f7/1 are aliases of f6/0 and f7/0
    0f0f are 3DNow instructions and it wouldn't surprise me if there were many aliases there
    0fa7 was briefly used for the IBTS instruction on the very earliest 386s and then CMPXCHG for the very earliest 486s
        (http://datasheets.chipdb.org/Intel/x86/486/Intel486.htm)
        perhaps VIA continued to use it for a CMPXCHG alias
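The "/n" suffixes in the list above (as in 0f18/0-7 or f6/1) follow the usual opcode-map convention: n is the reg field, bits 5:3, of the ModRM byte that follows the opcode. Here is a minimal sketch of that split, using the f6 group as the example. The group table is written out for illustration, with /1 marked as the undocumented TEST alias mentioned above:

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split an x86 ModRM byte into (mod, reg, rm)."""
    mod = (modrm >> 6) & 0b11    # addressing mode (3 = register operand)
    reg = (modrm >> 3) & 0b111   # the "/n" digit for group opcodes
    rm = modrm & 0b111           # register or memory operand
    return mod, reg, rm

# Opcode F6 (byte-sized group 3): the reg field selects the operation.
F6_GROUP = {
    0: "TEST r/m8, imm8",
    1: "TEST r/m8, imm8 (undocumented alias of /0)",
    2: "NOT r/m8",
    3: "NEG r/m8",
    4: "MUL r/m8",
    5: "IMUL r/m8",
    6: "DIV r/m8",
    7: "IDIV r/m8",
}

def describe_f6(modrm: int) -> str:
    _, reg, _ = split_modrm(modrm)
    return F6_GROUP[reg]

print(describe_f6(0xC8))  # ModRM 11 001 000 -> /1
```

So "f6 c8" lands on /1 purely because bits 5:3 of C8 are 001. That is why an alias like f6/1 can hide in plain sight next to a documented encoding.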

IMHO the 1-byte opcode map has basically been completely explored and documented, perhaps with the exception of some of the x87 stuff. It's the 2-byte (0F xx) ones where things start to get really interesting.

twiddlydee · on Sept 11, 2017

I think this is a bit of an oversimplification. I’m seeing some of these on sandpile.org, and it does look like a lot of them are vestigial/legacy things. But I think the implied meaning of “documented” in the presentation is “documented by the manufacturer”. Just because some of these have been reverse engineered by others doesn’t really make them “documented”, at least not in the spirit of what the project is trying to find. He also points out that some of these (0f0d, 0f18, etc.) were only added to the documentation in the last year; the concern is that they were hidden for years and years before that. 0f0f and 0fa7 look like the most interesting to me. 0f0f is probably 3DNow, but I can’t find any information on the gaps in the 3DNow set. He says in the presentation that 0fa7 holds the VIA PadLock instructions, but I can’t find any references on the gaps in that range either.

userbinator · on Sept 11, 2017

http://linux.via.com.tw/support/beginDownload.action?eleid=1...

Interesting. 0fa7xx is indeed where the VIA PadLock instructions live, but the last byte is probably being partially decoded.
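One way to picture "partially decoded": if the decoder only looks at the reg field of the byte after 0F A7, every byte that shares those three bits lands on the same operation. The sketch below assumes exactly that. This is a guess, not anything VIA documents. The encodings listed are the well-known ones: XSTORE is 0F A7 C0, and the REP-prefixed XCRYPTECB and XCRYPTCBC use C8 and D0. The table is deliberately partial.

```python
from typing import Optional

# Documented 0F A7 xx encodings (partial list; REP prefix omitted).
DOCUMENTED = {0xC0: "XSTORE", 0xC8: "XCRYPTECB", 0xD0: "XCRYPTCBC"}

def partial_decode(last: int) -> Optional[str]:
    """Hypothetical decoder that keys only on ModRM.reg (bits 5:3)."""
    if last >> 6 != 0b11:                    # assume only register forms decode
        return None
    canonical = 0xC0 | (last & 0b00111000)   # drop the rm bits
    return DOCUMENTED.get(canonical)

# Bytes this hypothetical decoder accepts that are not in the documented list.
aliases = {b: partial_decode(b) for b in range(256)
           if partial_decode(b) and b not in DOCUMENTED}
```

Under that assumption, C1 through C7 would all behave like XSTORE, which is the kind of "gap" a fuzzer would report as hidden.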

Lramseyer · on Sept 11, 2017

"If your processor has an errata in it and you update the documentation to allow that errata, is it still an errata? I think it is, but apparently it's allowed in the newest version of the AMD manuals" [37:48]

I don't understand why more vendors don't do this (if anyone wants to comment on this, I would be interested to get another opinion). My experience is admittedly limited to obscure chips that require NDAs for access to the specs. Even so, I was always a little annoyed that almost every time a new version of the spec came out, it didn't even reference the errata documentation the vendor had provided.

Now in AMD's case, I would argue that they should be clearer that it's an erratum, and should mention that the updated spec differs from previous versions (which it only hints at by saying "This behavior is model-dependent"). Ultimately, the spec is THE document on how a user should expect the chip to behave. So sue me if I am blurring the lines between an erratum and a mistake in the spec, but I just want my documentation to tell me what the chip does without having to refer to a dozen other secondary documents, dang it!

userbinator · on Sept 11, 2017

For that specific case (pagefault vs undefined instruction), I can see why the behaviour differs from the spec, since it depends heavily on how the processor decodes each instruction. Others have noticed similar things:

https://www.symantec.com/connect/blogs/x86-fetch-decode-anom...
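The pagefault-versus-#UD difference falls out of how the length is probed in the page-boundary technique sandsifter uses. The candidate is placed so that only its first k bytes sit on an executable page. A page fault on the next page means the decoder wanted more bytes. Below is a minimal simulation of that idea, using a hypothetical toy decoder that reads one byte ahead before rejecting an invalid opcode. The names toy_decode, fetcher and probe are invented here, and nothing below executes real machine code.

```python
class PageFault(Exception):
    """The decoder touched the non-executable page after the candidate."""

class InvalidOpcode(Exception):
    """The decoder rejected the bytes (#UD)."""

def fetcher(code: bytes, mapped: int):
    """Only the first `mapped` bytes lie on the executable page."""
    def fetch(i: int) -> int:
        if i >= mapped:
            raise PageFault(i)
        return code[i]
    return fetch

def toy_decode(fetch) -> int:
    """Hypothetical decoder: returns the length or raises InvalidOpcode.
    0F FF is invalid, but this decoder fetches a ModRM byte before checking."""
    if fetch(0) != 0x0F:
        return 1
    second = fetch(1)
    fetch(2)                      # read-ahead happens before validation
    if second == 0xFF:
        raise InvalidOpcode()
    return 3

def probe(code: bytes, max_len: int = 15) -> tuple[str, int]:
    """Grow the mapped window until the decoder stops page-faulting."""
    code = code.ljust(max_len, b"\x00")
    for mapped in range(1, max_len + 1):
        try:
            return "ok", toy_decode(fetcher(code, mapped))
        except PageFault:
            continue              # decoder wanted more bytes: map one more
        except InvalidOpcode:
            return "#UD", mapped
    raise ValueError("no result within max_len bytes")
```

With only two bytes mapped, 0F FF reports a page fault rather than #UD, even though two bytes are enough to know it is invalid. A decoder that validated before reading ahead would report #UD at two bytes instead.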

jacksonR · on Sept 10, 2017

Ah, this is the talk behind the sandsifter tool that was making the rounds a few weeks back. Nice to get the deeper picture.

github: https://github.com/xoreaxeaxeax/sandsifter
white paper: https://github.com/xoreaxeaxeax/sandsifter/blob/master/refer...

j_s · on Sept 11, 2017

Sandsifter: find undocumented instructions and bugs on x86 CPU | https://news.ycombinator.com/item?id=14872418 (91 comments, July 2017)

xk98qB · on Sept 10, 2017

heh, by the same guy who made that compiler that translates C into only mov instructions: https://github.com/xoreaxeaxeax/movfuscator

im3w1l · on Sept 10, 2017

Obvious question: Are there programs with these instructions in the wild?

AlyssaRowan · on Sept 11, 2017

Yes.

This kind of technique, and the exploitation of minor CPU errata, can be used to help differentiate processor models and steppings.

That in turn allows a currently widespread DRM system to download personalised portions of object code that rely on properties specific to the licensed hardware in order to execute properly, in an attempt to counter debugging, emulation and transfer - continuing a tradition practised in copy protection techniques since at least the 6502, maybe even earlier.
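The fingerprinting step works by folding many small observations into one identifier: does this probe fault, what does that undocumented opcode leave in a register. Here is a minimal sketch, assuming the observations have already been collected as strings. The probe names and results are invented, and a real DRM scheme would be far more involved.

```python
import hashlib

def fingerprint(observations: dict[str, str]) -> str:
    """Hash probe results in a fixed order so the same CPU yields the same ID."""
    h = hashlib.sha256()
    for probe in sorted(observations):          # order-independent
        h.update(f"{probe}={observations[probe]}\n".encode())
    return h.hexdigest()[:16]

# Hypothetical observations from two CPU models that differ in one quirk.
cpu_a = {"0f0d/4": "nop", "f6/1": "test", "salc": "ok"}
cpu_b = {"0f0d/4": "#UD", "f6/1": "test", "salc": "ok"}
```

A single differing quirk is enough to change the identifier. That is what lets code be tailored to, and break on anything other than, a specific model or stepping.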

amelius · on Sept 11, 2017

Is there a term for that, similar to "security by obscurity"?

Open-Sourcery · on Sept 12, 2017

How does "Identification by exploitation" sound?

inetknght · on Sept 10, 2017

I would love to know the answer to that. Something tells me that we need to fix disassemblers before we can answer it though.

k__ · on Sept 10, 2017

Do they have any benefit over the valid instructions?

tyingq · on Sept 11, 2017

Has happened in the past...

http://www.rcollins.org/secrets/opcodes/SALC.html
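SALC (opcode D6), the example behind that link, is a one-byte instruction. It sets AL to FFh if the carry flag is set and to 00h otherwise, without touching the flags. The documented near-equivalent, SBB AL, AL, takes two bytes and does modify the flags. Here is a small model of both, tracking only AL, CF and ZF:

```python
def salc(al: int, cf: bool) -> int:
    """Undocumented D6: AL = 0xFF if CF else 0x00; flags are left untouched."""
    return 0xFF if cf else 0x00

def sbb_al_al(al: int, cf: bool) -> tuple[int, bool, bool]:
    """SBB AL, AL: returns (new AL, new CF, new ZF)."""
    raw = al - al - int(cf)       # always 0 or -1
    result = raw & 0xFF
    return result, raw < 0, result == 0
```

Both produce the same AL. The benefit of SALC was the shorter encoding and leaving ZF and the other flags alone.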

dcomp · on Sept 10, 2017

Can't wait to see what 2017's f00f bug equivalent is after it's released. Maybe I should just run his tunneling programs and not wait for the disclosure.

anfractuosity · on Sept 10, 2017

He did seem to indicate that was on an esoteric processor though so maybe it's not on an Intel/AMD chip.

I don't know much about CPU internals, but would it actually be possible to 'patch' that through updated microcode?

It's such a clever program, will be intrigued to see what else it can find!

wyldfire · on Sept 11, 2017

The f00f bug was discovered when a LOCK-prefixed cmpxchg8b with a register operand, which should raise an invalid-opcode exception, hung the processor instead.

If this just searches the space looking for packets that shouldn't decode but end up getting executed, then it's unlikely to be anywhere near as interesting as f00f.

In all likelihood we have already seen 2017's big silicon bug and it was AMD's Ryzen 7 1800X issue.
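For reference, the f00f bytes are F0 0F C7 C8. Decoded by hand:

- F0 is the LOCK prefix.
- 0F C7 is the group-9 opcode whose /1 form is CMPXCHG8B.
- ModRM C8 has mod=3, which means a register operand.

CMPXCHG8B only takes a memory operand, so the instruction is invalid and should raise #UD. On affected Pentiums it hung the machine instead. Here is a minimal sketch of that decode, covering only this one opcode family:

```python
def decode_f00f_family(code: bytes) -> str:
    """Decode [LOCK] 0F C7 /r only; anything else is out of scope here."""
    i = 0
    locked = code[i] == 0xF0
    if locked:
        i += 1
    if code[i:i + 2] != b"\x0f\xc7":
        raise ValueError("not a 0F C7 instruction")
    modrm = code[i + 2]
    mod, reg = modrm >> 6, (modrm >> 3) & 0b111
    if reg != 1:
        return "other group-9 instruction"
    if mod == 3:
        # CMPXCHG8B needs a memory operand; a register form is invalid.
        return "#UD expected: CMPXCHG8B with register operand"
    return ("LOCK " if locked else "") + "CMPXCHG8B m64"
```

With ModRM 0E (mod=0, reg=1, rm=6) the same prefix and opcode form a perfectly valid LOCK CMPXCHG8B on a memory operand. Only the register form is the bad case.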

twiddlydee · on Sept 11, 2017

He says at the end he found a new f00f bug though

m00dy · on Sept 10, 2017

To summarize: he built a random CPU instruction generator for x86. An instruction can be at most 15 bytes long, so the search space is huge. He cut it down to about 100k candidates by generating instructions in a depth-first (DFS-style) fashion and checking each one via CPU exceptions and flags. At the end, a kind of map-reduce-style distiller sorts the results into hidden and valid instructions.

Nice job though
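The depth-first "tunneling" search can be sketched as follows. Two simplifications apply. The length oracle is a hypothetical three-rule toy decoder rather than real hardware. The marker rules are a simplified reading of the approach, not sandsifter's actual code. The key point is that bytes which never change the decoded length, like an immediate, are swept one position at a time instead of exhaustively.

```python
from typing import Callable, Iterator

def toy_length(code: bytes) -> int:
    """Hypothetical decoder, NOT real x86: returns an instruction length."""
    if code[0] == 0x0F:                      # two-byte escape
        return 3 if code[1] & 0x80 else 2    # high bit: pretend a ModRM follows
    if 0xB8 <= code[0] <= 0xBF:              # like MOV r32, imm32
        return 5
    return 1

def tunnel(length_of: Callable[[bytes], int], max_len: int = 15) -> Iterator[bytes]:
    buf = bytearray(max_len)
    marker = 0              # index of the byte currently being incremented
    last_len = None
    while True:
        n = length_of(bytes(buf))
        yield bytes(buf[:n])
        if n != last_len:
            # The length changed, so the last touched byte matters for decoding:
            # restart at the new final byte with everything after it cleared.
            marker = n - 1
            buf[marker + 1:] = bytes(max_len - marker - 1)
            last_len = n
        # Increment the marker byte; on wrap-around, step back one byte.
        while True:
            if buf[marker] < 0xFF:
                buf[marker] += 1
                break
            buf[marker] = 0
            marker -= 1
            if marker < 0:
                return

found = list(tunnel(toy_length))
```

Here the toy search visits under two thousand candidates. Exhaustively enumerating five bytes would take 256^5 buffers.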

smegel · on Sept 11, 2017

What's the bet the really secret instructions are hidden behind special conditional decode logic? I.e. the CPU won't even ask for the next byte if some register value is not set, possibly a secret register that first needs to be set via some other hidden instruction. Make that a sequence of 3 hidden instructions combined with arbitrary register and immediate values, and you won't get close to identifying them in a billion years.

I mean, if you worked for Intel and your manager said "make me a really secret instruction", would your best response be "let's just not document it and hope no one notices"?

What I would give to read the full microcode of the latest Intel processor. I am guessing it is stored in a vault with the real nuclear codes, alien cadavers and the Holy Grail.
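Here is a toy model of that idea. A hypothetical secret opcode only decodes after an innocuous-looking instruction has loaded a magic immediate into hidden state. Everything here is invented for illustration: the key, the opcode bytes and the ToyCPU class. A fuzzer that starts each test from a fresh state, as a one-instruction-at-a-time search does, never sees the secret opcode.

```python
from dataclasses import dataclass

UNLOCK_KEY = 0x1BADC0DE        # hypothetical 32-bit unlock value
SECRET = b"\x0f\x04"           # hypothetical gated opcode (arbitrary bytes)

@dataclass
class ToyCPU:
    hidden: bool = False       # state that no documented instruction exposes

    def execute(self, insn: bytes) -> str:
        if insn[:1] == b"\xb8" and len(insn) >= 5:   # looks like MOV EAX, imm32
            if int.from_bytes(insn[1:5], "little") == UNLOCK_KEY:
                self.hidden = True                   # undocumented side effect
            return "ok"
        if insn[:2] == SECRET:
            return "secret" if self.hidden else "#UD"
        return "ok"

# A single-instruction fuzzer on a fresh CPU only ever sees #UD for SECRET.
fresh_results = {ToyCPU().execute(bytes([0x0F, b])) for b in range(256)}

cpu = ToyCPU()
cpu.execute(b"\xb8" + UNLOCK_KEY.to_bytes(4, "little"))
```

With a 32-bit key, a blind fuzzer would need on the order of 2^32 MOV candidates just to stumble on the unlock, before it ever tries the secret opcode. Chaining several such gates multiplies that work.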

vardump · on Sept 11, 2017

You don't really need even that.

You just need a set of magic register values, like how CPUID [0] instruction already works.

[0]: http://www.sandpile.org/x86/cpuid.htm
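CPUID is a good illustration. The same two opcode bytes (0F A2) return entirely different data depending on the leaf number in EAX. Leaf 0, for example, returns the vendor string packed little-endian into EBX, EDX and ECX, in that order. The sketch below decodes those registers, using the values an Intel CPU returns for leaf 0 ("GenuineIntel"). The magic-leaf dispatcher is a hypothetical illustration of the point above, not a real CPUID leaf.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """CPUID leaf 0 packs 12 ASCII bytes into EBX, EDX, ECX (little-endian)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

MAGIC_LEAF = 0x8BADF00D   # hypothetical: not a real CPUID leaf

def toy_cpuid(eax: int) -> str:
    """Toy dispatcher: behaviour is selected purely by a register value."""
    if eax == 0:
        return vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
    if eax == MAGIC_LEAF:
        return "hidden feature enabled"
    return "leaf not modelled"
```

Nothing in the instruction bytes distinguishes the magic case. Only the input register does, so an opcode-space fuzzer has no reason to find it.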

jevinskie · on Sept 11, 2017

Yup, this was done at this year's USENIX with AMD microcode. See the exploits that check for magic register values at [0] and the paper at [1].

[0]: https://github.com/RUB-SysSec/Microcode/tree/master/updates

[1]: http://syssec.rub.de/research/publications/microcode-reversi...

micheljones · on Sept 11, 2017

There are so many ways to implement backdoors in CPUs, even completely analog ones:

https://hackaday.com/2017/04/25/an-analog-charge-pump-fabric...

mickronome · on Sept 11, 2017

Such decoding logic might actually be detectable by something like differential power analysis, though it could be excessively difficult if someone really wanted it hidden.

I suspect that really keeping it out of view would also cost both silicon and propagation delays in what would probably be some of the most critical paths, but then I'm not a VLSI engineer, or whatever the correct title would be :)

smegel · on Sept 11, 2017

When you say might...are there any case studies showing how the internals of a CPU can be exposed using this technique?

greenpenguin · on Sept 11, 2017

I'm not really qualified to answer this, but I suspect the instruction decoder(s?) would be decoupled from register state as much as possible (unless x86 is even weirder than I thought).

Given this, I suspect wiring in a path all the way from the relevant versions of the relevant registers might be quite expensive. Plus, part of the decode logic would now need to block on a register value, so a timing-based attack might find these.

More qualified comments welcome...
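Both the power-analysis idea and the timing idea come down to the same statistical question: does the same instruction behave measurably differently under two register states? Here is a minimal sketch using Welch's t-statistic, the test behind the common fixed-vs-random leakage assessment, where |t| above about 4.5 is the usual threshold. The sample lists are placeholders for real cycle counts or power samples. Collecting those is the hard part and is not shown.

```python
import math
from statistics import mean, variance

LEAK_THRESHOLD = 4.5   # threshold commonly used in leakage assessment

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def looks_gated(normal: list[float], magic: list[float]) -> bool:
    """True if the two measurement sets differ beyond the threshold."""
    return abs(welch_t(normal, magic)) > LEAK_THRESHOLD
```

In practice the hard part is the noise. A gate that adds a fraction of a cycle may need a very large number of samples before |t| clears the threshold.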

dfox · on Sept 11, 2017

i386 instruction decoding at least partially depends on which descriptor is loaded into (shadow) CS. For example, the effect of the 0x66 prefix is reversed between 16-bit and 32-bit code.
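Here is a minimal sketch of that point, limited to the operand-size rule. The D bit of the current code-segment descriptor picks the default size (1 means 32-bit), and a 0x66 prefix flips it. The same bytes can then have different lengths. For example, B8 followed by an immediate (MOV eAX, imm) takes a 2-byte immediate in 16-bit code and a 4-byte one in 32-bit code. The sketch ignores 64-bit mode and every other prefix.

```python
def operand_size(cs_d_bit: int, has_66: bool) -> int:
    """Effective operand size in bits for legacy 16/32-bit code."""
    default = 32 if cs_d_bit else 16
    if has_66:
        return 16 if default == 32 else 32   # 0x66 selects the other size
    return default

def mov_eax_imm_length(code: bytes, cs_d_bit: int) -> int:
    """Length of [66] B8 imm16/imm32; any other opcode is out of scope."""
    has_66 = code[:1] == b"\x66"
    opcode = code[1] if has_66 else code[0]
    if opcode != 0xB8:
        raise ValueError("only B8 (MOV eAX, imm) is modelled")
    return int(has_66) + 1 + operand_size(cs_d_bit, has_66) // 8
```

This is also why a length-probing fuzzer has to fix the code-segment mode up front. The same byte string run in 16-bit and 32-bit code is, in effect, two different instructions.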