In case anyone is wondering what the "hidden instructions" he found are, many of them are documented elsewhere:
0f0d/0-7 were all prefetch instructions, but probably behave like NOPs if not supported
0f18/0-7 are HINT_NOPs
0f{1a-1f} are also HINT_NOPs
0fae is a bunch of assorted instructions (FXSAVE, FXRSTOR, LDMXCSR, etc.)
dbe0 is FNENI
dbe1 is FNDISI
df{c0-c7} are x87 ops
f1 is ICEBP
c0,c1,d0,d1,d2,d3 groups have a few aliases (SAL/SHL)
f6/1 and f7/1 are aliases of f6/0 and f7/0
0f0f are 3DNow instructions and it wouldn't surprise me if there were many aliases there
0fa7 was briefly used for the IBTS instruction on the very earliest 386s and then CMPXCHG for the very earliest 486s
(http://datasheets.chipdb.org/Intel/x86/486/Intel486.htm)
perhaps VIA continued to use it for a CMPXCHG alias
IMHO the 1-byte opcode map has basically been completely explored and documented, perhaps with the exception of some of the x87 stuff. It's the 2-byte (0F xx) ones where things start to get really interesting.
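For anyone not used to the "opcode/digit" notation in the list above: the digit is the 3-bit reg field of the ModRM byte that follows the opcode, which group opcodes use as an opcode extension. A minimal sketch of pulling it out of a raw encoding (the function names and the opcode_len parameter are mine, purely for illustration):

```python
def modrm_reg(modrm: int) -> int:
    """Return the 3-bit reg field (bits 5..3) of a ModRM byte."""
    return (modrm >> 3) & 0b111


def group_slot(encoding: bytes, opcode_len: int) -> str:
    """Format an encoding as 'opcode/digit'.

    opcode_len is how many leading bytes form the opcode (1 for f6,
    2 for 0f18); the caller has to know this, it is not decoded here.
    """
    opcode = encoding[:opcode_len]
    modrm = encoding[opcode_len]
    return f"{opcode.hex()}/{modrm_reg(modrm)}"


if __name__ == "__main__":
    print(group_slot(bytes.fromhex("f6c8"), 1))    # ModRM c8 = 11 001 000
    print(group_slot(bytes.fromhex("0f1800"), 2))  # ModRM 00 = 00 000 000
```

So an encoding like f6 c8 lands in the f6/1 slot, which is one of the aliases mentioned above.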
I think this is a bit of an oversimplification. I’m seeing some of these appear on sandpile.org, and it does look like a lot of them are vestigial/legacy things. But I think the implied meaning of “documented” for the presentation is “documented by the manufacturer”. Just because some of these have been reverse engineered by others doesn’t really make them “documented”, at least not in the spirit of what the project is trying to find. He also points out that some of these (0f0d, 0f18, etc.) were added to the documentation in the last year - the concern is that they were hidden for years and years before that. 0f0f and 0fa7 look like the most interesting to me: 0f0f is probably 3DNow!, but I can’t find any information on the gaps in the 3DNow! set; he says in the presentation that 0fa7 is the VIA PadLock instructions, and I can’t find any references on the gaps in that range either.
"If your processor has an errata in it and you update the documentation to allow that errata, is it still an errata? I think it is, but apparently it's allowed in the newest version of the AMD manuals" [37:48]
I don't understand why more vendors don't do this (if anyone wants to comment on this, I would be interested to get another opinion). While my experience is admittedly limited to obscure chips that require NDAs for access to the specs, I was always a little annoyed that, almost every time a new version of the spec came out, it did not even reference the errata documentation the vendor had already provided.
Now in AMD's case, I would argue that they should be more clear that it's an errata, and mention that the updated spec differed from previous versions (which it alluded to by saying "This behavior is model-dependent".) Ultimately, the spec is THE document on how a user should expect the chip to behave. So sue me if I am blurring the lines between an errata and a mistake in the spec, but I just want my documentation to tell me what the chip does without having to refer to a dozen other secondary documents dang it!
For that specific case (page fault vs. undefined instruction) I can see why the behaviour differs from the spec, since it's highly dependent on how the processor decodes each instruction; others have noticed similar things:
This kind of technique, and the exploitation of minor CPU errata, can be used to help differentiate processor models and steppings.
That in turn allows a currently widespread DRM system to download personalised portions of object code that rely on properties specific to the licensed hardware in order to execute properly, in an attempt to counter debugging, emulation and transfer - continuing a tradition practised in copy protection techniques since at least the 6502, maybe even earlier.
Can't wait to see what 2017's f00f bug equivalent is after it's released. Maybe I should just run his tunneling programs and not wait for the disclosure.
The f00f bug was discovered when a locked cmpxchg8b with a register operand (f0 0f c7 c8) hung the processor instead of raising an invalid-opcode exception.
If this just searches the space looking for byte sequences that shouldn't decode but end up getting executed, then it's unlikely to be anywhere near as interesting as f00f.
In all likelihood we have already seen 2017's big silicon bug and it was AMD's Ryzen 7 1800X issue.
To summarize: the guy built a random CPU instruction generator for x86. An instruction can be at most 15 bytes long, so the search space is huge. He cut it down to about 100k by generating candidates in a DFS-style fashion and validating them through CPU exceptions and flags. At the end, there's a kind of map-reduce-style distiller to analyse the hidden and valid instructions.
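Here's roughly how I picture the depth-first generation, as a toy sketch rather than his actual code. The "CPU" is a made-up decoder (toy_decode, with an invented three-rule instruction set), and the length probe only imitates the idea of exposing one more byte at a time until the fetch stops faulting. The real tool also avoids enumerating immediates, which this does not.

```python
from itertools import islice

MAX_LEN = 15  # architectural limit on x86 instruction length


def toy_decode(window: bytes):
    """Hypothetical stand-in for the CPU's decoder (not real x86).

    Returns the instruction length, or None if the decoder would have to
    fetch past the end of `window` (modelling a fault on the next page).
    """
    need = 1
    if window[0] == 0x0F:                      # toy two-byte escape
        need = 2
        if len(window) >= 2 and window[1] >= 0x80:
            need = 3                           # escape opcode plus imm8
    elif window[0] >= 0x80:
        need = 2                               # opcode plus imm8
    return need if need <= len(window) else None


def observed_length(candidate: bytearray) -> int:
    """Expose 1, 2, 3... bytes until the decoder stops faulting."""
    for n in range(1, MAX_LEN + 1):
        if toy_decode(bytes(candidate[:n])) is not None:
            return n
    raise ValueError("candidate never decoded within MAX_LEN bytes")


def tunnel():
    """Yield each distinct instruction once, in depth-first order."""
    insn = bytearray(MAX_LEN)                  # start from all zeros
    while True:
        length = observed_length(insn)
        yield bytes(insn[:length])
        # Bump the last byte the decoder actually consumed; bytes past
        # the instruction's end never matter, so they are never explored.
        i = length - 1
        while i >= 0 and insn[i] == 0xFF:
            i -= 1                             # carry into the byte before
        if i < 0:
            return                             # whole space exhausted
        insn[i] += 1
        insn[i + 1:] = bytes(MAX_LEN - i - 1)  # zero everything after it


if __name__ == "__main__":
    for encoding in islice(tunnel(), 20):
        print(encoding.hex())
```

The point is that the search only ever varies bytes the decoder says it consumed, which is what collapses a 15-byte space into something you can actually walk.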
What's the bet the really secret instructions are hidden behind special conditional decode logic? I.e. the CPU won't even ask for the next byte if some register value is not set, possibly a secret register that first needs to be set via some other hidden instruction. Make that a sequence of 3 hidden instructions combined with arbitrary register and immediate values, and you won't get close to identifying them in a billion years.
I mean, if you worked for Intel and your manager said "make me a really secret instruction", would your best response be "let's just not document it and hope no one notices"?
What I would give to read the full microcode of the latest Intel processor. I am guessing it is stored in a vault with the real nuclear codes, Alien cadavers and the Holy Grail.
Such decoding logic might actually be detectable by something like differential power analysis, though it could be excessively difficult if someone really wanted it hidden.
I suspect that really keeping it out of view would also cost both silicon and propagation delays in what would probably be some of the most critical paths, but then I'm not a VLSI engineer, or whatever the correct title would be :)
I'm not really qualified to answer this, but I suspect the instruction decoder(s?) would be decoupled from register state as much as possible (unless x86 is even weirder than I thought).
Given this, I suspect wiring in a path all the way from the relevant versions of the relevant registers might be quite expensive. Plus part of the decode logic now needs to block on a register value - so a timing based attack might find these.
i386 instruction decoding at least partially depends on what descriptor is loaded into (shadow) CS. For example, the effects of the 0x66 prefix are reversed between 16-bit and 32-bit code.
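To make the 0x66 point concrete, here is a small sketch of how the default operand size from the CS descriptor and the prefix combine, using MOV eAX, imm (opcode b8) as the example since its length changes with the operand size. The function names are made up for illustration; this models just this one rule, not a real decoder.

```python
def operand_size(cs_default_32: bool, has_66_prefix: bool) -> int:
    """Effective operand size in bits.

    cs_default_32 models the D bit of the loaded code segment descriptor;
    the 0x66 prefix toggles to the other size, whichever the default is.
    """
    default = 32 if cs_default_32 else 16
    if has_66_prefix:
        return 16 if default == 32 else 32
    return default


def mov_eax_imm_length(cs_default_32: bool, has_66_prefix: bool) -> int:
    """Total length in bytes of [66] b8 imm16/imm32."""
    imm_bytes = operand_size(cs_default_32, has_66_prefix) // 8
    return (1 if has_66_prefix else 0) + 1 + imm_bytes


if __name__ == "__main__":
    for cs32 in (False, True):
        for prefix in (False, True):
            mode = "32-bit" if cs32 else "16-bit"
            print(mode, "prefix" if prefix else "no prefix",
                  operand_size(cs32, prefix),
                  mov_eax_imm_length(cs32, prefix))
```

So the same bytes 66 b8 ... decode to a 4-byte instruction in a 32-bit segment and a 6-byte one in a 16-bit segment, which is the kind of state dependence the decoder already has to handle.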