Reverse-engineering the ModR/M addressing microcode in the Intel 8086 processor (righto.com)
79 points by picture on Feb 27, 2023 | 22 comments



This also explains how the otherwise undefined "LEA reg, reg" works --- I believe on 186+ it's a #UD, but on the 8086 it just puts the result of the last address calculation into the destination, without doing any calculation of its own (whereas the valid memory form always does one).

There was some discussion about that here in the past, not surprisingly on another one of Ken's 8086 RE articles: https://news.ycombinator.com/item?id=24248773

There are other notable undefined ModR/M combinations here, whose behaviour can hopefully now be explained: https://retrocomputing.stackexchange.com/questions/21747/und...


I took a closer look. As you suggested, LEA reg1,reg2 will simply run the micro-instruction "IND→N RNI" without doing any effective address calculation first. Whatever address is in the IND (Indirect) register will be written to the register specified in the reg field.
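A toy Python model may make this concrete. It is a sketch under stated assumptions, not Intel's circuitry: the class `ToyCPU` and its methods are hypothetical, only four registers are modeled, and segments are ignored. It illustrates the point above: the memory form of LEA computes an effective address into IND, while the register form skips that step and copies whatever stale value IND already holds.

```python
class ToyCPU:
    """Hypothetical toy model of the hidden IND register (not Intel's design)."""

    def __init__(self) -> None:
        self.regs = {"AX": 0, "BX": 0, "SI": 0, "DI": 0}
        self.ind = 0  # hidden Indirect register, left over from the last EA calculation

    def compute_ea(self, base: str, index: str, disp: int = 0) -> None:
        # Memory-operand forms leave their 16-bit offset in IND.
        self.ind = (self.regs[base] + self.regs[index] + disp) & 0xFFFF

    def lea(self, dest: str, mod: int, base: str = "BX", index: str = "SI", disp: int = 0) -> None:
        if mod != 3:
            # Valid form, e.g. LEA AX,[BX+SI+disp]: compute a fresh address first.
            self.compute_ea(base, index, disp)
        # Register form (mod == 3): no address calculation, IND is left stale.
        self.regs[dest] = self.ind  # "IND -> N"


cpu = ToyCPU()
cpu.regs["BX"], cpu.regs["SI"] = 0x1000, 0x20
cpu.lea("AX", mod=2, disp=4)  # lea ax, [bx+si+4]: AX = 0x1024
cpu.regs["BX"] = 0
cpu.lea("DI", mod=3)          # "lea di, reg": copies the stale IND, DI = 0x1024
```

Note that DI gets 0x1024 even though BX has since changed, because nothing recomputed IND.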

As for JMP FAR rm with a register, what I see in the microcode doesn't quite match the linked discussion. The microcode is:

  IND → tmpC      INC2 tmpC   JMP FAR rm
  Σ → IND         SUSP  
  tmpB → PC       R DS,P0  
  OPR → CS        FLUSH RNI  
So it will add 2 to the hidden IND register, read the segment value from that address, and put that in the CS register. Meanwhile, the PC gets the value from the hidden ALU temporary B register.
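A hypothetical Python sketch of that sequence for the register form, assuming IND and tmpB simply hold stale values from earlier instructions and that memory is a flat 1 MiB byte array. The function name and parameters are made up for illustration; the steps follow the microcode listing above, not any documented behavior.

```python
def jmp_far_register_form(ind: int, tmp_b: int, ds: int, memory: bytearray) -> tuple[int, int, int]:
    """Hypothetical model of JMP FAR rm with a register operand.

    Returns (new_ind, new_pc, new_cs)."""
    tmp_c = ind                                  # IND -> tmpC
    new_ind = (tmp_c + 2) & 0xFFFF               # INC2 tmpC, then Σ -> IND
    new_pc = tmp_b                               # tmpB -> PC (stale ALU temporary)
    phys = ((ds << 4) + new_ind) & 0xFFFFF       # R DS,P0: 20-bit physical address DS:IND
    # Read a little-endian word; OPR -> CS
    new_cs = memory[phys] | (memory[(phys + 1) & 0xFFFFF] << 8)
    return new_ind, new_pc, new_cs
```

For example, with DS = 0x0100 and a stale IND of 0x0010, the segment word is read from physical address 0x01012.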

As for the mystery FF FF instruction, I think it jumps into "PUSH rm" and ends up pushing DI.

  SP → tmpA       DEC2 tmpA   PUSH rm
  Σ → IND           
  Σ → SP            
  M → OPR         W SS,P0 RNI


> As for the mystery FF FF instruction, I think it jumps into "PUSH rm" and ends up pushing DI.

In other words, it seems that FF/6 (the normal "PUSH rm") and FF/7 (officially undefined) are aliases? The other notable anomaly is that 8F/1-7 are also officially undefined, while "POP rm" is at 8F/0.
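In the FF/n notation, n is the 3-bit reg field of the ModR/M byte that follows the opcode. Splitting the second byte of FF FF shows why DI ends up on the stack, assuming (as suggested above) that /7 shares the PUSH rm microcode: mod = 3 selects a register operand, reg = 7 is the undefined /7 slot, and rm = 7 is DI in the 16-bit register order. A minimal sketch (names are illustrative):

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]  # 16-bit register order


def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (mod, reg, rm): bits 7-6, 5-3 and 2-0."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


mod, reg, rm = split_modrm(0xFF)
# mod == 3: register operand; reg == 7: the "/7" slot of group FF; rm == 7: DI
```

For comparison, the documented FF 36 splits into (0, 6, 6): reg 6 is PUSH, and mod 0 with rm 6 is a direct 16-bit address.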

Ignoring the 0F-prefixed ones (POP CS, everyone knows that one by now), there are some interesting gaps in the group table:

https://www.sandpile.org/x86/opc_grp.htm

Perhaps all those undefined opcodes could be worth writing another detailed article about, now that you've become familiar with how the instruction decoding works.


One thing worth noting here is the general simplicity and orthogonality of the addressing. x86 gets a lot of CISC hate for having complicated addressing modes, but if you accept that you're going to have them at all, this is actually pretty clean. You have a clean set of 8 registers in both 8- and 16-bit modes[1]; the "pointer" for a memory operation can be one of the "base" registers (BX, BP) or "index" registers (SI, DI), or the sum of one from each category, and you can add a displacement.
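A minimal Python sketch of the 16-bit effective-address rules described here (the function and table names are hypothetical). It covers the three memory modes and the one irregularity, mod = 0 with r/m = 6 meaning a direct 16-bit address instead of [BP]; segment selection (SS by default for BP-based forms, DS otherwise) is left out.

```python
# r/m field -> (base, index) for the 8086's 16-bit memory operands
RM_TABLE = [
    ("BX", "SI"), ("BX", "DI"), ("BP", "SI"), ("BP", "DI"),
    (None, "SI"), (None, "DI"), ("BP", None), ("BX", None),
]


def effective_address(mod: int, rm: int, regs: dict[str, int], disp: int = 0) -> int:
    """Return the 16-bit offset of a memory operand (mod 0-2); segments not applied."""
    if mod == 3:
        raise ValueError("mod == 3 selects a register, not memory")
    if mod == 0 and rm == 6:
        return disp & 0xFFFF                   # special case: direct [disp16], no BP
    if mod == 0:
        disp = 0                               # no displacement
    elif mod == 1:
        disp = ((disp & 0xFF) ^ 0x80) - 0x80   # disp8 is sign-extended
    base, index = RM_TABLE[rm]
    total = disp
    if base:
        total += regs[base]
    if index:
        total += regs[index]
    return total & 0xFFFF                      # offsets wrap at 64 KiB
```

For example, with BX = 0x1000, mod 1, r/m 7 and a disp8 of 0xFE, the result is [BX-2] = 0x0FFE.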

Writing assembly for the 8086 was actually quite pleasant, the space of stuff you could do was broad and the expression was clean.

[1] Though not the same registers. The list of registers is populated differently for 8 bit and 16 bit instructions. The 386 then played the same trick AGAIN, but placed the selector bit in the code segment descriptor and did a weird dance with how to encode the ESP register (to flag a new "SIB" byte for extended modes), and made a giant hash of things that no one could understand. Writing assembly for the 80386 was definitely not pleasant, though it was actually an easier target for compilers.


The 386 ModR/M isn't that hard to figure out --- it's just that the 8 base/index combinations become a plain [r32], [disp32] still sits where [ebp] would otherwise be (mod 0, r/m 5), and the [esp] slot (r/m 4, in any memory mode) is where they put the SIB byte.
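A sketch of the 32-bit ModR/M interpretation just described, stopping before any SIB byte is decoded (the names are illustrative):

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def describe_modrm32(mod: int, rm: int) -> str:
    """Describe a 32-bit ModR/M operand, before any SIB byte is read."""
    if mod == 3:
        return REG32[rm]          # register operand
    if rm == 4:
        return "SIB follows"      # the [esp] slot is the SIB escape
    if mod == 0 and rm == 5:
        return "[disp32]"         # the [ebp] slot with mod 0 is an absolute address
    disp = {0: "", 1: "+disp8", 2: "+disp32"}[mod]
    return f"[{REG32[rm]}{disp}]"
```

So [ebp] itself has to be encoded as mod 1 with a zero disp8, and [esp] always needs a SIB byte.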

On the other hand, the 64-bit addressing modes that AMD created with the 64-bit extension are a lot weirder.


The only change AMD made to the addressing modes was that, of the two encodings the 386 accepts for absolute addressing, the shorter one (mod 0, r/m 5, without a SIB byte) was redefined to mean PC-relative (a.k.a. RIP-relative) addressing.
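To make that concrete, here are the byte encodings of `mov eax, [addr]` using the general 8B opcode, written as a Python sketch (the function names are hypothetical; the shorter accumulator-only moffs form A1 is ignored). The same ModR/M byte 05 is absolute in 32-bit mode but RIP-relative in 64-bit mode, so a 64-bit absolute address has to use the longer SIB form.

```python
import struct


def mov_eax_mem_32bit_mode(addr: int) -> bytes:
    # 32-bit mode: ModR/M 05 = mod 00, reg 000 (EAX), r/m 101 -> [disp32], absolute
    return bytes([0x8B, 0x05]) + struct.pack("<I", addr)


def mov_eax_mem_64bit_rip(disp: int) -> bytes:
    # 64-bit mode: the very same ModR/M byte now means [RIP + disp32]
    return bytes([0x8B, 0x05]) + struct.pack("<i", disp)


def mov_eax_mem_64bit_abs(addr: int) -> bytes:
    # 64-bit absolute: r/m 100 selects a SIB byte; SIB 25 = scale 00,
    # index 100 (none), base 101 with mod 00 -> disp32 only (sign-extended)
    return bytes([0x8B, 0x04, 0x25]) + struct.pack("<i", addr)
```

The absolute form costs one extra byte (7 instead of 6), which is why the short encoding was the one given to RIP-relative addressing.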

The only weird part is that AMD decided, for some very stupid reason, not to decode the REX.B register-extension bit in two special ModR/M cases (r/m = 100, which selects a SIB byte, and mod = 00 with r/m = 101, which selects a bare displacement).

Perhaps this AMD decision allowed them to obtain an extra 0.1 GHz in the clock frequency of the 2003 model of Opteron, but it has complicated forever the optimal register allocation on the Intel/AMD CPUs.

On a 386, as far as memory addressing is concerned (when segmentation is not used), there are only 3 kinds of general-purpose registers (SP, BP and all others). Because of this AMD mistake, 64-bit mode has 5 kinds (SP, BP, R12, R13 and all others), each with slightly different behavior; for example, program size depends on how the registers are allocated.
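A sketch of how this shows up in code size, encoding `mov rax, [reg]` in 64-bit mode for each base register (the function is hypothetical and only covers the plain [base] form). RSP and R12 need an extra SIB byte, RBP and R13 need an extra zero disp8, and everything else fits in 3 bytes. RSP and R12 still differ elsewhere: with REX.X set, SIB index 100 selects R12, while without it the same field means "no index", so RSP can never be an index.

```python
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
         "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15"]


def encode_mov_rax_from(base: str) -> bytes:
    """Encode `mov rax, [base]` in 64-bit mode (opcode 8B with REX.W)."""
    n = REG64.index(base)
    rex = 0x48 | (n >> 3)     # REX.W, plus REX.B for R8-R15
    low = n & 0b111           # the special cases look only at these 3 bits
    if low == 4:              # RSP or R12: r/m 100 means "SIB follows"
        return bytes([rex, 0x8B, 0x04, 0x24])   # SIB: no index, base = low
    if low == 5:              # RBP or R13: mod 00, r/m 101 would mean [RIP+disp32]
        return bytes([rex, 0x8B, 0x45, 0x00])   # so use mod 01 with disp8 = 0
    return bytes([rex, 0x8B, low])              # mod 00, reg 000 (RAX), r/m = low
```

For example, `[r8]` encodes as 49 8B 00 (3 bytes), but `[r12]` needs 49 8B 04 24 (4 bytes).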


It's not so much that it's hard to figure out, it's that it's hard for a Regular Jane assembly programmer to know how to do well. There's an embarrassment of modes with complicated performance characteristics, and a lot of decisions to be made.

So yeah, serious domain experts and modern optimizers can do really well with the architecture, but it's a huge pain to try to "get stuff done" in, and it's from an era where serious Stuff was still gotten done in hand-coded assembly as often as not.

Basically: with a Z80 or 6502 or 8086, if you had a task "X" to do the "best" way was as often as not the obvious way. With the 386, that was suddenly no longer true and you had to have a bookshelf full of Abrash tomes and whatnot. It was a very different feel.


> People often ask if microcode could be updated on the 8086. Microcode was hardcoded into the ROM, so it could not be changed. This became a big problem for Intel with the famous Pentium floating-point division bug. The Pentium chip turned out to have a bug that resulted in rare but serious errors when dividing.

Do we really know whether fixing the Pentium FDIV bug would actually have been within reach of microcode patching? Based on googling, the FDIV bug is due to errors in some sort of PLA, which presumably couldn't have been patched.


The Pentium FDIV algorithm could have been changed in microcode to use Newton-Raphson iteration and sidestep the issue. It depends on exactly what resources they would have had available in microcode.
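For illustration, here is textbook Newton-Raphson division in Python: refine an estimate x of 1/d with x = x * (2 - d * x), then multiply by the dividend. It needs only multiplication and subtraction, which is why it could in principle avoid a faulty division table. This is a sketch with assumed choices (double-precision floats, positive divisors, the common 48/17 - 32/17 * m initial estimate); it is not the Pentium's actual microcode or divider.

```python
import math


def nr_divide(n: float, d: float, iterations: int = 4) -> float:
    """Approximate n / d (d > 0) by Newton-Raphson iteration on 1/d."""
    if d <= 0:
        raise ValueError("sketch handles positive divisors only")
    m, e = math.frexp(d)              # d = m * 2**e with m in [0.5, 1)
    x = 48 / 17 - (32 / 17) * m       # linear initial estimate of 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)         # relative error roughly squares each step
    return math.ldexp(n * x, -e)      # n / d = n * (1/m) * 2**-e
```

The initial estimate is within about 6% of 1/m, so four iterations are enough to reach double precision.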


Author here. Any questions about 8086 microcode?


What happens if you reference a segment register number greater than 3 in the ModR/M byte? For example, MOV Ew, Sw [8C] with "reg==5". Does the logic just ignore the upper bit?

On an 80186+, you'd get a #UD, but the 8086 doesn't have strictly "undefined" opcodes; everything does something.


I'm pretty sure that the extra bit will get ignored. There is internal circuitry that expands the register number in the instruction to a 5-bit register number. (You need 5 bits to handle 8-bit registers, 16-bit registers, segment registers, and internal registers.) For segment registers, it drops the third bit. But I haven't tested this case, so I can't guarantee there isn't some weird side-effect.
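A sketch of the behavior described here, assuming the high bit of the reg field is simply dropped (as noted, this is untested on real hardware). Segment register numbers 0 to 3 are ES, CS, SS, DS, so reg = 5 would alias CS.

```python
SEGS = ["ES", "CS", "SS", "DS"]  # 8086 segment register numbering


def segment_from_reg_field(reg: int) -> str:
    """Map the ModR/M reg field of MOV Ew,Sw (8C) to a segment register,
    assuming the third bit is ignored (an assumption, not verified)."""
    return SEGS[reg & 0b11]
```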


When accessing memory, is there also microcode dealing with generating a pagefault when the address is invalid?


The 8086 doesn't have virtual memory, so there is no such thing as a pagefault or invalid address. For a memory access, the microcode blocks until the bus interface circuitry completes the memory access. If there's physically nothing there, the memory access will either hang or come back with random garbage.


Don't believe there's an acknowledgement in the bus protocol. There's a RDY line with which a device can insert wait states before replying, and a HOLD that can be used for DMA from other devices. But there's no way for the CPU itself to detect the "there is physically nothing there" case. You indeed just get the arbitrary state of the data bus (the ISA bus had pull-ups, I believe, so you'd read all 1s on the data bus, which on the PC would then result in a parity error if the address was within RAM space).

(And yes, I'm still waiting with bated breath to see the teardown of the machine that implements this ridiculous bus.)


I was thinking that the READY line might cause problems. If nothing pulls it high, the memory access will be blocked forever. But I haven't looked much at how systems physically implement the bus, so maybe it defaults high.


Wouldn't the READY line get pulled up eventually when something else (a device, or possibly DRAM refresh?) uses the bus?


I got curious and went to check the schematics. On the original PC, the READY input to the 8088 is the synchronized version of the RDY input to the 8284 clock generator chip (which probably explains why Ken and I were spelling it differently). This is generated by a latch delay circuit automatically based on the IOCHRDY signal on the 8 bit ISA bus (and some other local stuff on the board), which itself is an open collector line pulled high.

So indeed, if nothing does anything, #READY will go active automatically, simultaneously with the data lines (not!) being driven, and the CPU will sample junk.

Which is to say: IBM had to add a bunch of hardware to the board just to synthesize a default/ignore behavior for this line, which wouldn't have been necessary had Intel done it right the first time.


Just curious: is there an unintended condition, some combination of instructions, that puts the CPU into an undefined state?


Well, there are opcodes that aren't documented instructions. The 8086 doesn't trap on bad opcodes, so it will execute them and do something. I'm not sure if that's what you mean by "an undefined state". I think the undocumented 8086 instructions are all deterministic, as opposed to the 6502 where the results can depend on bus fluctuations.


On the 6502, some of the undefined opcodes (usually named KIL) put the chip into a state where it no longer executes any instructions and doesn't respond to interrupts either. And such a state isn't reachable via documented instructions. Maybe that kind of "state" was meant?


If I remember correctly, that's because the decode PLA which takes opcode + current cycle number (one-hot signal) has a bit that indicates "last cycle" which resets the logic to check for interrupts and begin a new instruction cycle, and for the unused entries it isn't set in any cycle, so the cycle counter "runs off the end" and there's no more active bit to continue execution.

On the 8086, the microcode counter would eventually roll over and reach a "next instruction" bit even if it went into a normally unreachable part, so I don't think that such a state is possible.



