Reverse engineering the Intel 386 processor's register cell

kens · on Nov 9, 2023

Author here if anyone has 386 questions. The 386 is very complicated, probably too much for me to fully reverse engineer. But I find it interesting to look at small circuits on the die.

anyfoo · on Nov 9, 2023

If you can, see if there is any trace of the CR1 control register in the 386. It’s fully marked as reserved, faults on access, and has never been used even with any successors. (Later, CR4 was added, and then for new things it was mostly dedicated MSRs.)

I have been wondering for literally decades now.

The best theory I’ve read so far came from Michal (os2museum) when I talked to him: The 386 was supposed to have an on-die cache, but that was stripped out late. There are surviving pre-release data sheets documenting that. The idea is then that CR1 controlled that cache, but got ripped out with the rest of it.

If that is the case, there likely isn’t an actual CR1 register at all, and only the opcode encoding survived. But if there should be an actual register, that would be massive news.

ajross · on Nov 9, 2023

I doubt any traces would exist at the mask level if that were the case. The decoding is just a 3-bit field in the ModR/M, there have "always" been CR{1,5,6,7} in the encoding, but those don't take transistors to recognize; you only decode what you want to connect.

But the theory sounds totally plausible: they'd have written up the ISA spec long before they started designing the circuits. So if CR1 got dropped early all the existing designs would still be targetting 0/2/3 explicitly.

But FWIW: the "why wasn't it ever used" question is easier to answer: the window was extremely short. The MOVCR instructions only supported 8 32 bit registers, and when the Pentium shipped that was already too small and all future CPU state extensions were done using the new CPUID and MSR mechanisms.

anyfoo · on Nov 10, 2023

They could have started using CR1 instead of introducing CR4, that to me is the mystery. A likely answer is that some software may have been relying on CR1 faulting, though that’s kind of weird.

ajross · on Nov 10, 2023

AMD could have used CR5 instead of CR8 too. Both could (and probably should have) used an MSR for those bits. Numbers are just numbers. I really don't think there's need for much speculation here.

anyfoo · on Nov 10, 2023

I’m not sure why I have to defend my curiosity, but the engineering decision to add CR4, instead of using the existing CR1 which was literally all bits “reserved for future use”, is just interesting to me. Especially because MSRs were also an option. There must have been a reason, however minutious, leading to that, and I’d like to know.

Though it’s mostly just a corollary to the existence of the completely reserved CR1 itself, which was added in the 386 at the same time as CR2 and CR3 (so I don’t think compatibility concerns apply). CR1 is entirely a forbidden zone between the well-defined CR0 and CR2. At the time, we thought that maybe CR1 was supposed to be the next register new control bits would be added into once the adjacent CR0 is full, but then Intel introduced CR4 for that instead, and made the whole CR1 gap more mysterious.

That Intel decided to skip CR1 entirely speaks of a story. Before I knew of the plans of having an on chip cache, this story was a giant mystery. Now that I know of the possibility that it might have been related to the late-scrapped plans of having an on-chip cache on the 386, I think that’s a pretty interesting potential reason already, but I’m not yet satisfied unless confirmed.

But if CR1 actually exists in the 386 in some form other than the encoding, then that's massive news.

colejohnson66 · on Nov 10, 2023

I always assumed CR8 was used because (a) it prevents access outside 64-bit mode, and (b) being the creators of 64-bit mode, there's a guarantee Intel has no plans of using such a register for something else.

ajross · on Nov 13, 2023

To be clear, sure, that's surely the case. The REX encoding added an extra bit to the register field used by the movcr instructions, giving you 16. So... might as well.

Nonetheless the "official" means for adding state to an x86 is an MSR. And in particular there's no good reason for putting this in legacy spaces like CRs: CR8 is an interrupt control register, unrelated to the MMU control in CR0/2/3.

(In fact the proof of this is that within the decade, x86 interrupt handling needed to evolve in the direction of MSI's and x2apic, something that even the newly-expanded CR registers were completely inadequate to do. So we do that stuff in MSRs too.)

colejohnson66 · on Nov 9, 2023

Interesting. Do you have a link to said datasheet?

anyfoo · on Nov 10, 2023

Took me a while, but here you go: https://ardent-tool.com/CPU/docs/Intel/386/datasheets/231052...

accrual · on Nov 9, 2023

Thank you, Ken! This is especially cool to me after reading a recent article [0] posted here on hn [1], titled "Intel 80386, a revolutionary CPU". It gave me a new appreciation for how advanced the 386 was and how it formed the foundation of modern x86 architecture.

[0] https://www.xtof.info/intel80386.html

[1] https://news.ycombinator.com/item?id=38156486

raverbashing · on Nov 9, 2023

Very nice writeup!

I understand that these memory cells needs both bit and #bit, but how are both provided to the cell? Seems like you'd need a lot of inverters around for those

kens · on Nov 9, 2023

The trick is that the drivers that supply bit and #bit are shared by all the registers. At the top of the register file you have 64 drivers for the bit lines and then these signals feed all the cells.

mobilio · on Nov 9, 2023

I always wish to say that to you:

THANK YOU!

bell-cot · on Nov 9, 2023

> Since there are six 16-bit segment registers in the 386, I suspect these are the segment registers and two mystery registers.

I'm no expert...but I'd guess the 2 mystery registers are for scratch use by the microcode. When executing CALL, INT, etc. in protected mode, a '386 chip can easily burn 200-300 clocks on just that one assembly instruction. Lord knows how many corner cases there might be, when various things go wrong mid-instruction.

kens · on Nov 9, 2023

Yes, that's very possible that the mystery registers are scratch registers. But there are lots of other possibilities too.

bell-cot · on Nov 9, 2023

A related idea - if the circuitry around all 8 (16-bit registers) is the same...I'd wonder if the '386 might actually have 8 segment registers - 6 user-accessible, and 2 more reserved for the microcode. The latter mostly used when navigating some of the more "interesting" states transitions which the '386's paging / protection / segmentation architecture allows.

bonzini · on Nov 10, 2023

The microcode hardly needs the segment selector registers. It uses the descriptor cache which includes the base, limit and access rights of the 6 segments + GDTR, IDTR, LDTR and TR. The descriptor cache is update by the MOV to segment and L{GDT,IDT,LDT,TR} instructions.

bonzini · on Nov 9, 2023

I would put my bet on LDTR and TR. They are read and written with different instructions but they are 16 bit and closely related to segment registers (they index into the GDT).

rep_lodsb · on Nov 9, 2023

That seems the most likely to me as well, since they also have a 16 bit selector field that has to be stored.

The 286 already had 8 "segment registers" internally: ES, CS, SS, DS, GDT, LDT, IDT & TSS. Only the first four are directly accessible by the program, but to the microcode they should all be more or less identical.

Besides the layout of the saved CPU state in memory, another clue to this is that for exceptions involving both "normal" and "system" segment registers, an error code gets stored in one of the internal registers, indicating which segment caused it. The only way to read it out in software is by executing the undocumented STOREALL instruction (F1 0F 04) immediately after the reset from a triple-fault shutdown.

The codes that appear in this register are:

     6CFFh GDT
     6DFFh LDT
     6EFFh IDT
     6FFFh TSS
     70FFh ES
     73FFh DS

CS and SS should be 71h / 72h, however exceptions involving these segments seem to take a different microcode path that overwrites that register with the access rights for CS.

I have written about this here: https://rep-lodsb.mataroa.blog/blog/the-286s-internal-regist...

And this article shows the bus cycles done by LOADALL on the 386, note that it also loads 10 internal registers:

http://www.rcollins.org/articles/loadall/tspec_a3_doc.html

bonzini · on Nov 10, 2023

Right, the 386 has 10 internal segment descriptor caches but only 8 segment selector registers. The 286 had 8 and 6 respectively.

But wait, HIMEM.SYS used LOADALL to avoid going into protected mode and back?!? When you thought you knew everything (which I absolutely don't, but I knew it used big real mode on the 386 and I totally didn't expect LOADALL).

> More useful is the ability to load any arbitrary base address for the segment registers without entering protected mode. Some versions of Microsoft's HIMEM.SYS did this to copy data between real and extended memory.

It is also explained at https://www.os2museum.com/wp/himem-sys-unreal-mode-and-loada...

jewillco · on Nov 9, 2023

The descriptor cache is 96 bits per segment register (http://www.rcollins.org/ddj/Aug98/Aug98.html) so the 12 registers might be the bases and limits with the extra metadata stored somewhere else. It might also be that each limit is fewer than 32 bits since the encoding for segment limits uses 21 bits in memory.

bell-cot · on Nov 9, 2023

Typo? I'd guess 's/is a /is more /'.

> physical implementation is a complicated than the theoretical

kens · on Nov 9, 2023

Fixed :-)

forinti · on Nov 9, 2023

And I thought I was on the metal for programming in assembly.

Mikado001 · on Nov 9, 2023

Next step, emulate this processor with a FPGA

kens · on Nov 9, 2023

I think there are a few missing steps there :-)

Bluecobra · on Nov 10, 2023

How about a 486 instead? :)

https://github.com/MiSTer-devel/ao486_MiSTer