Examining the silicon dies of the Intel 386 processor (righto.com)
280 points by Tomte on Oct 14, 2023 | 62 comments



Author here. I've been looking at the 386, if anyone has questions. This post was inspired by userbinator's discussion on HN a couple of weeks ago about how many transistors there are in the 386.


Great article as always! One nit:

> ..."tapeout", when the chip data is sent on magnetic tape to the mask fabrication company.

That's roughly true in a temporal sense, but it's not where the term "tapeout" comes from. They could have shipped the data on a Winchester disk, and the event would still be called tapeout.

In the earlier days of printed circuit board (PCB) manufacturing, you would literally "tape out" your circuit manually with black tape on a white board, typically in an enlarged form.

"Tapeout" came to mean the point in time when you finished taping out your circuit and it was ready to be sent to be photographed and reduced and boards manufactured from the layout.

There wasn't even any "data" involved here, magnetic or otherwise. Just a physical art board with tape on it.

Wikipedia has a pretty good article on this:

https://en.wikipedia.org/wiki/Tape-out

And for the young'uns who are wondering "what the heck is a Winchester disk?"

https://www.pcmag.com/encyclopedia/term/winchester-disk

I taped out my first printed circuit board as a third-grader sometime around 1960 and shared the story here:

https://news.ycombinator.com/item?id=32116169


Great post. The most interesting thing to me was how historically significant the 386SL turned out to be. I had always mentally slotted it as a cheap cut-down part for the emerging laptop market, but it actually was a relatively sophisticated (3X the transistors!) precursor of the modern SoC.


Level of integration wasn't what made the 386SL special; there were 8088/8086 SoCs before it, like the NEC V40 used in the 1987 Zenith Eazy PC or the V50 in the 1988 Akai S1000 https://en.wikipedia.org/wiki/NEC_V20#Variants_and_successor...

What made the 386SL special was the introduction of System Management Mode (SMM). Intel sued AMD over the Am386 SMM implementation; AMD tried claiming it's not really SMM but just some leftover debugging ICE implementation :D https://ir.amd.com/sec-filings/content/0000898430-94-000804/...


Total amateur here: does a 386 have “cleverness” or optimizations or does it just quite literally chug through a stream of instructions, adjusting registers and memory?

I guess by this I am thinking about how newer processors do all kinds of stuff at the microcode level that means you cannot anticipate precisely what instructions are being executed in what order.


Well, yes and no. The 386 chugs through the stream of instructions sequentially, unlike modern processors. There are various clever optimizations, though. First, there is some pipelining, so the microcode for the next instruction can execute while the previous one is finishing. Second, the CPU has a 16-byte prefetch queue, so instructions are fetched from memory asynchronously from their execution. So you know what instructions are executed and in what order, but the timing is fairly unpredictable.

I should mention that microcode in the 386 is pretty different from micro-ops that modern processors run. They both break machine instructions down into smaller steps. But microcode runs sequentially, while micro-ops are sort of tossed into the CPU and run independently through a dataflow engine, with everything sorted out at the end to look sequential. Confusingly, modern processors use "microcode" to hold the micro-ops for complicated instructions that can't be handled by the regular instruction decoder; this is sort of like old-style microcode, but not exactly.
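
To make that contrast concrete, here's a toy C sketch (my own illustration with made-up micro-ops; no real scheduler is this simple). The microcode-style loop steps through one micro-op per cycle in strict order; the dataflow-style loop fires, each cycle, every micro-op whose inputs are ready, letting independent work overlap:

    /* Toy model: four micro-ops, each optionally depending on one earlier op. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct {
        const char *name;
        int dep;    /* index of a prerequisite micro-op, or -1 for none */
        bool done;
    } Uop;

    int main(void) {
        Uop pool[] = {
            {"load  r1, [mem]", -1, false},
            {"add   r2, r1, 4",  0, false},  /* must wait for the load */
            {"mov   r3, 7",     -1, false},  /* independent, can fire early */
            {"store [mem], r2",  1, false},  /* must wait for the add */
        };
        int n = sizeof pool / sizeof pool[0];

        /* Old-style microcode: strictly one micro-op per cycle, in order. */
        printf("microcode-style:\n");
        for (int i = 0; i < n; i++)
            printf("  cycle %d: %s\n", i, pool[i].name);

        /* Dataflow-style: each cycle, fire everything whose inputs are ready. */
        printf("dataflow-style:\n");
        for (int cycle = 0, left = n; left > 0; cycle++) {
            bool fire[4] = {false};              /* sized to the pool above */
            for (int i = 0; i < n; i++)          /* pick this cycle's ops... */
                if (!pool[i].done && (pool[i].dep < 0 || pool[pool[i].dep].done))
                    fire[i] = true;
            for (int i = 0; i < n; i++)          /* ...then mark them done */
                if (fire[i]) {
                    printf("  cycle %d: %s\n", cycle, pool[i].name);
                    pool[i].done = true;
                    left--;
                }
        }
        return 0;
    }

The independent mov overlaps with the load, so the dataflow run finishes in three cycles instead of four, while the results still come out looking sequential.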


That's really cool: having learned most of my digital design in an era when interpreted microcode was the norm and the PDP-11 or early 68k was the pinnacle of hardware design worth teaching, I've always wanted to learn about the transitional designs: from old-style microcode to the shrunken forms of microcoding that lived on in superscalar CISC designs. (I'm still trying, for lack of resources.)


Out-of-order execution was introduced in the Pentium Pro; the Pentium was already able to execute two instructions per cycle but was still in-order.

Note that this "magic" is mostly implemented not in microcode, but rather in hardwired logic.


The 386 is actually somewhat slower clock-for-clock than the 286 for the same 16-bit code. So no, no optimizations. For example https://web.itu.edu.tr/kesgin/mul06/intel/instr/movs.html. The 486 was only slightly optimized; it was the Pentium where they finally started working on IPC.


Right, the 486 was only slightly optimized for IPC. Most of the 486's performance advantage over the 386DX came from its L1 cache (just called the CPU cache at the time; there weren't different cache levels yet). You can turn the cache off in a 486's BIOS and it will be barely any faster than a 386 at the same clock speed.


One of the assertions of RISC is that microcode is a performance penalty, to be avoided.

And many have.

"To provide this rich set of instructions, CPUs used microcode to decode the user-visible instruction into a series of internal operations. This microcode represented perhaps 1⁄4 to 1⁄3 of the transistors of the overall design. If... the majority of these opcodes would never be used in practice, then this significant resource was being wasted."

https://en.m.wikipedia.org/wiki/Berkeley_RISC


Every time I see something new on righto.com, I get excited! I get to learn :D

Thank you so much, both for constantly sharing what you know and for preserving what's one of the most interesting times in the evolution of computing.


+1

What I like most about these articles is that they show how messy even high-tech can be:

All the details of fabrication techniques (and their effect on what logic designers can and can't do), some opcodes removed to make room on the die for a bugfix (?, see note 26), etc. etc. "Ultimately all digital electronics is analog".

The business mistakes, and lucky recoveries.

Keep up the good work, Ken! (btw you typo'd an "8" and "6" in note 25 :-)


Love the post!

Some DOI and Bitsavers links are broken (linking to righto.com or 404s). Also, where can I find "Automatic Place and Route Used on the 80386"? DDG turns up only one result: this post.


There was some discussion of the automatic placement in this panel interview.

If I remember correctly, the software that performed the placement was written by a graduate student who debugged it from a terminal at his dormitory. It was one of many project decisions on the i386 that management would have absolutely stopped had they been made aware.

https://www.computerhistory.org/collections/catalog/10270201...

"386 is a complicated processor (by 1980s standards), with 285,000 transistors..."

Interesting that ARM1 was only 25,000 transistors. Did the i386 really have additional features that justify an order of magnitude?

One thing is certain in retrospect: Intel should have bought Acorn, not Olivetti.

Edit: Wow, there is even more detail on the placement software in the Righto article; the software was "Timberwolf" written by Dr. Carl Sechen.

Edit2: It appears that later versions of Timberwolf were sucked into Yale's licensing.

...Sechen served as an expert witness in the Cadence/Avanti trial in 2000 and 2001... "I had a chance to examine a great deal of the code in question when I was an expert witness on the trial. It was amazing – I even saw my own TimberWolf code in their tool, where only a single line of code had been changed. And I don’t mean the earlier, far-inferior version of TimberWolf available from Berkeley. The version I found in Avanti’s suite was a far more state-of-the-art version that had somehow been ‘acquired’ from Yale."

http://www.aycinena.com/index2/index3/archive/uw%20-%20seche...

Edit3: Graywolf is a fork of the last free version of Yale's Timberwolf.

https://github.com/rubund/graywolf


> Did the i386 really have additional features

Here are a few:

- >26-bit address space

- multiplication in hardware

- more complex instructions

- backwards compatibility with the 80286

- on-chip MMU

- support for an FPU

> that justify an order of magnitude?

I wouldn’t know. Backwards compatibility certainly is high on the list because, when it was released, many users had fairly high investments in commercial software.


Well, 26 bits would have made the 80286 a king. Would that itself have killed ARM?

Of course, SSE/NEON were decades in the future.

As we note, Intel did not value backwards compatibility at the outset of the i386.

Perhaps an Acorn acquisition and the sudden ownership of a low-power CPU that they could make for peanuts might have also had a profound impact.

It would have been interesting to see Intel making BBC Micros.


> - backwards compatibility with the 80286

With the 8086, sure. v86 got you covered.

With 80286? Not really. It might have some of the instructions, but it does not support 286's protected mode.


> Interesting that ARM1 was only 25,000 transistors. Did the i386 really have additional features that justify an order of magnitude?

Feature-wise, the ARM1 is probably more comparable to the Motorola 68000 from 1979: both have ~16 32-bit registers, no MMU, and no instruction prefetch queue. The ARM1 does have a full 32-bit ALU compared to the 68000's 16-bit ALU, and a full 32-bit barrel shifter compared to the 68000's 1-bit shifter.

But the 68000 is still 68,000 transistors (hence the name). So not only is the ARM1 achieving about the same level of functionality with under half the transistors, it can also execute instructions significantly faster.


>interesting that ARM1 was only 25,000 transistors. Did the i386 really have additional features that justify an order of magnitude?

You're asking somebody to answer RISC vs CISC in a subthread? There's no simple answer to that question, but the x86 family had a leg up with MS Windows compatibility, and the processors they developed maintained that hegemony until they went astray with Itanium and were saved by amd64.


I've been in physical design since 1997. Back then we were using Cadence Gate / Silicon Ensemble.

We tried the various Avanti tools: Aquarius, Astro, Apollo. I don't remember the exact order, but all the space-themed names are where the MilkyWay database comes from; it's still in Synopsys IC Compiler.

I remember that when the court actually compared the source code, they found that some of the comments had the same words misspelled in the same way. Two programmers may pick the same variable names, but they aren't going to misspell words the same way in comments unless the code was stolen.


Thanks! I've fixed the links, so let me know if you see any other broken ones.

The "Automatic Place and Route Used on the 80386" article is from Intel Technology Journal, Spring 1986, p29-34. I don't think you can find it anywhere; Pat Gelsinger sent me a copy. Email me (ken.shirriff@gmail.com) and I'll send it to you.


You can always upload it to Archive.org if you feel so inclined


Sadly the Computerworld article 'Intel backs off 80386 claims but denies chip recast needed' (1986) was devoid of technical details :(

Do you have more detailed information about what the main issues with multiple OSes in Protected Mode were? Was it an implementation bug or a fundamental architectural problem?

Do you know, by any chance, why Intel missed the POPF trap in Protected Mode?

https://devblogs.microsoft.com/oldnewthing/20160411-00/?p=93... https://docs.oracle.com/en/virtualization/virtualbox/6.0/adm... https://www.felixcloutier.com/x86/popf:popfd:popfq

The Pentium's Virtual Mode Extensions (VME) and Protected Mode Virtual Interrupts (PVI) solve the performance burden of trapping, but despite being named _Protected Mode_ Virtual Interrupts, this works only in V86 mode, leaving Protected Mode with this bug:

"The protected-mode virtual-interrupt feature — enabled by setting CR4.PVI — affects the CLI and STI instructions in the same manner as the virtual-8086 mode extensions. POPF, however, is not affected by CR4.PVI"

They even planned to patent PVI around 1992 https://patents.google.com/patent/GB2259794A/en and clearly knew about the POPF pitfalls back then.

Could this be what the people in the Computerworld article were complaining about? I don't understand why Intel never fixed it, even to this day.

Btw, I find it funny and weird that Intel was still lawyering as late as 1998, trying to suppress any knowledge of the Virtual Mode Extensions! Dr. Dobb's 'VME: Coming Out of the Cold' https://web.archive.org/web/20001217233100/http://www.rcolli...
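
The hole is easy to demonstrate from user space, by the way. A minimal sketch, assuming x86-64 Linux and GCC (my own illustration, not from the article or thread):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t flags;

        asm volatile("pushfq\n\tpopq %0" : "=r"(flags));
        printf("IF before: %d\n", (int)((flags >> 9) & 1));

        flags &= ~(1ULL << 9);   /* try to clear the interrupt flag */
        asm volatile("pushq %0\n\tpopfq" : : "r"(flags) : "cc");

        asm volatile("pushfq\n\tpopq %0" : "=r"(flags));
        printf("IF after POPF: %d\n", (int)((flags >> 9) & 1));

        /* On a normal OS both lines print 1: at CPL 3, POPF neither
           changes IF nor faults, so the write is silently dropped. */
        return 0;
    }

Because the attempted clear is discarded rather than trapped, a trap-and-emulate VMM never gets a chance to intervene when a guest kernel runs at CPL 3 and tries to disable interrupts, which is exactly the problem described above.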


Great post! Any idea how the turbo button worked? It seems like, given the transistor difference between the 8086 and the 386, merely decreasing the clock frequency wouldn't be enough?


Yes, I think the turbo button just changes the clock speed. The 386 was designed to be binary compatible with the 8086, but normally ran faster.


The lack-of-turbo button was included to enhance compatibility with older software that used timing tricks relying on the clock speeds of the older generations of processors. Those older processors would not have used the enhanced features of the new processors, just the backward-compatible features. Turbo was a motherboard OEM feature, not a 386 feature. Otherwise, there was no reason not to run "turbo".


Chipset/motherboard specific. Clock, wait states, cache, later even SMM sleep.


This was often done by introducing wait states, so the processor would slow down while accessing main memory.


I did some sleuthing to try to figure out what part of computing history this could refer to.

- Maybe because one could lower the "wait states" BIOS setting on PCs, and this became a selling point and was marketed? https://retrocomputing.stackexchange.com/questions/9779 https://retrocomputing.stackexchange.com/questions/18333

Note: it is often assumed that a CPU with more "wait states" was slower. But the above links point out how often the opposite is true.

- The Apple II era was a very different world. The CPU was relatively slower than the RAM. https://retrocomputing.stackexchange.com/questions/23541


Roughly what area percentage of a 386DX (or some other metric, if you think there's a clearer comparison) is standard cells versus hand layout?


Looking at the die, I'd say roughly 1/3 of the die is standard cell. I think some of it was standard cell but with hand layout, rather than automatic place and route. About 1/4 is the datapath, which is highly optimized. Maybe 10% is the microcode ROM.


Thanks for getting to the bottom of that!

The SL die photo seems to really show the differences in density that careful layout can produce; one wouldn't think that bus/memory controllers are of the same complexity as the CPU itself, but due to being entirely standard cells, they are almost the same size as the CPU.


My dad worked on that processor, among others. I see his KF initials in that die photo! :)


How were those processors made on a day-to-day basis? What did a typical Intel workspace look like back in the day?


I remember they had cubicles. Relatively glamorous compared to entirely open and echoey offices.


Ah, the days of cubicles. Trimble Nav in Sunnyvale had medium-height, solid fabric-backed 6x8' and 8x8' cubicles in 2000. And it was nice to be tucked into a quiet corner of the building by the foosball table room, before there was such a thing as startup culture. I was almost detained by SGI security by Shoreline Amphitheater (near the 'plex now) doing field radio testing off a coworker's truck that looked like Van Eck phreaking equipment. I should've worn a hi-vis vest. ;D

PS: Raise a paw if you remember the rainbow Apple logo on the triangle building along 280.


What did the design process actually look like? Did they design at the gate level? Drawing manually or via software?


This is partially answered in the article when the author discusses the creation of the 386. Specifically, footnote #23 calls out his sources.


The author even mentioned him in the notes. Do you know any of the other designers' names?


I updated the article with Brad's info :)


My dad says he does. I've connected him with Ken.


One of my greatest treasures as a young computer nerd was a bare 386 (I think) chip that I received after sending away for it via an Intel ad I found in Byte magazine. I just had to cut out part of the page and mail it in. Several months later I got a package back with the naked processor glued to a stiff card, along with a low-powered magnifying glass to scope it out with.

I really wish I still had that thing.


I'm always interested in all things (80)386 because this was basically the processor that launched the 32-bit computing revolution, at least as far as popular adoption goes: there were earlier 32-bit processors, but none became as commercially popular, or as widely adopted by the mass population, as the (80)386.

So this processor is of particular interest to me -- and should be to future computing historians...

This is truly a great article about that processor!

It's information-rich (I have not seen a more information-rich source about the 386 on the entire Internet, except perhaps 386 technical manuals and manual fragments, but those documents lack general human readability), and it is of great value to anyone who wishes to study the 386, and it will be of great value to future computer historians...

So, well done!

Upvoted and favorited!


Thanks for your kind comments!

Personally, I'd say that the IBM System/360 (1964) was the first widespread and influential 32-bit architecture. The Motorola 68000 (1979) also deserves a mention for its use in the Macintosh. (And I'll argue with anyone who says it wasn't a real 32-bit processor :-) But, yes, the 386 started the 32-bit x86 architecture on most (non-phone) computers today.


Excellent points!

System/360 is indeed interesting because (if my memory serves me) it was one of the first computer architectures to implement microcode (also, if I recall correctly, the necessity of updating this microcode was one of the reasons floppy drives were invented...)

I'm also a fan of the Motorola 68000 -- I used to have an Amiga 1000 "back in the day". I'd choose it over any 8- or 16-bit CPU of the time period, but (correct me if I am wrong) it didn't have an MMU -- which would have made it a less-than-ideal candidate for writing a modern-day Unix-compatible operating system, although the authors of AmigaOS managed to pull off quite an impressive multitasking OS on it nonetheless...

Also, as a 32-bit architecture (as opposed to a single-package IC CPU) we'd probably additionally want to remember the VAX-11/780 (1977) -- whose CPU was implemented as circuits of multiple simpler TTL ICs...

(Oh sure, IBM might have done something like that earlier -- but the VAX-11/780 brought down the cost of IBM's comparable computing offerings by at least an order of magnitude! Although, even so, the 11/780 still would have been ridiculously expensive to the average person of that time period... (and yes, I know it was intended for mid-sized to large businesses, not people! :-))

But anyway, great article, and yes, the IBM System/360 and 68000 were indeed groundbreaking!


>it didn't have an MMU

It didn't, but the real issue wasn't that it didn't have one.

It was that it didn't support one. Specifically, the stack frame the 68000 pushes on bus error lacks the information to allow recovery (retrying).

The 68010 solved this. Many UNIX workstations were based on the '010; some used Motorola's MMU chip, most used custom designs.


> "If management had known that we were using a tool by some grad student as the key part of the methodology, they would never have let us use it."

This is why managers shouldn't micromanage technical decisions.


Today I learned that the "S" in SX means "single" and the "D" in DX means "double": the DX has double the data bus width of the SX (32-bit vs 16-bit).


But then the 486 DX has a floating-point unit, while the 486 SX does not. So Intel just went with "DX is better than SX".


There were rumours that the DX and SX chips were exactly the same:

Intel zapped the FPU on the SX to disable it. Apparently it cost more to make, but sold for less.


The (80-bit) floating point in historic Intel processors is deprecated, FYI.

After the exchange of MMX and 3DNow!, it was AMD that adopted Intel's SSE into amd64.

https://en.m.wikipedia.org/wiki/Streaming_SIMD_Extensions
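
Deprecated, but still reachable from C: on typical x86 toolchains, `long double` still maps to the 80-bit x87 format. A quick check, assuming x86-64 Linux with GCC (other compilers/ABIs may map long double differently):

    #include <stdio.h>
    #include <float.h>

    int main(void) {
        /* x87 extended precision has a 64-bit significand,
           vs 53 bits for an IEEE double. */
        printf("double mantissa bits:      %d\n", DBL_MANT_DIG);
        printf("long double mantissa bits: %d\n", LDBL_MANT_DIG);
        printf("sizeof(long double):       %zu\n", sizeof(long double));
        return 0;
    }

On that setup it typically prints 53, 64, and 16 (the 80-bit value is padded to 16 bytes for alignment).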


The 386SX was my first overclock, swapping timing crystals out on the board.


Reminds me of the days when computers had a "turbo" button and an LCD readout of the clock speed on the case. "66MHz" => "90" ... now we're cooking


My 20/40MHz had an LED display. When I opened the case, I found the display had jumpers to make every combination of LEDs possible, even non-numeric. A bag of jumpers was taped next to it. I'd put HI/LO on it, or 01/99, or make turbo 20MHz and the slow mode 40.


Wasn't the paging MMU the most important new feature? IIRC this allowed for fully protected virtual memory, even for legacy applications.


I'd say it was equally important as 32-bitness; both were needed to run a proper BSD 4.x/System V Unix (or clone ;) )

(The 286 was enough for a PDP-11-style UNIX, and the 8088 could run a hobbyist-level one OK.)


> The 286 was enough for a PDP-11-style UNIX,

I remember MWC Coherent well! It was where I first learned Unix. :)

(except for maybe some vague memories of some Unix-like OS for the Atari ST that I didn't really understand at the time, but those memories are extremely vague)


It's interesting, if the 386SX wasn't much simpler electronically than the 386DX but mostly differed in packaging, that there was never a "386DL" for cost-no-object laptops.

I guess its market window might have been too narrow. It seems like, if you wanted the performance back then, you might even go without a battery, so using a desktop CPU without the special power-management tricks would be no big deal.


Could the “C” in 80C386I stand for CMOS? Wasn’t that pattern used for the 80C88, for example?


All Intel 386s were CMOS though, so the added C on the die-shrink shouldn't mean CMOS.


More like compact.



