How the 8086 processor handles power and clock internally (righto.com)
157 points by todsacerdoti on Aug 15, 2020 | 38 comments



Author here if anyone has questions about the 8086 internals.


(Not a question, just some thanks.)

The 8086 was the first CPU I did assembly-level coding on as a kid back in 1989. This was a full 11 years after its 1978 introduction, but PCs were really expensive back then. This was the time when 8086, 80286 and 80386 systems were all sold in parallel, at wildly different price points. The 80486 was just being introduced, too.

It's just fascinating to see what I then imagined as an immense, hyper-complex machine reduced to a grid of transistors that sort of fits, individually visible, on my 4K screen.


Another great article. In some photos we see small circles labeled as "vias" and others show small circles as "contacts." What exactly is the difference between vias and contacts?


No difference except how much space I had on the diagram :-)


Thank you for the write-up! I always find chip images fascinating, especially older nodes where it's possible to understand the whole thing if you wanted to.


You've probably seen https://zeptobars.com, but if not, there are lots of nice chip images there.


Wires on a chip are interesting. On a PCB you mostly have just two cases to deal with: in the low edge rate case, the trace is equivalent to a capacitor; in the fast edge rate case, the trace is equivalent to a transmission line with a characteristic impedance, such as a coax cable, so signals will bounce and you have to worry about termination. In both of these cases you can pretty much assume that resistance is zero (except for very high edge rates).

Now on a chip the traces are so thin that resistance is significant and cannot be ignored. The trace can be modeled as a distributed or lumped RC circuit. A consequence is that the delay is a quadratic function of length (doubling the length of the wire quadruples its delay). It becomes worthwhile to add repeaters. I wonder if these show up in the 8086.

On the other hand, "For on-chip wires with a maximum length of 1 cm, one should only worry about transmission line effects when tr < 150 psec"

http://bwrcs.eecs.berkeley.edu/Classes/icdesign/ee141_f01/No...
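
To make the quadratic scaling and the repeater trade-off concrete, here's a rough sketch with made-up per-mm resistance/capacitance and buffer-delay numbers (modern-ish values, definitely not the 8086's):

    # Rough sketch: delay of an on-chip wire modeled as a distributed RC line.
    # r_per_mm, c_per_mm, and t_repeater are illustrative values only.
    r_per_mm = 1000.0     # wire resistance, ohms per mm (assumed)
    c_per_mm = 0.2e-12    # wire capacitance, farads per mm (assumed)
    t_repeater = 20e-12   # delay of one repeater/buffer, seconds (assumed)

    def wire_delay(length_mm):
        """Elmore-style delay of a distributed RC wire: ~0.4 * R_total * C_total."""
        r = r_per_mm * length_mm
        c = c_per_mm * length_mm
        return 0.4 * r * c   # grows with the square of the length

    def repeated_delay(length_mm, n_segments):
        """Split the wire into n segments with a repeater between each pair."""
        seg = wire_delay(length_mm / n_segments)
        return n_segments * seg + (n_segments - 1) * t_repeater

    for l in (2, 4, 8):   # doubling the length roughly quadruples the bare-wire delay
        print(f"{l} mm: bare {wire_delay(l)*1e9:.2f} ns, "
              f"with 3 repeaters {repeated_delay(l, 4)*1e9:.2f} ns")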


Well, yes and no. At the time the 8086 was designed, gate capacitance dominated over wire RC delays. Early design tools (roughly pre-late-90s, way after the original 8086) didn't bother calculating RC delays during synthesis; we only really dealt with them in late static timing analysis (and then only for a few long lines).

Essentially, as chip features got smaller, wire resistance didn't scale the same way as gate capacitance (partly it's edge effects), and our tools needed to change as RC delays started to dominate.


> This chip divided its input clock by 3 to generate the 33% duty cycle clock required by the 8086.

And the input clock was 4.77 MHz because that is one third of the NTSC clock, 14.318 MHz. The clock is further divided by 4 and that's the input of the 8253 programmable timer.
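
Just to spell out the arithmetic for anyone following along:

    # Clock chain of the original PC, derived from the NTSC colorburst crystal.
    ntsc = 14.31818e6            # crystal frequency, Hz
    cpu_clk = ntsc / 3           # 8284A divides by 3 -> 8086/8088 clock
    pit_clk = cpu_clk / 4        # divided by 4 again -> 8253 timer input
    print(f"CPU clock: {cpu_clk/1e6:.3f} MHz")   # ~4.773 MHz
    print(f"PIT clock: {pit_clk/1e6:.4f} MHz")   # ~1.1932 MHz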


> 4.77 MHz because that is one third of the NTSC clock, 14.318 MHz.

And a clock frequency related to NTSC was used here because it allowed the use of the cheapest quartz crystal in mass production, and it also allowed systems to use the same 14.318 MHz clock to drive a potential video display.


The 33% is only required if you want to run the clock at its maximum speed. If you run it at a lower speed, you can get away with a 50% duty cycle and dispense with the 8284A clock generator chip. I've done this on some cheap embedded systems... up to 4.2 MHz on a 5 MHz 8088 is OK.


That's interesting. The datasheet says the clock has to be 2/3 low and 1/3 high with just 17 ns of margin. It's surprising that the datasheet is so strict if you can totally violate the specification and still have it work.

(To be clear, I'm not disagreeing with you that this works.)


Well the spec is minimum cycle time = 200 ns, maximum cycle time 500 ns. But also it has min. low time = 118 ns and min. high time = 69 ns, which is how the 1/3 duty cycle shows up. But there is no spec violation with, for example, low time = 125 ns, high time = 125 ns for a cycle time of 250 ns (4 MHz).


Which spec are you looking at? Is it a later revision? I'm looking at the User's Manual [1] which says the CLK Low Time has a minimum of (2/3 cycle time)-15 and CLK High Time has a minimum of (1/3 cycle time)+2. That yields a minimum low time of 151 ns which is violated by the 125 ns value.

[1] page B-18 in http://www.bitsavers.org/components/intel/_dataBooks/1981_iA...


This one, page 15: https://course.ece.cmu.edu/~ece740/f11/lib/exe/fetch.php?med...

Ah, I see in the older datasheet it says TCLCH = 2/3 * TCLCL - 15. But I'm pretty sure they mean TCLCH = 2/3 * TCLCLmin - 15 which gives the 118 ns it has in the newer datasheet.
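
In other words, taking TCLCLmin = 200 ns from the 5 MHz spec:

    # Reconciling the two datasheets' minimum CLK low-time figures.
    TCLCL_min = 200.0                        # minimum clock cycle time, ns (5 MHz part)
    low_min_formula = 2/3 * TCLCL_min - 15   # older User's Manual formula
    print(low_min_formula)                   # ~118.3 ns, i.e. the 118 ns in the newer sheet

    # A 4 MHz, 50% duty cycle clock against the fixed limits:
    cycle, low, high = 250.0, 125.0, 125.0
    print(low >= 118 and high >= 69 and 200 <= cycle <= 500)   # True -> within spec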


Thanks! That makes more sense.


I've loved following this series of posts as an amateur/layman wanting to know more about the hardware behind the chips I use!

Here's a question for anyone who might have an interest to help me understand (at a high summary level) --

Steady DC power is being provided to the chip, and the internal clock is going at 5 Mhz. What are the input and output signals like? Are they on the order of a few kHz and in a long stream of data (in a short burst?) that the CPU then takes and works on until the "delivery" burst? Is this the FSB? How many of the 45 wires coming off the CPU are for the input/output signals, or what are the rest of them for?

Thanks!


The 8086 has 40 external pins [1]. There are 20 address lines shared with 16 data lines, so it's a 16-bit computer that can address 1 megabyte. The remaining pins are mostly assorted control signals. (Intel liked to keep the pin count pointlessly low, so the pins are multiplexed and have several uses.)

The external pins are switching at the 5 MHz clock speed, and there are four clock periods for one memory bus cycle. The address pins send out the address the first cycle, wait a cycle for memory to respond, and read or write the data the third cycle.
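
As a back-of-the-envelope calculation from those numbers (ignoring wait states and prefetch):

    # Rough 8086 bus throughput at 5 MHz, four clocks per bus cycle, 16-bit transfers.
    f_clk = 5e6
    clocks_per_bus_cycle = 4
    t_bus = clocks_per_bus_cycle / f_clk           # seconds per bus cycle
    bytes_per_cycle = 2                            # 16-bit data bus
    print(f"bus cycle: {t_bus*1e9:.0f} ns")        # 800 ns
    print(f"peak throughput: {bytes_per_cycle / t_bus / 1e6:.1f} MB/s")   # ~2.5 MB/s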

The CPU and the memory bus are working at the same clock speed. (Although somewhat decoupled because the 8086 had prefetching.) I think you're going the wrong direction with kHz I/O and delivery bursts.

I've simplified things a bit. The 8086 User's Manual [2] explains all the signals in great detail if you really want to know.

[1] I see that you carefully counted the 45 wires off the die. The power and two grounds each use two bond wires in parallel for more current. There are two wires to bias the substrate. So that's the "extra" 5 wires.

[2] http://www.bitsavers.org/components/intel/_dataBooks/1981_iA...


Thanks! I will have a read through that manual!


This is so simple and beautiful.

It is mainly because of the lack of power-saving and EMC requirements, plus no clock-skew problems at such low frequencies.

Nowadays you need a team of 50 people just for the clock tree.


Also, I find the following very interesting. Because no CAD tools existed, old CPUs were drawn by hand, and the lithography masks were cut from a material called Rubylith.

https://en.m.wikipedia.org/wiki/Rubylith

Here are some historic photos with master prints of a CPU:

https://www.team6502.org/


The origin of the term "tape-out".

Circuit boards were also designed in this way.

https://www.eetimes.com/how-it-was-pcb-layout-from-rubylith-...


Yup, I did this as a kid in the mid-70s. PCBs were done at 2x, with red and blue tape for 2 layers. You drilled the holes yourself; no plated-through holes.


This is the one I personally have been waiting for. Thanks!

On the power distribution did you notice any bypassing? There are a number of ways to do it now. I have no idea how they did it, if at all, in the 70s.

Also, did you notice any trim or anything on the clock driver so they could match phase of clock and not clock?


There's no power bypassing on the chip, just the power lines.

There's no clock trimming either. The clock circuitry ensures that there is a gap between the two phases, but there's no adjustment. But at 10 MHz, the clock is not too sensitive.


That’s really interesting. It seems like a foreign world compared to what I’m used to. :)


Do you want to say a bit about bypassing and trimming on modern chips?


It's a lot of information for a forum comment. I will provide a pointer though.

I think the book at cmosvlsi.com is pretty good for an introduction to the realities of modern digital IC design.

For more information on the topic of this thread in particular see slides 17-19 in this deck. I was asking if the cap on the far right of slide 17 existed in this design. http://pages.hmc.edu/harris/cmosvlsi/4e/lect/lect21.pdf

On the trim: When I looked at the two phase timing figure provided I noticed if the bottom path through the clock driver was slower than the top path, due to manufacturing tolerance, then that could cause skew (see slide 24 in the deck I pointed to) between the phases which might cause phase 1 and phase 2 to get too close, or even overlap. The clock driver circuit looked pretty regular in the layout so I was guessing they might have redundant parallel drivers that could be enabled or disabled by the ROM bits which could change the relative strengths of the two paths, after manufacture. This would allow them to recover the part and improve yield. But I guess edge rates and periods were slow enough then that this was not a concern and just relative sizing was enough.
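
To illustrate the concern (with invented numbers, since I don't know the 8086's actual edge timings):

    # Toy check of non-overlap between two clock phases when one driver path is slower.
    # All times in ns; the nominal gap and skew values are illustrative, not 8086 data.
    nominal_gap = 10.0   # designed gap between phase 1 falling and phase 2 rising

    def phases_overlap(skew_ns):
        """skew_ns: extra delay of the slower driver path relative to the faster one."""
        return nominal_gap - skew_ns <= 0

    for skew in (2.0, 8.0, 12.0):
        print(skew, "->",
              "overlap!" if phases_overlap(skew) else f"gap {nominal_gap - skew:.0f} ns")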


I recently read the book Principles of Power Integrity for PDN Design, and it talks about the same problem from the perspective of a board designer. The authors mention that in the vast majority of chips, due to bond wire inductance alone, board-level bypass is useless at frequencies above 100 MHz. The only solution there is adding on-die capacitance; otherwise it's not possible for a modern chip to operate correctly. In practice, on-die capacitance is kept to a minimum due to its high cost and is only effective above roughly 100 MHz, leaving an anti-resonant peak around 100 MHz, so the problem can never be fully eliminated (unless a better package or more on-die capacitance is used, which usually isn't, because of cost). The authors also showed how you can break almost every FPGA power distribution network by switching logic circuits in code at exactly the anti-resonant frequency where bypassing is least effective, although it's usually not a problem in practice (until you get "lucky"...).

And contrary to popular belief, it's impossible to remove that peak, no matter how many bypass capacitors or how much buried capacitance exists on the circuit board. In a complex chip like an FPGA, the PDN is critical. The authors suggested that the best thing the board designer can do is a workaround: using bypass capacitors of multiple values to "tune" the PDN. This was done as a rule of thumb in the old days, and today it's often seen as bad practice because it creates multiple uncontrolled resonant and anti-resonant spikes when the capacitors combine with parasitic inductance (similar to the 100 MHz peak, but at board level), so over some ranges of frequencies the impedance is actually significantly increased, which conflicts with the goal of decreasing PDN impedance at all frequencies. But in the approach proposed by the authors, it's done in a controlled manner: the PDN impedance is intentionally increased by carefully creating a flat, slightly higher impedance region around 100 MHz (verified with simulations and measurements) to damp the unwanted oscillation when it's inadvertently excited.

Also, sometimes changing timings or operating frequencies is required so that the impedance peak of the chip's PDN is not excited, even when it means the end user must redesign firmware, microcode, or HDL code.
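
To make the ~100 MHz figure concrete, here's a minimal sketch of that kind of model with invented component values (not from the book or any particular chip): package/bond-wire inductance in parallel with the on-die capacitance, seen from the die.

    # Minimal PDN model: package/bond-wire inductance in parallel with on-die capacitance.
    # Component values are illustrative only.
    import numpy as np

    L_pkg = 0.5e-9     # effective bond-wire/package inductance, H (assumed)
    R_pkg = 0.02       # series resistance damping the loop, ohms (assumed)
    C_die = 5e-9       # on-die decoupling capacitance, F (assumed)

    f = np.logspace(6, 9, 2000)                  # 1 MHz .. 1 GHz
    w = 2 * np.pi * f
    z_branch_pkg = R_pkg + 1j * w * L_pkg        # path out to the (assumed ideal) board
    z_branch_die = 1 / (1j * w * C_die)          # on-die capacitance
    z_pdn = 1 / (1 / z_branch_pkg + 1 / z_branch_die)   # impedance seen by the die

    peak = np.argmax(np.abs(z_pdn))
    print(f"anti-resonant peak: {f[peak]/1e6:.0f} MHz, |Z| = {abs(z_pdn[peak]):.2f} ohms")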


I never saw bypass caps on chip, in the 80s when I was looking at chips.


It seems noteworthy that clock, Vdd, and Vss trees do not penetrate the microcode array.


Note 1 to be specific :-) "The microcode ROM forms a large region with no power connections, just ground. This is because each row in the ROM is implemented as a very large NOR gate with the power pull-up on the right-hand edge. Thus, the ROM gates all have power and ground, even though it looks like the ROM lacks power connections"

The outputs from the microcode go into clocked latches to the left of the array.

The high-level motivation is that you want the microcode ROM to be as dense as possible, so you want to minimize the number of different signals going in there. It is constructed with just the input lines, transistors (or gaps), ground, and the output lines, so it is about as dense as possible. Even so, it takes up a large chunk of the 8086 die.
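
As a toy model of the logic (my own simplification, not the actual 8086 circuit): a row line sits high through the pull-up unless a transistor at an active column pulls it low, which is exactly a NOR of the programmed columns.

    # Toy model of one NOR-plane ROM row: a pull-up holds the row high unless some
    # transistor at an active select line pulls it to ground. Not the exact 8086 layout.
    def rom_row(selects, transistor_present):
        """selects: 0/1 select-line values; transistor_present: which columns
        have a transistor on this row (the programmed bits)."""
        pulled_down = any(s and t for s, t in zip(selects, transistor_present))
        return 0 if pulled_down else 1   # NOR of the selected, programmed columns

    # Row programmed with transistors on columns 0 and 2:
    print(rom_row([1, 0, 0], [1, 0, 1]))   # 0 (column 0 active and programmed)
    print(rom_row([0, 1, 0], [1, 0, 1]))   # 1 (active column has no transistor)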


Is it conceivable that the clock signal could supply the power in a CPU?


An 8086 processor requires a maximum of approximately 1.8 watts, which means that if you want to power the CPU from the clock, the clock signal generator needs to be a radio transmitter with an output power of 32.5 dBm, not including insertion loss and power conversion loss at the load. This is beyond the maximum transmitting power of Wi-Fi allowed by the FCC. And it's a square wave clock: a 1 MHz clock can contain frequency components up to 100 MHz (or 1 GHz in modern digital logic), blasting out RF garbage everywhere. Just imagine the electromagnetic interference.

A typical crystal oscillator has a drive level of less than one milliwatt, three orders of magnitude lower.

But in principle, yes, I don't see why it's impossible.
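
The dBm figure is just the standard conversion, worked out for the 1.8 W number:

    # Converting the 8086's ~1.8 W maximum power draw to dBm.
    import math
    p_watts = 1.8
    p_dbm = 10 * math.log10(p_watts * 1000)   # dBm is referenced to 1 mW
    print(f"{p_dbm:.2f} dBm")                 # ~32.55 dBm, the figure quoted above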


I'm not sure why you would want to power a CPU through the clock. However, microcontrollers can get inadvertently powered through the I/O pin protection diodes if you don't hook up power and ground: https://www.eevblog.com/2015/12/18/eevblog-831-power-a-micro...


I once spent two days debugging a microcontroller circuit for my retrocomputing project because of this problem.

The microcontroller was an Atmel ATmega328P I extracted from an Arduino board; I used the chip as an external debugger and monitor. The first function I implemented was EEPROM reprogramming via a programming socket. The code worked without any sign of trouble for two weeks: I could burn a program to the ROM, plug the ROM into the 8-bit computer, and execute instructions. Later I attempted to plug the microcontroller into the system bus of my 8-bit computer to add in-system reprogramming and memory debugging capabilities, so I wouldn't have to plug and unplug the ROM every time I needed to reprogram it. I planned to implement it by taking over the system bus via DMA, so the running CPU would hand control over to me. However, no matter how I changed the code, there were always strange bugs. RAM reading and writing never worked reliably, there was random memory corruption, and the CPU was never able to continue executing the program correctly after the DMA had finished. It seemed there was always some form of bus contention bug in the microcontroller, as if the GPIO pins were not properly tristated/isolated before the beginning of a DMA cycle. But I was never able to find it in the code.

Eventually I realized that pin 7, Vcc, was not connected! I had miswired the power to an I/O pin. From the beginning, the ATmega328P had been operating without its main digital power supply, sourcing all its power via the ESD diode on the I/O pin and/or possibly the analog power supply AVCC. I was surprised it was able to work for two weeks. On second thought, during EEPROM programming the connection to the chip was direct and tristating was mostly not used, so the MCU had no problem driving it. But in memory debugging, the bus was long and the entire output was tristating on and off during each DMA cycle; the lack of a proper Vcc supply probably made the I/O drivers malfunction, especially the input/output selection, creating unpredictable output states.

Later, in another unrelated project, I encountered another problem due to an incorrect PCB footprint pinout. The 4-pin SMD crystal was connected to the wrong pins: only one side was connected to the chip, so there was basically no system crystal at all. But the parasitic capacitance between the crystal pins was sufficient to start a weak oscillation (on the oscilloscope you could see a clock waveform with a very low amplitude, nowhere near a proper logic level, as if it were an analog RF circuit), and the chip was even able to start its 125 MHz PLL! But the logic was not fully functional until I dead-bug soldered the crystal the correct way.

Lesson learned: Always double check. Just because the chip has power, doesn't mean the chip is receiving power correctly. Just because the chip has a clock output, doesn't mean the chip is receiving the clock correctly. And finally, if there's an external power-on reset, just because the chip was initialized after you apply power, doesn't mean the power-on reset circuit is functional.


Reminds me of the history of the ARM1 chip, where the chip was accidentally running off leakage alone [1]

[1] https://en.wikichip.org/wiki/acorn/microarchitectures/arm1


What a surprise read.



