The Z-80 has a 4-bit ALU. Here's how it works. (righto.com)
126 points by kens on Sept 6, 2013 | 39 comments



CPUs of this era were normally multi-cycle for every instruction, but I never expected that in the Z-80 at least one of those cycles exists because the ALU is only 4 bits wide. Love the detailed analysis - and this is just the tip of the iceberg of that site.

One thing I'm missing from this article is an approximate gate count. Obviously going 4-bit was motivated by gate and area savings, but halving the ALU size isn't going to halve the gate count or area, because it still needs the same-width bus and extra latches for the partial result. Or was it the critical path? What kind of saving did it give over an 8-bit ALU?


I don't have a gate count (yet). You're right that the 4-bit ALU doesn't save a lot of space overall. The Z-80 designer talks a bit about the 4-bit ALU [1] but doesn't really explain the motivation. My guess is that he was able to use two cycles for the ALU without increasing the overall cycle count, because memory cycles were the bottleneck. If you can cut the ALU in half "for free", why not? Hopefully this will become clearer as I continue analyzing the chip.

[1] See page 10 in http://archive.computerhistory.org/resources/access/text/Ora...

Note: if you're interested in Z-80 architecture, you seriously should read that link.
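
To make the two-pass idea concrete, here's a rough sketch in Python (purely illustrative - the function and structure are mine, not the actual Z-80 datapath) of an 8-bit add done as two 4-bit passes, with the half-carry falling out of the first pass:

    # Illustration only: an 8-bit add performed as two 4-bit ALU passes.
    def add8_via_4bit_alu(a, b):
        low = (a & 0x0F) + (b & 0x0F)               # pass 1: low nibbles
        half_carry = low >> 4                       # this is the H flag
        high = (a >> 4) + (b >> 4) + half_carry     # pass 2: high nibbles + H
        carry = high >> 4                           # the usual C flag
        return ((high & 0x0F) << 4) | (low & 0x0F), carry, half_carry

    result, c, h = add8_via_4bit_alu(0x3A, 0x29)
    print(hex(result), c, h)                        # 0x63 0 1 - half-carry set, no carry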


One of the reasons I was told was that the circuit extended to 16 bits easily (and was later used in the Z8000, as I recall) and that it made decimal (BCD) math easier. DAA (decimal adjust accumulator) was driven by the half-carry flag. In '85 Intel wrote a Z80 emulator in 8086 machine code to try to win a Japanese game console design, and the decimal arithmetic stuff[1] was a PITA (and, as it turned out, not used a lot in games :-)

[1] The 8080 also had these decimal arithmetic hacks but it didn't have an alternate set of registers to pull from.
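
For anyone who hasn't met DAA: it patches up a binary addition so the result reads as packed BCD, and the half-carry flag is what tells it the low digit overflowed. A rough Python model of the addition case (simplified - the real Z80 also looks at the N flag to handle subtraction):

    # Simplified model of DAA after an addition (ignores the N/subtract case).
    def daa_after_add(acc, carry, half_carry):
        fix = 0
        if half_carry or (acc & 0x0F) > 9:
            fix |= 0x06                  # correct the low BCD digit
        if carry or acc > 0x99:
            fix |= 0x60                  # correct the high BCD digit
            carry = 1
        return (acc + fix) & 0xFF, carry

    # BCD 19 + 28 should be 47; the binary add gives 0x41 with H set.
    acc, c = daa_after_add(0x41, 0, 1)
    print(hex(acc), c)                   # 0x47 0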


Thanks for the interesting information. I'm skeptical that the Z-80 designers were planning ahead for 16 bits, though. Simpler BCD math is a possibility - I'll look into this as I examine the Z-80 more. The 6502 wins, though, for crazy but efficient decimal arithmetic - it has a complex patented circuit that detects decimal carry in parallel with the addition/subtraction, and another circuit to add the correction factor to the result without going through the ALU again. So you don't need a separate DAA instruction or additional cycles for decimal correction.

General question: what things about the Z-80 would you guys like me to write about? Any particular features of the chip? Register-level architecture, gates, or the silicon? Analyzing instructions cycle by cycle? Gate counts by category? Comparison with other microprocessors?


Would love any and all analysis, but most interesting to me would be instruction details and especially the undocumented side effects. I'd also like to see comparison with the 8080 and how Zilog improved/changed the design.


...what things about the Z-80 would you guys like me to write about?

Undocumented instructions! The MOS 6502 had plenty of these and I understand the Z-80 did too.


Whether to provide BCD optimisation always seemed to be a tricky engineering decision; virtually nobody used the 6502 BCD instructions in the amateur home microcomputer environment I was familiar with in the 80s, but it was clearly considered to be important to the CPU manufacturers. Were there BCD benchmarks back then? Was it considered a killer feature to make financial software easier to write? Did Rockwell ever capitalise on that patent?


The Atari's ROMs contained a full (well, for the time) floating-point library that used BCD floating-point values.

The result was that the Ataris, without even trying, had more accurate decimal math than other contemporary computers. Something to do on the demo machines in stores back in the day was to run this loop:

   10 let x = 100
   20 print x
   30 let x = x - 0.01
   40 goto 20
On an Atari this would count down from 100 accurately, with no round-off errors. The exact same loop on an IBM PC started printing things like 99.94999999998 instead of 99.95 after about 5 steps.
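
The binary-float half of this is easy to reproduce today. A quick Python sketch, with doubles standing in for the PC's binary floats and the decimal module standing in for the Atari's BCD floats - not period-accurate precision, just the same effect:

    from decimal import Decimal

    x_bin, x_dec = 100.0, Decimal("100")
    for _ in range(500):                 # 500 steps of the countdown
        x_bin -= 0.01
        x_dec -= Decimal("0.01")

    print(x_bin)    # typically something like 95.00000000000..., with stray trailing digits
    print(x_dec)    # exactly 95.00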

Edit: formatting


I got some interesting results. MSX and Atari computed the results correctly. On the TRS-80 Model I, wrong results started on the 12th iteration; the Apple IIe (AppleSoft), VIC-20 and PET started giving wrong results around the 8th. This has to do with the internal representation of floating-point numbers, of course - the Apple II uses, IIRC, 5 bytes to represent a float, while MSX uses, again IIRC (it's been a long time), 8.


I have no idea what people used BCD for either. I vaguely recall reading that the C64's interrupt routine didn't even bother to clear the D flag, so you had to disable interrupts while using decimal mode! - so obviously most people just weren't expected to be using it.

I only ever saw it used for game scores... and the following, which prints a byte as hex, and is a neat example of cute 6502 code. Saves a few bytes over having a table of hex digits, and you don't need to save X or Y.

    HEX:  PHA                       ; save the byte
          LSR:LSR:LSR:LSR           ; shift the high nibble down into bits 0-3
          JSR HEX2                  ; print it...
          PLA                       ; ...then get the byte back
          AND #15                   ; keep just the low nibble and fall into HEX2
    HEX2: CLC
          SED:ADC #$90:ADC #$40:CLD ; decimal-mode adjust: 0-9 -> '0'-'9', $A-$F -> 'A'-'F'
          JMP PUTCH
(PUTCH takes an ASCII character in A.)
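
If the SED/ADC pair looks like magic, here's a rough Python model of what those two decimal-mode adds do to a nibble (my own reconstruction for illustration, not cycle-accurate 6502 behaviour):

    # Model of decimal-mode ADC, enough to show the SED:ADC #$90:ADC #$40 trick.
    def decimal_adc(a, b, carry):
        lo = (a & 0x0F) + (b & 0x0F) + carry
        hi = (a >> 4) + (b >> 4)
        if lo > 9:
            lo -= 10
            hi += 1
        if hi > 9:
            hi -= 10
            carry_out = 1
        else:
            carry_out = 0
        return (hi << 4) | lo, carry_out

    def nibble_to_hex_char(n):
        a, c = decimal_adc(n, 0x90, 0)       # CLC ... ADC #$90
        a, c = decimal_adc(a, 0x40, c)       # ADC #$40
        return chr(a)

    print("".join(nibble_to_hex_char(n) for n in range(16)))   # 0123456789ABCDEF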

The 68000 had BCD as well. Never used it and don't recall ever seeing it used. I think they only included it so they could have an instruction called ABCD.


I would imagine BCD was useful as a bootstrap for a poor ASM programmer's bignum library (especially when 'bignum' was >16 bits).

Also would be useful for 7-segment LED displays.


SNES games used it a lot for storage of things that need to be displayed on screen, such as score and lives and whatnot. If the counter is checked relatively infrequently, the reduced integer range and hassle of switching to and from BCD mode are a lot better than having to divide by ten repeatedly each frame, which is relatively slow.
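
To make the trade-off concrete, here's a rough sketch (Python standing in for 65816 assembly, names mine) of the two approaches: bumping a packed-BCD score, whose nibbles are already the digits you draw, versus converting a binary score with repeated division every time it's displayed:

    # Packed BCD: add 1 with per-nibble decimal carry; each nibble is a display digit.
    def bcd_add_one(score, digits=6):
        carry = 1
        for i in range(digits):
            d = ((score >> (4 * i)) & 0xF) + carry
            carry, d = (1, d - 10) if d > 9 else (0, d)
            score = (score & ~(0xF << (4 * i))) | (d << (4 * i))
            if not carry:
                break
        return score

    # The binary alternative: divide by ten over and over, every frame you draw it.
    def binary_to_digits(n, digits=6):
        out = []
        for _ in range(digits):
            n, d = divmod(n, 10)
            out.append(d)
        return out[::-1]

    print(hex(bcd_add_one(0x000199)))        # 0x200 - reads directly as "000200"
    print(binary_to_digits(200))             # [0, 0, 0, 2, 0, 0]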


It's interesting that the parent comment came up in the context of the chip used in TI calculators. I know the TI-83 series floating point format is BCD [1], but I'm not sure off the top of my head whether the built-in floating-point library actually uses these CPU instructions.

[1] (PDF link) http://education.ti.com/guidebooks/sdk/83p/sdk83pguide.pdf see pages 22-23


In x86-world, floating point hardware was an add-on chip before the 486DX was introduced in 1989 [1] [2].

I think the BCD instructions were never intended to be used outside of software arithmetic libraries, but they provide speedups for crucial operations in such libraries. Sort of like Intel's recently introduced AES instructions, which will probably only be used in encryption libraries.

Of course, it turns out that BCD-based arithmetic isn't much used, because IEEE-style floating-point has a fundamental advantage (you can store more precision in a given amount of space) and is also compatible with hardware FPUs.

[1] http://en.wikipedia.org/wiki/Floating-point_unit#Add-on_FPUs

[2] http://en.wikipedia.org/wiki/I486


I'd guess this goes back to the 4004 which was designed for a desktop calculator. Easy BCD really helps those applications so they must have had that in mind as a target market. There's not much point in using BCD once reasonable amounts of RAM and ROM are available.


Except the Z80 / 80xx don't descend from the 4004, they descend from the Datapoint 2200. The 8008 didn't have BCD instructions or a half-carry flag, but it had a parity flag.


Not architecturally, but Federico Faggin and Masatoshi Shima were the key people on the 4004 and 8080 before leaving to form Zilog and build the Z-80. The Z-80 had to have DAA (decimal adjust) to be compatible with the 8080. Possibly the 8080 had DAA to compare well against the 6800. If that's the case, then we must ask where the 6800 got the idea. Could be from minicomputers or even mainframes, but from what I've read the early microcomputer designers had no pretense of making processors to compete anywhere near the high end. Instead their sights were set more along the line of embedded systems. Desktop calculators fit into that and Shima himself designed desktop calculators and helped specify the 4004 before he came to Intel. Thus my speculation that the impetus could have come from that direction.


I think it started life as a 4-bit design to compete with the 4004 and went out the door as an 8-bit part after the 8080, so they just muxed what was already there.

I don't know that, I just think that. :-)


Indeed ... in all the years I spent programming the Z80, I never for a minute suspected it had this ALU architecture. It's not even mentioned in Rodnay Zaks's great work.


I remember the Z80 felt distinctively more sluggish than the 6502 (I had an Apple II with a Z-80 Softcard in it so it could run CP/M). Now I know why.


No you don't; this was a clever optimization, not a performance-degrading hack. It was possible to save half the ALU transistors "for free", so the designer did. The "for free" bit is important. The Z80 ran a superset of the 8080 instruction set at equal or greater speed, but the 8080 had an 8-bit ALU.


The minimum time for an instruction to execute on the 6502 is 2 clock cycles; the maximum is 7, IIRC. On the Z80 the minimum is 4, with the maximum around 23.

This has, of course, little to do with the width of the ALU.


The Pentium 4 also had a 16-bit ALU, which computed a 16-bit operation on each of the rising and falling edges of the clock to maintain 1-cycle latency. www.cs.virginia.edu/~mc2zk/cs451/mco_P4.ppt‎.


The requested URL /~mc2zk/cs451/mco_P4.ppt‎ was not found on this server.

I don't know who is mangling the URL (Chrome, Apache, MITM?) nor why it's happening.


There's an invisible Unicode U+200E left-to-right mark at the end of the URL, probably picked up when the parent cut and pasted the URL into HN. In UTF-8 this is E2 80 8E, which gets misinterpreted by the server as the Windows-1252 character set: E2 = â, 80 = €, 8E = Ž. (It could be ISO-8859-1, except that doesn't include €.) Interestingly, Chrome's DOM inspector shows this character as the HTML entity &#x200e; while view-source has it as the actual invisible character.

I think the poster of the URL originally mangled it, but it would be nice if the HN software filtered out invisible characters from URLs. There's not much the destination server can do about it.

(Yes, I've dealt with too many character set issues in the past.)
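
If you ever need to clean such a URL up yourself, stripping Unicode "format" characters - category Cf, which is where U+200E lives - is a one-liner; a quick Python sketch:

    import unicodedata

    def strip_invisible(url):
        # Drop Unicode format characters (category Cf), e.g. U+200E LEFT-TO-RIGHT MARK.
        return "".join(ch for ch in url if unicodedata.category(ch) != "Cf")

    mangled = "www.cs.virginia.edu/~mc2zk/cs451/mco_P4.ppt\u200e"
    print(strip_invisible(mangled))      # www.cs.virginia.edu/~mc2zk/cs451/mco_P4.ppt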


http://www.cs.virginia.edu/~mc2zk/cs451/mco_P4.ppt

Does that work?

I copied and pasted it from a Google search result (because otherwise the file downloads without showing me the URL). Google of course has decided "copy link to" shouldn't work.


I’ve noticed this issue many times. Is Google deliberately inserting Unicode crap at the end of URLs in order to prevent copy-and-paste from their search results? Or perhaps it’s to prevent scraping, and the loss of copy-and-paste functionality is just collateral damage.


Google of course has decided "copy link to" shouldn't work.

There's an addon for that.


It shouldn't need an addon, though. I'm often on Google's side in debates, but this is just ridiculous.


We should make our own browser where we can make those decisions!


Addon to what?


An addon to Firefox (mobile, too) and Chrome that prevents Google from changing the link address to a tracking address, allowing you to right click a search result, then choose Copy link address, and get the correct result.

I agree with mpyne, though, that it shouldn't be necessary.


There's a U+200E character between "www.cs.virginia.edu/~mc2zk/cs451/mco_P4.ppt" and "."


This is awesome. I'm going through the Elements of Computing Systems book/course (a.k.a. From NAND to Tetris) http://www.nand2tetris.org/ and it's been great in helping me understand how CPUs are constructed.

The course actually has you make an ALU from logic gates, so you understand at a deep level just how it's done.
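
In the same spirit, here's a tiny Python toy (not the course's HDL) showing the first rung of that ladder - a full adder built from nothing but NAND:

    # Everything below is built from NAND alone, NAND-to-Tetris style.
    def nand(a, b):
        return 0 if (a and b) else 1

    def xor(a, b):
        n = nand(a, b)
        return nand(nand(a, n), nand(b, n))

    def full_adder(a, b, cin):
        s = xor(a, b)
        total = xor(s, cin)
        carry = nand(nand(a, b), nand(s, cin))   # (a AND b) OR (s AND cin)
        return total, carry

    print(full_adder(1, 1, 1))           # (1, 1): 1 + 1 + 1 = binary 11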


That looks like a great course. I find it interesting how real processors mostly use the same principles you learn in school, but then they throw in clever tricks and optimizations that you never learn about. And every processor I've looked at (6502, 8085, Z-80) has its own style.


This is also very useful for anyone who's ever worked with a Gameboy emulator. It uses a processor very similar to the Z-80, including this particular ALU setup.


I was under the impression that the GameBoy used THE Z-80. Is that not the case?


No, the Game Boy CPU was a custom design. http://realboyemulator.wordpress.com/2013/01/02/the-nintendo...


In high school I learned Z80 assembly to hack games on my TI-86 calculator. It's a great chip to learn on.

I remember there was a cross-compiler and a software utility + serial cable ... after some googling:

http://www.ticalc.org/programming/columns/86-asm/el-helw/les...



