68000 Tricks and Traps – Some assembly language programming guidelines (1986)

AnimalMuppet · on May 26, 2017

Note: 1986

Ah, what wonderful chips the 68000 series were, at least for the era. So many registers, such a nice clean architecture. Good times.

Steve44 · on May 27, 2017

I really loved programming it. The Z80 was nice and concise but quite restrictive, 68k really opened up with addressing modes and registers. I've had a quick scan through that page and recognise most of them from the dim distant past.

One thing I didn't see mentioned is that it had a small instruction pipeline, so we could squeeze better performance out of it by careful instruction sequencing. For example, you wouldn't load a register and then use it in the next instruction, because it wouldn't be ready and could stall. By effectively interleaving code where possible, you could keep it running flat out. Every cycle matters!

vidarh · on May 27, 2017

The 68010 had a special mode to accelerate tight loops, but it was the 68020 that first added a proper pipeline. The 68000 had neither.

Symmetry · on May 26, 2017

The idea of having separate data and address registers is a really interesting one. I've heard that the only pain point was not being able to take the difference between two addresses quickly.

pklausler · on May 26, 2017

FWIW, the CDC 6600 and Cray instruction set architectures also had this address vs. data distinction in their (scalar) register files. Both could do basic integer operations, but only the address registers could be used for address calculations in load/store instructions, while only the data registers could do floating-point.

(Strictly speaking, the CDC 6600 didn't have load/store instructions per se. Instead, any time you modified address registers A1-A5, a load to the corresponding data register X1-X5 took place from the resulting address. Modifications to A6-A7 caused a store from X6-X7 to the resulting address. It made for such tight code!)
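
To make that coupling concrete, here is a toy Python model of just the side effect. It is a sketch, not a simulator: the class and function names are hypothetical, memory is a plain dict of words, and everything except the A-to-X load/store coupling is left out.

```python
class Cdc6600Regs:
    """Toy model: only the A-register side effects, nothing else."""

    def __init__(self, memory):
        self.memory = memory   # dict mapping word address -> word value
        self.a = [0] * 8       # A0-A7
        self.x = [0] * 8       # X0-X7

    def set_a(self, i, addr):
        """Write Ai and perform the memory access it implies."""
        self.a[i] = addr
        if 1 <= i <= 5:
            self.x[i] = self.memory.get(addr, 0)  # implied load into Xi
        elif i >= 6:
            self.memory[addr] = self.x[i]         # implied store from Xi
        # A0 has no coupled X register in this model


def copy_word(regs, src, dst):
    """Copy one memory word using nothing but register writes."""
    regs.set_a(1, src)       # X1 <- mem[src]
    regs.x[6] = regs.x[1]    # X6 <- X1
    regs.set_a(6, dst)       # mem[dst] <- X6


mem = {100: 42}
regs = Cdc6600Regs(mem)
copy_word(regs, 100, 200)
print(mem[200])  # 42
```

A memory-to-memory copy needs no explicit load or store instruction at all, which is where the tight code comes from.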

ajross · on May 26, 2017

Not the only pain point. It made register assignment harder for the compiler.

Fundamentally you load from memory in generated code by adding two numbers together (e.g. the struct address and field offset, array address and index...). Motorola figured that you could just pick one number from each of two sets and thus save two bits (i.e. the source registers could be encoded with 3 bits instead of 4) in the encoding for the instruction.
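
A rough Python sketch of the two 68000 addressing modes that cover those cases: d16(An) for struct-plus-field and d8(An,Xn) for array-plus-index. The helper names are made up for illustration. The arithmetic (sign-extended displacements, a word-sized index sign-extended to 32 bits) follows the 68000's documented behaviour, although the real chip only puts the low 24 bits on its address bus. Note that in the indexed mode the index Xn may be either a data or an address register; only the base has to come from the A set.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Treat the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def ea_disp16(an, d16):
    """d16(An): base address register plus sign-extended 16-bit displacement."""
    return (an + sign_extend(d16, 16)) & MASK32


def ea_indexed(an, xn, d8, index_is_long):
    """d8(An,Xn.W) or d8(An,Xn.L): base + index + sign-extended 8-bit displacement."""
    index = xn & MASK32 if index_is_long else sign_extend(xn, 16)
    return (an + index + sign_extend(d8, 8)) & MASK32


# Struct at $1000, field at offset 8:  move.l 8(a0),d0
print(hex(ea_disp16(0x1000, 8)))              # 0x1008
# Array at $2000, byte offset 12 in d0:  move.l 0(a0,d0.l),d1
print(hex(ea_indexed(0x2000, 12, 0, True)))   # 0x200c
```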

As far as CISC tricks go, it wasn't too bad. But it's aged poorly: no one would design an ISA like that today, while Intel's elaborate addressing modes introduced with the 386 are producing code size and cache efficiency benefits to this very day.

Symmetry · on May 27, 2017

Another consideration that's just as important is in terms of physical design. By dividing one large register file into two smaller ones with fewer ports you should be able to substantially decrease the overall size and power usage of the register files. Probably the bypass network too.

I've never worked on compiler design, does it really make it that much more difficult? I'd imagine that the compiler would have a clear idea of whether a value is a memory address or not and could simply put addresses in the address registers and data in the data registers but it's likely I'm missing something.

tjalfi · on May 27, 2017

[0] has two old (2008) Usenet posts by an AMD CPU designer about these tradeoffs.

[0] http://yarchive.net/comp/register_file_size.html

vidarh · on May 27, 2017

Most of the alternatives at the time had fewer and more specialised registers; compared to contemporary alternatives, the architecture was child's play to write a compiler target for. My first compiler was for M68k, and it forever made me hate dealing with x86.

I'm not convinced it aged poorly. It failed because Motorola didn't have the resources to compete with Intel and keep up, and of course that could be down to the architecture making it harder.

But developments like [1] imply that this was more a problem with Motorola/Freescale's ability to produce a sufficiently advanced design at the time: they're starting to beat ColdFire and PPC systems that are clocked far faster with the M68k instruction set on various benchmarks (though they are also adding instructions). With the caveat that this of course also tests things like memory bus speeds etc. What's clear, in any case, is that we never got to see what kind of performance it is possible to squeeze out of the M68k architecture.

[1] http://www.apollo-core.com/

puzzle · on May 26, 2017

Didn't the 68020 add even more addressing modes, plus scaling? Comparing the 68000 to the 386 doesn't seem too fair.

mikepavone · on May 26, 2017

Addressing modes on the 68020 are kind of crazy. In addition to some relatively straightforward improvements (scaling for the indexed mode, options for larger displacements on both the displacement mode and indexed mode) they also added something called "memory indirect" modes. These allowed you to dereference a pointer in memory in a single operation. In these modes you have a base register, a base displacement, an index register (with scale) and an outer displacement. The index register could be applied either to the base value or to the fetched "outer" value.
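
Motorola's documentation calls these memory indirect preindexed and postindexed modes. A minimal Python sketch of the address calculation, assuming memory is a dict of 32-bit longwords and using hypothetical function names; suppressed base or index registers and the various displacement sizes are ignored.

```python
MASK32 = 0xFFFFFFFF


def _check_scale(scale):
    if scale not in (1, 2, 4, 8):
        raise ValueError("68020 index scale must be 1, 2, 4 or 8")


def preindexed(mem, an, xn, scale, bd, od):
    """([bd,An,Xn*scale],od): index is added BEFORE the pointer is fetched."""
    _check_scale(scale)
    pointer_addr = (an + bd + xn * scale) & MASK32
    return (mem[pointer_addr] + od) & MASK32


def postindexed(mem, an, xn, scale, bd, od):
    """([bd,An],Xn*scale,od): index is added to the fetched pointer."""
    _check_scale(scale)
    pointer_addr = (an + bd) & MASK32
    return (mem[pointer_addr] + xn * scale + od) & MASK32


# A table of two pointers at $1000; index = 1, scale 4, outer displacement 8.
mem = {0x1000: 0x8000, 0x1004: 0x9000}
print(hex(preindexed(mem, 0x1000, 1, 4, 0, 8)))   # 0x9008: 2nd pointer + 8
print(hex(postindexed(mem, 0x1000, 1, 4, 0, 8)))  # 0x800c: 1st pointer + 4 + 8
```

Preindexed suits "pick the Nth pointer from a table, then offset into what it points to"; postindexed suits "follow one pointer, then index into the array it points to".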

puzzle · on May 27, 2017

Yeah, that's exactly what I was thinking of and why I had to reread the original post twice. The 68020+ was perhaps crazier than the VAX. The last two variants you mentioned were called preindexed and postindexed mode. I wrote a few hundred thousand lines of 68000 code, but much less for 68020+. Those were the days!

ajross · on May 27, 2017

For clarity: I mentioned the 386 addressing modes because the ModRM encoding was fundamentally designed to address the same code generation problem: efficiently encoding a single instruction that computes base-plus-offset (-plus-immediate too) when addressing memory. This avoids having to compute an address first for what is one of the most common operations in application code. As it happened, Intel's trick was the better idea. Motorola's original register design was fine but not as good, and the '020 madness didn't survive contact with the RISC pipeline.
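
For comparison, a simplified Python sketch of how a 386 SIB byte expands to base + index*scale + displacement. It is an illustration under stated assumptions: register values live in a dict, the helper names are hypothetical, and the ModRM special cases (e.g. mod=00 with base=EBP meaning "no base, disp32") are deliberately left out.

```python
SIB_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib):
    """Split a SIB byte into (scale, index register or None, base register)."""
    scale = 1 << (sib >> 6)          # bits 7-6
    index = (sib >> 3) & 0b111       # bits 5-3; 100 (esp) means "no index"
    base = sib & 0b111               # bits 2-0
    return scale, (None if index == 4 else SIB_REGS[index]), SIB_REGS[base]


def ea_386(regs, sib, disp):
    """base + index*scale + disp, ignoring the mod=00/base=101 special case."""
    scale, index, base = decode_sib(sib)
    addr = regs[base] + disp
    if index is not None:
        addr += regs[index] * scale
    return addr & 0xFFFFFFFF


# [ebx + esi*4 + 8]: SIB = 10 110 011 = 0xB3
print(decode_sib(0xB3))                                   # (4, 'esi', 'ebx')
print(hex(ea_386({"ebx": 0x1000, "esi": 3}, 0xB3, 8)))    # 0x1014
```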

mikepavone · on May 27, 2017

With the exception of index scaling (which, as already mentioned, was added in the 68020), ModR/M and SIB are a strict subset of the 68000 addressing modes. I don't see what this has to do with the address register/data register split though. The 386 only had 8 GPRs and only 7 of those could be used as a base or index register. The reason for the address/data split is to allow 16 registers without needing 4-bit register fields.

Apart from the overly complex memory indirect modes, I'm having a hard time seeing how the 386 ModR/M and SIB setup is superior to what was in the 68020. Twice as many registers (though with usage restrictions), PC-relative addressing and a cleaner encoding. Those first two things have been fixed in x86-64 (and with generally fewer usage restrictions), but at the cost of making the encoding even worse.

jacquesm · on May 26, 2017

> while Intel's elaborate addressing modes introduced with the 386 are producing code size and cache efficiency benefits to this very day.

At the expense of orthogonality, which I think is a great feature for a CPU to have and which the 68K (and 6809 and 6800) had in spades.

yongjik · on May 26, 2017

Hmm, weren't the 386 addressing modes pretty much orthogonal? (At least, compared to what the 8086 and 80286 had.)

jacquesm · on May 26, 2017

Yes, but the instruction set wasn't. Orthogonality in an instruction set basically means that once you know which basic instructions a processor supports and which addressing modes it supports, you can create all possible combinations and they will just work as expected, without gaps or strange special cases, and when you look at the opcodes you'll be able to make sense of them.

vidarh · on May 27, 2017

SUBA.L will let you subtract an address register from another (you can use any effective address as the source operand).

JKCalhoun · on May 26, 2017

I'm wondering why you would want to do that anyway?

mikepavone · on May 26, 2017

To make the encoding more compact while still having a large number of registers. A normal arithmetic instruction will have two 3-bit register fields, one 3-bit effective address type field, and 3 bits that determine the size and direction (i.e. whether the effective address is the source or destination operand). Since an instruction word is 16 bits, this leaves 4 bits to select the operation type. This is a bit of a simplification, as some instructions have more restrictions and move instructions are more expressive, but you get the idea.
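
As an illustration of that layout, ADD packs into one 16-bit word as `1101 rrr ooo mmm xxx`: the 4-bit operation, the data register, the opmode (size and direction), the EA mode and the EA register. A small Python sketch with a hypothetical helper name:

```python
def encode_add(dn, opmode, ea_mode, ea_reg):
    """Pack a 68000 ADD instruction word: 1101 rrr ooo mmm xxx."""
    for field in (dn, opmode, ea_mode, ea_reg):
        if not 0 <= field <= 7:
            raise ValueError("every field is 3 bits wide")
    return (0b1101 << 12) | (dn << 9) | (opmode << 6) | (ea_mode << 3) | ea_reg


# opmode 010 = long, <ea> + Dn -> Dn; EA mode 000 = data register direct
print(hex(encode_add(0, 0b010, 0b000, 1)))   # ADD.L D1,D0   -> 0xd081
# opmode 110 = long, Dn + <ea> -> <ea>; EA mode 010 = (An)
print(hex(encode_add(0, 0b110, 0b010, 1)))   # ADD.L D0,(A1) -> 0xd191
```

Four 3-bit fields plus the 4-bit operation fill the word exactly, which is why there is no room left for wider register numbers.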

Having a flat 16-register file would require two extra register bits in most instructions, which would either halve the number of possible operations or require bits to be removed elsewhere. Since the set of operations needed on pointers is typically much smaller than the set needed on non-pointer integers, this setup makes a certain amount of sense. The big downside is that, as ajross points out, it complicates register allocation in a way that just about no other "mainstream" architecture does.

Other tradeoffs (larger instruction word, fewer addressing modes on "normal" instructions, requiring an extension word for more addressing modes, etc.) could have produced a cleaner architecture at the expense of code density. Code density was a big deal in the late 70's when the 68000 was designed due to the cost of memory.

EDIT: A flat 16-register file would actually quarter, not halve, the number of operation types, since you lose one bit per register field.
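
The arithmetic behind the correction, as a quick Python check, assuming the 3-bit EA mode field and the 3-bit size/direction field keep their sizes:

```python
WORD_BITS = 16
EA_MODE_BITS = 3         # effective address mode field
SIZE_DIR_BITS = 3        # size + direction (opmode) field


def opcode_bits(reg_field_bits):
    """Bits left for the operation after two register fields and the fixed fields."""
    return WORD_BITS - 2 * reg_field_bits - EA_MODE_BITS - SIZE_DIR_BITS


split_file = 2 ** opcode_bits(3)   # 8 D + 8 A registers: 3-bit fields
flat_file = 2 ** opcode_bits(4)    # 16 general registers: 4-bit fields
print(split_file, flat_file, split_file // flat_file)   # 16 4 4
```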

adrianmonk · on May 27, 2017

Maybe they're asking why you'd want to subtract addresses. Unclear though.

JKCalhoun · on May 27, 2017

Sorry, yes, like the other poster said, I was wondering why you would want to subtract addresses.

mikepavone · on May 29, 2017

Say you have a pointer to the start of an array and a pointer to a certain element and you want to turn that into an index. Not super common, but not a crazy special case either. I'm not sure why the other poster thought this was a limitation on the 68000 though as subtraction is one of the few arithmetic operations you can perform on address registers. Perhaps he/she confused the limitations of the AU (address unit) with limitations on address registers? The AU is used for calculating effective addresses and all it can do is add; however, normal add and sub instructions use the ALU even when working exclusively with address registers.

There are a bunch of other operations that you can't do on address registers (or at the very least, an address register can only be used for the effective address operand). This can be inconvenient if your code needs a bunch of registers, but not very many pointers. Not a huge deal otherwise.
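
A toy Python sketch of the allocation problem described in this thread: a hypothetical greedy allocator that puts pointers in A0-A6 (A7 being the stack pointer) and integers in D0-D7, optionally parking surplus integers in free address registers. Real compilers are far more involved; this only shows the extra decision the split forces on them.

```python
def allocate(values, borrow_address_regs=False):
    """Greedy, class-aware register assignment for (name, is_pointer) pairs.

    Pointers go to A0-A6 (A7 is the stack pointer), integers to D0-D7.
    In this model pointers never fall back to data registers.
    Returns (assignment dict, list of spilled names).
    """
    free_d = [f"D{i}" for i in range(8)]
    free_a = [f"A{i}" for i in range(7)]
    assignment, spilled = {}, []
    for name, is_pointer in values:
        pool = free_a if is_pointer else free_d
        if pool:
            assignment[name] = pool.pop(0)
        elif not is_pointer and borrow_address_regs and free_a:
            # An integer parked in An is fine for MOVEA/ADDA/SUBA/CMPA,
            # but e.g. AND/OR, shifts and multiplies need a data register.
            assignment[name] = free_a.pop(0)
        else:
            spilled.append(name)  # would live in memory instead
    return assignment, spilled


values = [(f"n{i}", False) for i in range(10)] + [("p", True)]
print(allocate(values)[1])                            # ['n8', 'n9']
print(allocate(values, borrow_address_regs=True)[1])  # []
```

With ten integers and one pointer, the strict version spills two integers while six address registers sit idle; borrowing avoids the spill, at the cost that those two values can only be used with the instructions that accept address registers.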

vidarh · on May 27, 2017

To calculate the size of a structure, or the offset of a substructure.

But you can do that on the M68k:

    SUBA.L <ea>,An

<ea> can be any addressing mode, including another address register. Most assemblers will let you use SUB.L and substitute SUBA if the destination is an address register.
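
A Python sketch of what SUBA computes, written as hypothetical helpers: the .W form sign-extends the source word to 32 bits, the full 32-bit register is written, and the condition codes are left alone (so nothing here models the CCR). The second helper turns the difference into an array index, assuming the element pointer is not below the start of the array.

```python
MASK32 = 0xFFFFFFFF


def suba(src, an, size):
    """Value left in An by SUBA.<size> <ea>,An (condition codes are untouched)."""
    if size == "W":
        src &= 0xFFFF
        if src & 0x8000:      # source word is sign-extended to 32 bits
            src -= 0x10000
    elif size == "L":
        src &= MASK32
    else:
        raise ValueError("SUBA only has .W and .L forms")
    return (an - src) & MASK32   # the whole 32-bit register is written


def element_index(start, element, elem_size):
    """Array index from two pointers, like SUBA.L A1,A0 followed by a divide.

    Assumes element >= start (a negative difference is not handled).
    """
    return suba(start, element, "L") // elem_size


print(hex(suba(0xFFFF, 0x1000, "W")))      # 0x1001: $FFFF is -1 as a word
print(element_index(0x2000, 0x2028, 8))    # 5
```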

ENTP · on May 27, 2017

68k, Atari ST, DevPac. Some of the funnest coding ever. When I discovered x86 ~10 years later I was horrified at how shit it was.

znpy · on May 26, 2017

Easy68k. I hated that program with a passion during the computer architecture course.

I never figured out how or why, but various conditional jumps would just plain fail. The conditions were met, but the jmp/je/jne/jz would not be executed.

That being said... The m68k... I was saddened to discover that ColdFire MCUs had been discontinued; it would have been nice to play with an m68k-based board.

RJIb8RBYxzAMX9u · on May 27, 2017

ColdFire development boards by Freescale / NXP are still readily available, and relatively inexpensive. Not that I would recommend you start any new projects based on CF, but for the cost of ~3 RasPis, you can buy an evaluation board from DigiKey and play to your heart's content.

Gracana · on May 27, 2017

If you're looking for a nice CISC architecture to replace m68k, I recommend the Renesas RX family. There's a lot of options, availability is good, and the ISA is assembly-pogrammer-friendly.

https://www.renesas.com/en-us/doc/products/mpumcu/doc/rx_fam...

Taniwha · on May 28, 2017

Yes, it's totally time for an assembly language pogrom

kevin_thibedeau · on May 27, 2017

You can implement a 68k in an FPGA with plenty of room to spare.

freedomben · on May 26, 2017

This tickles my nostalgia from the days of TI Calculator ASM hacking

alxmdev · on May 27, 2017

Indeed! I started with C by following the TIGCC TI-89 tutorials from http://www.technoplaza.net/ long ago, and since I got an Arduboy I've been itching to dive back into calc stuff and try some small assembly gamedev. So many little projects worth doing and so little time!

rangibaby · on May 26, 2017

> Tricks and Traps

MAP08 name inspiration? It's old enough

TheRealPomax · on May 26, 2017

I feel cheated, there were way fewer than 68000 tricks and traps in this article.