What is the PDP-11 instruction set?

pwagland · on Oct 18, 2020

The funny thing about the PDP-11 instruction set is how ridiculously simple it is to write a C compiler for it… the C operators map almost 1 to 1 to PDP-11 instructions.

At least that is what my compilers writing class taught me 25 years ago ;-)

kencausey · on Oct 18, 2020

Probably you are well aware of this, but arguably the causation runs the other way: https://en.wikipedia.org/wiki/C_(programming_language)#Histo...

hazeii · on Oct 18, 2020

As a PDP-11 assembly programmer who migrated to C, this one has caught me out too because it does 'feel' a very natural explanation. Possibly it's because both products came from the ideas that were in circulation at the time - or maybe it's that PDP-11 assembly programmers who moved to C found it easy (given pointers with pre/post-increment/decrement addressing is baked into the instruction set).

kbob · on Oct 18, 2020

To this day, I have a subconscious desire to use postfix increment and prefix decrement because of the PDP-11's addressing modes. "p++, --p"

simias · on Oct 18, 2020

I'm not really seeing it, at least not in this opcode list.

As far as I can tell the C operators are fairly simple and basic (add, subtract, complement, and, or, xor etc...). They would almost always map to a single instruction in MIPS, ARM32, ARM64, Z80 etc... So I'm not convinced that there's a direct correlation here, one way or the other.

Conversely if C was really deeply influenced by the PDP-11 ISA then why isn't there a single "bit set", "bit clear" or "rotate right" operator in C? Those require multiple operators in C but a single opcode here. Rotation in particular is a bit of a pain to code in C since you need to know the size of the integer you're rotating. I always thought that it was an oversight not to have built-in support for rotations in C.

The only thing that really stands out as being fairly C-ish is the auto-incrementing and -decrementing addressing modes which map well to p++ and friends (although ++p needs two opcodes as far as I can tell) but it's a feature that's available on other ISAs.

zwegner · on Oct 18, 2020

Just curious: is there a use case for rotations where the integer size isn't important? It's perhaps a bit annoying to need size-specific rotation functions, but all the uses of rotations I know of need specific integer sizes anyways.

simias · on Oct 19, 2020

You're right of course, I just mean that it makes it a bit verbose and error-prone to write in full "(i >> (32 - shift)) | (i << shift)". I need to triple check that I'm not off-by-one and that all the operators are in the right direction every time I write this. If you had rotation operators like, say, "<|" you could write "i <| shift" instead and that'd be a lot nicer to read and write IMO.
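A minimal Python sketch of the rotation being described, assuming the word width is passed in as a parameter (the names `rotl` and `bits` are illustrative, not from the thread). Python integers are unbounded, so the mask stands in for the fixed-width integer type that C would provide. Reducing the shift modulo the width also avoids the `shift == 0` case, where the C expression would shift right by the full width, which is undefined behaviour in C.

```python
def rotl(value, shift, bits=32):
    """Rotate `value` left by `shift` positions within a `bits`-wide word."""
    mask = (1 << bits) - 1
    shift %= bits                      # keep the shift inside 0..bits-1
    value &= mask                      # discard anything above the word width
    return ((value << shift) | (value >> (bits - shift))) & mask
```

For example, `rotl(0x80000001, 1)` moves the top bit around to the bottom, and `rotl(0o100001, 1, bits=16)` does the same for a 16-bit word. The need to pass `bits` at all is exactly the size dependence discussed above.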

inkyoto · on Oct 18, 2020

You would need to look into the addressing modes at length.

*--i = *j++ is directly a «mov (r0)+, -(r1)» (provided j is in r0 and i is in r1), whereas *i++ = *--j is a «mov -(r0), (r1)+»; i[4] = 1 is «mov #1, 10(r1)» (the offset is octal 10, i.e. 8 bytes, because an int is 2 bytes) and *(int *)040 |= 01 is «bis #1, @#40». A function call via a function pointer (that is, say, stored in r2) is a «jsr pc, (r2)», etc.

Most of the C pointer operations map onto single PDP-11 instructions and can be used even with the program counter register (which is r7).

Alex63 · on Oct 18, 2020

That takes me back. My first job as a programmer after university was on a PDP-11/78, on RSTS/E. I don't miss overlays!

hazeii · on Oct 18, 2020

Overlays on 8" floppies, the whirr-kerchunk of retries has left a scar across my soul.

randyshively · on Oct 18, 2020

This brings back great memories.

UCSD Pascal, TECO/KED, learning Dibol.... I learned on an 11/23 running RT-11, VT100, dual RL02s (oh, yeah). I never did learn anything serious about the O/S.

Our high school had an ancient PDP-8 we used DEC Basic on.

hazeii · on Oct 18, 2020

KED on a VT100 under RT-11 was awesome (compared to everything else at the time). The right-hand finger-dance across PF1-PF4 felt so natural, and despite it being at 9600 baud it never felt slow.

cptnapalm · on Oct 18, 2020

Awhile back, I was playing with 2.11BSD for the PDP-11 in simh and was trying to learn PDP-11 assembly. I would read things about overlays, but I could never find a resource to learn more about them. Were there any manuals or textbooks which gave good coverage to overlays?

hazeii · on Oct 19, 2020

Overlays (at least in RT-11) are just chunks of code that would load at a specific address in memory. So if your program didn't fit into memory, a good chunk of it could run in an overlay area.

So for example your code could have an always-resident portion, say from 0 to 100000 octal (the lowest half of memory), and an overlay area from 100000 to 140000. We had a program that had an input, calculation and runtime phase, so it suited the overlay model; programs that needed all their code in memory at the same time would, by comparison, not be suited to overlays (especially on 8" floppies, where loading an overlay was accompanied by plenty of mechanical noise).
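The idea can be sketched as a toy Python model, assuming the 100000 to 140000 octal overlay area from the example above. The class name `OverlayRegion`, its methods and the three phase names are hypothetical; this is not how RT-11's overlay handler was actually implemented, only an illustration of several code segments taking turns in one address range.

```python
OVERLAY_BASE = 0o100000   # start of the shared overlay area (octal, as in the comment)
OVERLAY_END = 0o140000    # end of the overlay area (exclusive)


class OverlayRegion:
    """Toy model: several code segments take turns occupying one address range."""

    def __init__(self, segments):
        self.segments = segments      # name -> bytes, standing in for code on disk
        self.memory = bytearray(OVERLAY_END - OVERLAY_BASE)
        self.loaded = None            # name of the segment currently in memory
        self.loads = 0                # number of (slow) disk reads performed

    def call(self, name):
        """Make sure `name` is resident, loading it over the previous segment."""
        if self.loaded != name:
            code = self.segments[name]
            if len(code) > len(self.memory):
                raise ValueError(f"segment {name!r} does not fit in the overlay area")
            self.memory[:len(code)] = code
            self.loaded = name
            self.loads += 1
        return self.loaded


# A program with distinct phases only reloads when the phase changes.
region = OverlayRegion({"input": b"IN", "calc": b"CALC", "run": b"RUN"})
for phase in ["input", "input", "calc", "run"]:
    region.call(phase)
```

With distinct phases the model performs three loads for four calls. A program that alternates between two segments on every call would reload each time, which is the case described as unsuitable for overlays.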

jerryr · on Oct 18, 2020

There are a lot of things I don't miss! x86 segmented architecture, for example. Even modern embedded programming on an ARM Cortex-M feels luxurious compared to not that long ago.

anticristi · on Oct 18, 2020

It's almost embarrassing to tell young graduates about unreal mode and triple fault.

pjmlp · on Oct 19, 2020

Yep, it is somewhat funny to see people discussing what you can or cannot do on the ESP8266 and ESP32 when, compared with early CP/M and MS-DOS hardware, they are server class.

pmiller2 · on Oct 18, 2020

It could always be worse. As I remember it, MS-DOS also had overlays....

pjmlp · on Oct 19, 2020

Yep, Turbo Pascal 3.0 supported them.

And you could also use tricks like TSR for simulating multiprocessing.

gumby · on Oct 18, 2020

These early machines were not microcoded -- the instructions were hard coded, which tended to make the instruction sets far more regular.

* Though later PDP-11s like the LSI-11 did have microcode -- AKA, in the argot of the time, a "writable control store"

aap_ · on Oct 18, 2020

Only the first one (PDP-11/20) wasn't microcoded. All others were.

hazeii · on Oct 18, 2020

The KEF-11 for the LSI11-23 was optional, though best bought at the same time. Given the price of the chip, plugging one in tended to be a bit of a ceremonial affair.

getpost · on Oct 18, 2020

Ah, my first love! The first assignment was to hand-translate a program to do keyboard input into machine language, and then use the console switches to enter the machine language, so I could then type in more machine language. After a couple small programs, I got to use the assembler! It was thrilling! No more hand translation! I wish CS was still taught this way.

mov -(pc),-(pc)

drudru · on Oct 18, 2020

What does that mov do?

hazeii · on Oct 19, 2020

The PC (R7 is the program counter on PDP-11's) would have been auto-incremented to point to the next word in memory, so the first -(PC) pre-decrements the PC with the result it points back to the instruction in memory. The contents of this location (the instruction itself) is read, and now the second -(PC) comes into play as the destination. The PC is again pre-decremented (so it now points to the word before the instruction) and the source operand (the instruction) is stored in this location.

In other words, the net effect is the instruction has been copied down 1 word in memory. Finally, because it's the PC that's in use and it's been decremented to point to the new copy of the instruction, the copy of the instruction is fetched and the operation repeats. So the instruction copies itself down in memory, until all memory below the starting point is filled with 014747; what happens next depends on the particular system, but generally not useful (the instruction was often used to check memory; key it in at the highest address, run it and then inspect memory to see if it has the same value everywhere).

As an aside, in octal this translates to 014747; 01 is the MOV, 4 is the pre-decrement mode and 7 indicates R7. Thus an assembler is barely needed on a PDP-11, because knowing the opcodes for the common instructions and the addressing modes makes it trivial to convert an assembly instruction like MOV #123,@#1000 into the octal 012737 000123 001000.

The bootstrap routines are often only a few lines of assembler, thus it was easy to remember, translate to octal on the fly and key them into the front panel switches with a few well-practiced sweeps across the toggle keys.
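The self-copying behaviour described above can be modelled with a short Python sketch. It assumes a simplified machine: memory is a dict of even byte addresses mapped to 16-bit words, only this one instruction is understood, and the run stops when the next copy would fall below address 0 (what real hardware does at that point is outside the sketch). The function name `run_memory_fill` is illustrative.

```python
MOV_PC_PC = 0o014747   # mov -(pc),-(pc)


def run_memory_fill(start, memory_bytes):
    """Simulate mov -(pc),-(pc) keyed in at even byte address `start`."""
    memory = {addr: 0 for addr in range(0, memory_bytes, 2)}
    memory[start] = MOV_PC_PC
    pc = start
    while True:
        instruction = memory[pc]   # fetch
        pc += 2                    # PC now points at the next word
        if instruction != MOV_PC_PC:
            break                  # anything else is not modelled here
        pc -= 2                    # source -(pc): back onto the instruction itself
        value = memory[pc]
        if pc - 2 < 0:
            break                  # no room below address 0
        pc -= 2                    # destination -(pc): the word just below
        memory[pc] = value         # PC points at the copy, so it executes next
    return memory
```

Keying the instruction in at octal address 10 of a 20-octal-byte memory leaves 014747 in every word from 0 to 10 and leaves the words above the starting point untouched.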

khaledh · on Oct 18, 2020

If you're interested in the original PDP-11 handbook, it's available on bitsavers.org: http://bitsavers.org/pdf/dec/pdp11/1120/PDP-11_Handbook_Seco...

happycube · on Oct 18, 2020

Ya know, it looks completely feasible to convert the basic design into a 64-bit system with 32-bit instructions.

mikequinlan · on Oct 18, 2020

Is somebody going to do this for the PDP-10? My first programming experience was on the PDP-10 and I loved how symmetrical the instruction set was. Of course a maximum user address space of 256K 36-bit words would never fly nowadays, but for the time it seemed awesome.

codesmythe · on Oct 18, 2020

Check out http://pdp10.nocrew.org/docs/instruction-set/pdp-10.html for detailed info on the PDP-10 instruction set

anthk · on Oct 19, 2020

Get the Panda Tops-20 distribution for the Ka10.

Then find either an assembler or use the debugger.

pinewurst · on Oct 19, 2020

The final KL10B permitted up to 32 256K word sections.

mikequinlan · on Oct 19, 2020

I am curious, do you know if user-mode programs had access to more than a single 256K word address space? Did they add something like segment registers or a split between Data and Instruction space?

pinewurst · on Oct 19, 2020

The answer is yes they did. I think the PC was extended and you switched explicitly between segments via a small set of instructions, so code/data can't span segments. I think of it as being like an overlay system without actually overlaying.

Sniffnoy · on Oct 18, 2020

Given that this is a 16-bit computer (rather than, say, an 18-bit one), why are they describing things in octal rather than hex...?

hazeii · on Oct 18, 2020

3 bits for the 8 addressing modes, 3 bits for the register. For double-operand instructions, that's 12 bits, so only 4 bits are left over to encode the opcode.

The most important of which was probably MOV (0001, or 01 octal). Thus, something like the C '*s++ = *t++' translates directly to a single 16-bit instruction:-

0001 010 001 010 010 = 012122

The cleverness though was that by using the addressing modes on the PC (R7) things like loading a constant into memory or moving memory contents was all single-instruction as well:-

e.g. MOV #7, @#1000 # Put 7 into address 1000 ==> MOV (PC)+,@(PC)+ ==> 012737 000007 001000

The same trickery made relative addressing easy, made PUSH and POP nothing more than MOV instructions (though it did take me some time to understand the use of R5 for passing arguments in Fortran).
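The field layout described above (a 4-bit opcode followed by two 6-bit operand fields, each a 3-bit mode and a 3-bit register) can be sketched in Python. The constant and function names are illustrative, and only the addressing modes mentioned in the thread are listed.

```python
# Addressing-mode numbers mentioned in the thread (a subset of the eight modes).
AUTOINCREMENT = 2           # (Rn)+
AUTOINCREMENT_DEFERRED = 3  # @(Rn)+
AUTODECREMENT = 4           # -(Rn)
PC = 7                      # R7 is the program counter
MOV = 0o01                  # 4-bit opcode for MOV


def encode_double_operand(opcode, src_mode, src_reg, dst_mode, dst_reg):
    """Pack a double-operand instruction: 4-bit opcode plus two 6-bit operand fields."""
    if not 0 <= opcode <= 0o17:
        raise ValueError("opcode is a 4-bit field")
    for field in (src_mode, src_reg, dst_mode, dst_reg):
        if not 0 <= field <= 7:
            raise ValueError("modes and registers are 3-bit fields")
    return (opcode << 12) | (src_mode << 9) | (src_reg << 6) | (dst_mode << 3) | dst_reg


def octal_word(word):
    """Format a 16-bit word as six octal digits."""
    return format(word, "06o")


# MOV (R1)+,(R2)+
copy_word = octal_word(encode_double_operand(MOV, AUTOINCREMENT, 1, AUTOINCREMENT, 2))
# MOV #n,@#addr is MOV (PC)+,@(PC)+ followed by the two operand words
store_const = octal_word(encode_double_operand(MOV, AUTOINCREMENT, PC, AUTOINCREMENT_DEFERRED, PC))
```

Here `copy_word` comes out as `012122` and `store_const` as `012737`, matching the hand-assembled values in the comment. The immediate value and the absolute address simply follow as the next two words in memory.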

Someone · on Oct 18, 2020

Aside from what others said, the byte as an 8-bit entity wasn’t settled at the time.

At DEC, the PDP-5, PDP-8 and PDP-12 were 12-bit and the PDP-6 and PDP-10 were 36-bit, all of which are multiples of both 3 and 4.

Memory was expensive at the time, so I would guess most of them had 6-bit character sets. That must have made octal a popular choice.

I think https://en.wikipedia.org/wiki/Hexadecimal#History_of_written... shows the notation for hex wasn’t settled upon. I think that’s an indication of its rarity (go read it to learn which version was deemed “ridiculous” :-) )

aap_ · on Oct 18, 2020

Because with 8 registers and 8 addressing modes octal makes for extremely readable machine code. There's another completely distinct proposal for the PDP-11 architecture which uses hex but it was never built.
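A small Python sketch of the readability point, assuming the double-operand layout (4-bit opcode, then mode and register for source and destination). The function name `describe` is illustrative. It prints the same instruction word in octal and in hex alongside the decoded fields.

```python
def describe(word):
    """Show one instruction word in octal and hex, plus its decoded fields."""
    fields = {
        "opcode": (word >> 12) & 0o17,
        "src_mode": (word >> 9) & 0o7,
        "src_reg": (word >> 6) & 0o7,
        "dst_mode": (word >> 3) & 0o7,
        "dst_reg": word & 0o7,
    }
    return format(word, "06o"), format(word, "04X"), fields


# MOV (PC)+,@(PC)+
octal, hexadecimal, fields = describe(0o012737)
```

In the octal form `012737` the last four digits are the source mode, source register, destination mode and destination register, one digit each. In the hex form `15DF` the 3-bit fields straddle the digit boundaries, so none of them can be read off directly.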

Sniffnoy · on Oct 18, 2020

Oh, good point -- the instructions break down nicely as one leading bit plus 5 octal digits. Branches are a little bit of an exception, but you're right that octal works overall better here. Still, that doesn't explain the use of octal at the top where it's not referring to instructions.

aap_ · on Oct 18, 2020

Using two different number bases sounds painful, especially considering hex wasn't very common at the time. Also physical unibus addresses are 18 bits so they divide nicely into octal digits.

elvis70 · on Oct 18, 2020

See also "The 80x86 is an Octal Machine": https://web.archive.org/web/20200114164700/http://www.dabo.d...

Taniwha · on Oct 18, 2020

Octal was actually a lot more common back then, people hadn't settled on hex as the default power-of-two base - just as they hadn't settled on power of 2 word sizes - pdp8 was 12-bit, pdp-6/dec-10/20 were 36 bit, the old B6700 I grew up with had 48/51-bit words - 6-bit bytes were not uncommon

pmiller2 · on Oct 18, 2020

Octal works a little better than hex when you actually have to toggle in a boot loader at the front panel, IMO. I think 8 switches are a little easier to manage than 16 for this use case.

Taniwha · on Oct 19, 2020

Used to know the pdp11-05 bootstrap by heart .... I'm not sure it matters - I can imagine that 2 lots of 2x4 switches, set with 2 hands in 2 operations, would probably be faster ... but it's kind of moot because the front panel was marked in octal so that's the way one learned it

I think that standardising on 8-bit bytes and power of 2 words is what pushed us to hex as a standard notation

rswail · on Oct 19, 2020

Because it's easier to program with three fingers on the switches than four :)

But it was because there were 8 registers and 8 addressing modes, so a two operand instruction used 6 bits for each, leaving 4 bits for the instruction.