C Portability Lessons from Weird Machines

zokier · on Feb 21, 2022

I don't think there is anything wrong in writing platform-specific code; in certain circles there is this weird fetishitization of portability, placing it on the highest pedestal as a metric of quality. This happens in C programming and also for example in shell scripting, people advocating for relying only on POSIX-defined behavior. If a platform specific way of doing something works better in some use-case then there should be no shame in using that. What is important is that the code relies on well-defined behavior, and also that the platform assumptions/requirements are documented to a degree.

Of course it is wonderful that you can make programs that are indeed portable between this huge range of computers; just that not every program needs to do so.

jltsiren · on Feb 22, 2022

I work in bioinformatics, where the underlying technology changes often enough that you don't have to think too much about portability. If a new fundamentally different system emerges, my old code will probably be obsolete before I have to think about supporting it on the new system. I was around during the 32-bit/64-bit transition (which was painful), and I've been porting code targeting x86 to ARM (which is usually not). Here are some thoughts I've had about portability and system features:

Your code includes your build system and dependency management. If they are not portable, your code is not portable.

I'm a language descriptivist. A standard is an imperfect attempt at describing a language. When the compiler and the standard disagree, the compiler is right, because it decides what the code will actually do.

OpenMP is an essential part of C/C++. Compilers that don't support it can't compile code that claims to be C/C++.

There is no point in pretending that you support systems you don't regularly use. In my case, portability means supporting Linux and macOS, x86 and ARM, GCC and Clang, but not in all combinations.

Portability, simplicity, and performance form a triangle. You can choose two and lose the third, or you can make compromises between them.

Pointers are 64-bit. If they are not, you need a lot of workarounds for handling real data, because size_t is too small.

Computers are little-endian, because that enables all kinds of convenience features such as using data from memory-mapped files directly.

Compilers should warn you if you use integer types of platform-dependent width, unless the width is required to be the same as pointer width. And except as argc and related variables.

gnufx · on Feb 22, 2022

I do research computing support, and bioinformatics has long been thought in my circles the nightmare area to support (perhaps supplanted by machine learning these days).

To pick up some of that, compiler maintainers obviously disagree about the compiler always being right (which one?), and I'm baffled by the requirement for a compiler to support OpenMP (which version) to be considered C, especially if you're dealing with, say, embedded systems for bioinformatics data acquisition, for instance.

I've successfully claimed to support systems I'd never used in high-profile projects (particularly when the architecture and operating system landscape was rather more interesting); I couldn't just tell the structural biology users there was no point supporting their hardware. I currently support a GPU-centric POWER9 system, but I hadn't actually used POWER in the years I'd been building packages for it (and ARM).

jltsiren · on Feb 22, 2022

> To pick up some of that, compiler maintainers obviously disagree about the compiler always being right (which one?), and I'm baffled by the requirement for a compiler to support OpenMP (which version) to be considered C, especially if you're dealing with, say, embedded systems for bioinformatics data acquisition, for instance.

As I said, I'm a language descriptivist. The compiler I'm using right now is right, because it builds the binary. If another compiler has a different opinion, then I may have to deal with multiple dialects. And because the default compilers in many HPC clusters and Linux distributions don't get updated that often, the dialects may continue to be relevant for years.

The situation with OpenMP is similar. A compiler does not have to support OpenMP to be considered a C compiler but to be useful as a C compiler. The bioinformatics software I need to build often uses OpenMP. If a compiler can't build it, it's not doing a very good job of being a compiler. OpenMP versions are not particularly relevant, as the usage is usually very basic. The only version conflict I can remember is from the C++ side: some compilers didn't support parallel range-based for loops.

thesuperbigfrog · on Feb 21, 2022

>> in certain circles there is this weird fetishitization of portability, placing it on the highest pedestal as a metric of quality.

It's not a fetish if you have ever ported legacy code that was not written with potability in mind.

"ints are always 32-bit and can be used to store pointers with a simple cast" may have worked when the legacy program was written, but it sure makes porting it a pain.

DrScump · on Feb 22, 2022

>It's not a fetish if you have ever ported legacy code that was not written with portability in mind.

This.

You can code portably and take advantage of platform-specific features or performance advantages.

I used to work on a product line that supported dozens of different UNIX platforms (and, eventually, even DOS protected-mode). The entire build process was portable. We had a primary configuration file for code/libraries and one for server internals that provided for:

. shared memory configuration (if any) and other limits

. BSD sockets vs System V streams

. byte-swapped vs regular processors and little- vs big-endian

. 16- vs 32-bit processors (later 64 bit)

. RISC vs CISC

. vagaries of C compilers, normal flags to pass for optimization, gcc vs cc, static vs dynamic links, libraries etc

. INT / LONG bytecounts, floating point, etc.

. ... and many others.

The goal was to abstract everything possible and have platform-specific code limited to the configuration files. Then, porting becomes fine tuning the configuration files, adding platform specific code to the mainline (with appropriate DEFINES) only when absolutely necessary, test, finalize configuration, test again, post the final config files and any changes back to the mainline.

nyanpasu64 · on Feb 21, 2022

And `unsigned long` can store pointers just fine on Linux 32, Linux 64, Windows x86-32, and MSYS2 64... but not Windows x86-64. https://github.com/cc65/cc65/issues/1680#issuecomment-104641...

loeg · on Feb 22, 2022

32-bit 'long' on a 64-bit pointer platform is kind of a bizarre choice by Microsoft. Technically allowed by the C abstract machine, of course.

bombcar · on Feb 22, 2022

Arguably on a system where you can address 8 bit byte, 16 bit word, 32 bit dword, and 64 bit qword having char, int, long, and long long all be different sizes makes sense (and gives the programmer control) - especially if before they were assumed to be shorter.

A program with 32bit longs will probably work fine most of the time, and save memory while doing it (which can matter for cache hits/misses).

leetcrew · on Feb 22, 2022

int is the same width as long in msvc though. if the width of an integral type is remotely important to the correctness of your program, you should really be using stdint.h types to document that intention, regardless of whether you care about portability.

loeg · on Feb 22, 2022

int on Win64 is 32 bit, not 16.

kevin_thibedeau · on Feb 22, 2022

They screwed up by letting LONG into their OS structs when DWORD should have been good enough. It would be a huge clusterfuck if long was 64-bit while LONG stayed 32-bit.

andrewf · on Feb 22, 2022

A lot of early Win32 code was originally Win16 code - you could in theory have the same code target both. I think that encouraged everyone to favor long / DWORD when they wanted a 32-bit integer even once the 16-bit compilers were receding into the distance.

compiler-guy · on Feb 22, 2022

This is absolutely true, but doesn’t make the case that the original authors should have compromised their design or spent the extra effort to make the code portable.

If the spec was to make it work under certain conditions and not to worry about others, then the additional work for portability just adds cost.

jart · on Feb 22, 2022

It should work today. If we migrate to ILP64 data model then we can restore many of those elegant behaviors, because it makes C so easy, almost like a scripting language. As far as I can tell, the original motivations for the LP64 data model which causes int and int* to be different widths, was because SCO wanted it that way. They never published the measurements that led them to this conclusion. Are you willing to sacrifice language simplicity based on the word of the people who are famous for suing Linux users?

joosters · on Feb 22, 2022

The Open Group's study into LP64/ILP64 explains their choice fairly well. In particular, it mentions both Digital and SGI's prior experience with porting code to LP64. There's nothing about SCO's preferences in it.

http://archive.opengroup.org/public/tech/aspen/lp64_wp.htm

"By far, the largest body of existing code already modified for 64-bit environments runs on LP64 based platforms."

scaramanga · on Feb 22, 2022

The question is how much time is spent maintaining and testing for correctness a portable program in it's early iteration vs the time spent porting it for it's second lease of life.

Clearly if it it would have cost 2N dollars to obviate N dollars of work, then cost would not have been worth it. This can occur if either a) portability complicates things a lot or b) it complicates things a little, but over a very long timespan during which maintenance work is being done.

LeifCarrotson · on Feb 21, 2022

> If a platform specific way of doing something works better in some use-case then there should be no shame in using that.

I fully agree, but the real problem is not the limitation to platform-specific logic but the conflation of fundamental requirements and incidental requirements. There's no good way to know when you meant "int" as just a counter of a user's files, or "int" as in a 32-bit bitmask, or "int" as the value of a pointer. For the former, it probably doesn't matter if someone later compiles it for a different architecture, for the latter, if you mean int32_t or uintptr_t, use that!

dahfizz · on Feb 22, 2022

I strongly agree, if only for the readability benefits. Using uintptr_t vs uint64_t vs long long are all technically equivalent, but they convey information to the reader about your intent

xemdetia · on Feb 21, 2022

I would say most modern code is not the kind of code written with the absurd platform assumptions that old code used to do. There's not magic addresses you have to know to talk to hardware, there's no implicit memory segmentation/memory looping, and so on. Ever since most modern OSs try to prevent direct access or randomize address spaces and so on it just is hard to write code that is insane like the way we were.

So the question is are you contesting the POSIX defined behaviour over using more logical interfaces from the OS or the wild west where people were hacking around broken, platform-specific features and often broken or awkward system libraries or just the more modern case where people use the standard interface instead of a more performant one. In the latter case I agree I wish there was a more 'nice' way of doing more dynamic and efficient feature detection without making simple C programs crazy complex.

Someone · on Feb 21, 2022

I think it also helps that modern hardware is a lot less diverse. Most of the tools only run on systems that are little endian, where NULL is zero, chars are 8 bits, ints are two’s complement and silently wrap around, floats are 32 bits IEEE 754, etc, so code that erroneously assumes those to be true in theory isn’t portable, but in practice is.

And newer C standards code may even unbreak such code. Ints already are two’s complement in the latest version of C++ and will be in the next version of C, for example.

isomel · on Feb 22, 2022

> code that erroneously assumes those to be true in theory isn’t portable, but in practice is.

Be careful about undefined behaviour still. Even if your architecture have defined semantic, the compiler might still not assume them.

dahfizz · on Feb 22, 2022

> I would say most modern code is not the kind of code written with the absurd platform assumptions that old code used to do.

That is a pretty bold claim. I would be surprised if any nontrivial "modern" program would be bug-free on a platform with nonzero NULL, non two's compliment ints, big endian an non IEEE floats.

The assumptions we make about the platform just don't feel like assumptions anymore because every machine is the same nowadays.

bombcar · on Feb 21, 2022

C (and to some extent shell) programmers are the ones with the most experience of the machine under them changing, perhaps drastically - few other languages have even been around long enough for that to have happened.

Java sidesteps this, of course, by defining a JVM to run on and leaving the underlying details to the implementation.

ithinkso · on Feb 21, 2022

I love how the concept of 'platform independent' evolved - you would think that it means you can run it anywhere but almost all software that uses 'platform-independent' code is very platform-dependent

Because if you make an Electron app it's only logical that because it is platform-independent it can only be run on macOS

Filligree · on Feb 21, 2022

> Of course it is wonderful that you can make programs that are indeed portable between this huge range of computers; just that not every program needs to do so.

Isn't most code that would behave differently on different architectures subject to undefined behaviour, however? The signed overflow case mentioned, for example.

Sure, some of it is implementation-defined, but in practice you need to write ultra-portable code anyway in order for your compiler not to pull the rug out underneath you.

scaramanga · on Feb 22, 2022

The behaviour of signed integer overflow is undefined, but only if it actually overflows, and whether it overflows or not can be based on the widths of types which are implementation-defined, so there is a link between the two things.

Many "correct" programs rely on undefined behaviour for optimizations while the authors knowingly assert (or assume) that the undefined behaviour doesn't actually occur.

So the question is, if you are writing non-portable code for a specific environment, is a UB that actually triggers on your environment "non-portability" or is it a "bug"? I think most people would define that as being a bug.

Edit: Okay, bad example cos you could maybe use int_leastN_t or something in that example to document the assumptions... But you don't always have control over types, and there may be multiple constraints, etc..

adonovan · on Feb 22, 2022

Well said. It's possible to write shell scripts that are portable between Mac and Linux, but it takes years to learn all the places where the semantics differ. For example, there's no portable way to call sed with both -e and -i flags. In many cases it's better to say "this script assumes Linux" (and perhaps use Docker to simulate Linux) than to waste time supporting another platform.

Shorel · on Feb 22, 2022

From a technical point of view, we are in agreement.

The issue is when platform-specific code ties your business logic to a particular vendor API, and then this vendor milks your company for profit.

eqvinox · on Feb 21, 2022

On a slightly related note, chances are good anyone reading this has an 8051 within a few meters of them - they're close to omnipresent in USB chips, particularly hubs, storage bridges and keyboards / mice. The architecture is equally atrocious as the 6502.

btw: a good indicator is GCC support - AVR, also an 8-bit µC - is perfectly well supported by GCC. 8051 and 6502, you need something like SDCC [http://sdcc.sourceforge.net/]

jazzyjackson · on Feb 21, 2022

> The architecture is equally atrocious as the 6502.

I only ever hear glowing/nostalgic reviews of 6502 programming, I guess from the retro/8bit gaming scene, curious what you find so atrocious.

tenebrisalietum · on Feb 21, 2022

6502 is awesome to program from assembly.

What makes the 6502 atrocious for C is:

- internal CPU registers are 8 bits, no more, no less and you only have 3 of them (5 if you count the stack pointer and processor status register).

- fixed 8-bit stack pointer so things like automatic variables and pass-by-value can't take up a lot of space.

- things like "access memory location Z + contents of register A + this offset" aren't possible without a lot of instructions.

- no hardware divide or multiply.

Many CPUs have instructions that map neatly to C operations, but not 6502. With enough instructions C is hostable by any CPU (e.g. ZPU) but a lot of work is needed to do that on the 6502 and the real question is - will it fit in 16K, 32K, as most 6502 CPUs only have 16 address lines - meaning they only see 64K of addresses at once. Mechanisms exist to extend that but they are platform specific.

IMHO Z80 is better in this regard with it's 16-bit stack pointer and combinable index registers.

duskwuff · on Feb 22, 2022

> - fixed 8-bit stack pointer so things like automatic variables and pass-by-value can't take up a lot of space.

Also, the stack isn't addressable. The only stack operations the CPU natively supports are push and pop. If you want to access values anywhere else on the stack (say, if you're trying to spill a couple of local variables to the stack), you're going to have a bad time.

electroly · on Feb 22, 2022

I've never programmed on a 6502 before and have only read about it; isn't the stack addressable in the zero page?

duskwuff · on Feb 22, 2022

No, the stack is in the next page up ($0100 - $01ff). While it's accessible as memory, the 6502 doesn't provide any addressing modes specific to this page; accessing it involves multiple instructions, and requires the use of the X register as a temporary.

electroly · on Feb 23, 2022

Got it; thank you for the clarification! Someday I'll try my hand at some 6502 assembly.

adrian_b · on Feb 21, 2022

6502 was nice only in comparison with Intel 8080/8085, but it was very limited in comparison with better architectures.

The great quality of 6502 was that it allowed a very cheap implementation, resulting in a very low price.

A very large number of computers used 6502 because it was the cheapest, not because it was better than the alternatives.

For a very large number of programmers, 6502 was the 1st CPU whose assembly language they have learned and possibly the only one, as later they have used only high-level languages, so it is fondly remembered. That does not mean that it was really good.

I also have nostalgic happy memories about my first programs, which I have entered on punched cards. That does not mean that using punched cards is preferable to modern computers.

Programming a 6502 was tedious, because you had only 8-bit operations (even Intel 8080 had 16-bit additions, which simplified a lot multiply routines and many other computations) and you had only a small number of 8-bit registers with dedicated functions.

So most or all variables of normal sizes had to be stored in memory and almost everything that would require a single instruction in modern microcontrollers, e.g. ARM, required a long sequence of instructions on 6502. (e.g. a single 32-bit integer addition would have required a dozen instructions in the best case, for statically allocated variables, but up to 20 instructions or even more when run-time address computations were also required, for dynamically-allocated variables.)

A good macroassemmbler could simplify a lot the programming on a 6502, by writing all programs with a set of macroinstructions designed to simulate a more powerful CPU.

I do not know whether good macro-assemblers existed for 6502, as in those days I have used mostly CP/M computers, which had a good macro-assembler from Microsoft, or the much more powerful Motorola 6809. I have programmed 6502 only a couple of times, at some friends who had Commodore computers, and it was weak in comparison with more recent CPUs, e.g. Zilog Z80 (which appeared one year later than 6502).

mwcampbell · on Feb 21, 2022

> I do not know whether good macro-assemblers existed for 6502

They certainly did. I don't know about the communities that grew around Commodore, Atari, or other 6502-based computers, but in the Apple II world, there were multiple macro assemblers available. Possibly the most famous was Merlin. As a pre-teen, I used the Mindcraft Assembler. Mindcraft even sold another product called Macrosoft, which was a macro library for their assembler that tried to combine the speed of assembly language with a BASIC-like syntax. The major downside, compared to both hand-coded assembly and Applesoft BASIC (which was stored in a pre-tokenized binary format), was the size of the executable.

Edit: Speaking of simulating a more powerful CPU, Steve Wozniak implemented the SWEET16 [1] bytecode interpreter as part of his original Apple II ROM. Apple Pascal used p-code. And a more recent bytecode interpreter for the Apple II is PLASMA [2].

[1]: https://en.wikipedia.org/wiki/SWEET16

[2]: https://github.com/dschmenk/plasma

le-mark · on Feb 21, 2022

Wozniak’s sweet 16 was along these lines:

https://en.m.wikipedia.org/wiki/SWEET16

schlupa · on Feb 22, 2022

6502 is nice to program in assembler. For a compiler it is an atrocious platform to support. The limited stack, the 8 bit limit, the zero page idiosyncrasies, the page crossing bugs (some corrected in 65C02), etc. Z80, while also full of adhoc-isms is a much more pleasant target for a C compiler, only just for its 16 bit register and stack.

eqvinox · on Feb 21, 2022

> curious what you find so atrocious.

In the context of the original post, it's a bad target for C — I have no clue about other 6502 use :)

dfox · on Feb 21, 2022

One thing to keep in mind while programming AVR in C is that it still is small-ish MCU with different address spaces. This means that if you do not use correct compiler-specific type annotations for read-only data these will get copied into RAM on startup (both avr-libc and arduino contain some macrology that catches the obvious cases like some string constants, but you still need to keep this in mind).

garaetjjte · on Feb 21, 2022

Newer parts actually do have flash mapped into memory address space accessible through normal LD instruction, though I'm not sure if avr-libc moves constants into it automatically or still needs that PROGMEM/__flash annotation.

unwiredben · on Feb 21, 2022

For 6502 fans, there's a new port of Clang and LLVM that seems to be doing some nice code generation. See https://llvm-mos.org/

RicoElectrico · on Feb 21, 2022

Hope RISC-V will displace 8051 over time. It's such an absurd thing to extend this architecture in myriad non-interoperable (although backwards-compatible with OG 8051) ways. And don't forget about the XRAM nonsense. Yuck.

mmoskal · on Feb 22, 2022

AVR was designed much later with a C compiler in mind. 8051 while atrocious is much cheaper to license and fab than an ARM Cortex-M. These applications will probably go RISC-V in near future.

On the real low end PADAUK is king, at under 3 cent retail. They don't even bother with a C compiler and instead have a macro assembler on steroids called Mini-C (there is also sdcc but I never used that having needed every last 13 but word of ROM). Programming these, with 64 bytes of RAM, is strangely refreshing - no fancy abstractions, microsecond-precise instruction timings.

yuubi · on Feb 21, 2022

the 6502 has a single 16-bit address space with some parts (zero page, stack) addressable by means other than full 16-bit addresses. the 8051 has 16-bit read-only code space, 16-bit read/write external memory space, and 8-bit internal memory space, except half of it is special: if you use indirect access (address in an 8-bit register), you get memory. but if you encode that same address literally in an instruction, you get a peripheral register.

at least that's the part I remember

WalterBright · on Feb 21, 2022

I recall discussing C with a person using a processor where chars, shorts, and ints were all 32 bits. He stressed that writing portable code was necessary.

I pointed out that any programs that needed to manipulate byte data would require very special coding to work. Special enough to make it pointless to try to write portable code between it and a regular machine.

It's unreasonable to contort the standard to support such machines. It is reasonable for the compiler author on such machines to make some adjustments to the language.

For example, one can say C++ is technically portable to 16 bit machines. But it is not in practice, because:

1. segmented 16 bit machines require near/far pointer extensions

2. exceptions will not work on 16 bit machines, because supporting it consumes way too much memory

3. ditto for RTTI

rwmj · on Feb 21, 2022

C23 just dropped support for any non-twos-complement architectures. No more C on Unisys for you! http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf

eqvinox · on Feb 21, 2022

That doesn't preclude C23 on Unisys, it just forces the compiler to hide that fact from the programmer ;D

(SCNR)

bombcar · on Feb 22, 2022

Which will be wonderful for code compiled to C23 linked against C89 libraries, if that's even possible.

namibj · on Feb 22, 2022

One of the most widespread potential targets for non-two's-complement C would be JavaScript, or otherwise abusing floating point hardware for C's signed integers. Because their overflow is indeed weird (and breaks associativity).

mmoskal · on Feb 22, 2022

It looks like signed overflow will be still undefined in C23 (but defined in C++).

isomel · on Feb 22, 2022

Signed overflow is not defined in C++

titzer · on Feb 21, 2022

It's amazing the abilities that 50 years can bring a programming language. Longest, most painful design debate ever.

astrobe_ · on Feb 21, 2022

> Everyone who writes about programming the Intel 286 says what a pain its segmented memory architecture was

Actually this concerns more pre-80286 processors, since 80286 introduced virtual memory, and the segment registers were less prominent in "protected mode". Moreover I wouldn't say it was a pain, at least at the assembly level, once you understood the trick. C had not concept of segmented memory, so you had to tell the compiler which "memory model" it should use.

> One significant quirk is that the machine is very sensitive to data alignment.

I remembered from school time about a "barrel register" that allowed to remove this limitation, but it was introduced in 68020.

On the topic itself, I like to say that a program is portable if it has been ported once (likewise a module is reusable if it has been reused once). I remember porting a program from a 68K descendant to ARM, the only non-obvious portability issue was that in C, the char type is that the standard doesn't mandate the char type to be signed or unsigned (it's implementation-defined).

spc476 · on Feb 21, 2022

The segment registers were less prominent on the 80386 in protected mode since you also have paging, and each segment can be 4G in size. On the 80286 in protected mode the segment registers are still very much there (no paging, each segment is still limited to 64k).

bombcar · on Feb 22, 2022

Segments are awesome if they're much larger than available RAM - handling pointers as "base + offset" is often much easier to understand than just raw pointers.

shadowofneptune · on Feb 21, 2022

Having done some 8086 programming recently, I did find segments rather helpful once you get used to them. They make it easier to think about handling data in a modular fashion; a large (64k maximum) data structure can be given its own segment. The 286 went farther by providing protection to allocated segments. I have a feeling overlays only really become a nuisance once you start working on projects far bigger than were ever intended for that generation of '86. MS-DOS not having a true successor didn't help either.

zwieback · on Feb 21, 2022

> > Everyone who writes about programming the Intel 286 says what a pain its segmented memory architecture was

> Actually this concerns more pre-80286 processors, since 80286 introduced virtual memory,

86 had segments, 286 added protected mode, 386 added virtual. I would agree, though, 286 wasn't as bad as people make it sound. In OS/2 1.x it was quite usable.

tzs · on Feb 22, 2022

It was still a pain in protected mode because they botched the layout of the various fields in selectors. If they had made a trivial change they could have effectively made it trivial for the OS to give processes a linear address space.

Here's a summary of how it worked in protected mode for those not familiar with 286.

In protected mode the segment registers no longer held segment numbers. They held selectors. A selector contained three things:

1. An 13-bit index into a "descriptor" table. A descriptor table was a table describing memory segments, with each entry being an 8 byte data structure that included a 24 bit base address for a segment, a 16 bit size for a segment, and fields describing access limits and privilege information,

2. A bit that told which of two descriptor tables was to be used. One table was called the global descriptor table (GDT) and was shared by all processes. Typically the GDT would be used to describe the memory segments that were used by the operating system itself. The other table was called the local descriptor table (LDT), and there was one LDT per process. Typically this described the segments where the code and data of the process resided.

3. A 2-bit field contain access level information, used as part of the 286's 4 ring protection model.

To translate a selector:offset address to a physical address, the index from the selector was used to find a descriptor in the descriptor table the selector referred to. The offset from the selector:offset address was added to the segment base address from the descriptor, and the result was the physical address.

(The 80386 was similar. The main difference, aside from supporting 4 GB segments, was that the base address in the descriptor was no longer necessarily a physical address. The 386 included a paged memory management unit in addition to the segment based MMU. If the paged MMU was enabled the base addresses in the segment descriptors were virtual address for the paged MMU).

Here is how they packed the 13-bit index, 1-bit table selection, and 2-bits of access level into a 16-bit selector:

  +--------------------------+--+----+
  |          INDEX           | T| AL |
  +--------------------------+--+----+

Now consider an OS that has set up a process to have multiple 64K segments of, say, data with consecutive indexes. When the process wants to add something to a selector:offset pointer it can't just treat that as a 32-bit integer and do the normal 32-bit addition, because of those stupid T and AL fields. If there is a carry from adding something to the offset half of the 32-bit value, the selector half needs to have 0x0008 added to it, not the 0x0001 that carry normally would add.

If they had instead laid out the fields in the selector like this:

  +----+--+--------------------------+
  | AL | T|          INDEX           |      
  +----+--+--------------------------+

then it would work to treat a 32-bit selector:offset as a linear 29-bit address space with 3 additional bits on top. In most operating systems user mode programs would never have reason to fiddle with the access level bits or the table specifier, so as long as you made it so malloc returned pointers with those set right for the process then almost all the pain of memory on protected mode 286 would have gone away. An array that crossed segment boundaries, for example, would require no special handling by the compiler.

So why didn't they do this?

One theory I've heard is that because the descriptors in the GDT and LDT are 8 bytes each, to get the address of a descriptor the processor has to computer 8 x INDEX + BASE where base is the base address of the descriptor table, and by having INDEX in the selector in the to 13 bits, it is already in effect multiplied by 8 saving them having to shift the INDEX before feeding it into the adder for the address calculation.

I've talked to CPU designers and asked about that, and what they have told is that shifting INDEX for this kind of calculation is trivial. The segment registers would already be special cases, and it would be likely their connection to the adder used for address offsetting could simply be wired up so that the input from the segment registers was shifted. Even if they did need a little more hardware to handle it the amount would be very small, and almost certainly worth it for the tremendous improvement it would make to the architecture as seen by programs.

My guess is that simply no one involved in designing the selector and descriptor system was a compiler or OS person and so they didn't realize how they laid those 3 fields out would matter to user mode code.

zwieback · on Feb 21, 2022

I wrote a fair amount of code for TI's TMS320C4x DSPs. They had 32 bit sized char, short, int, long, float and double and a long double with 40 bits.

Took a bit to get used to but really the only way to get to the good stuff was by writing assembly code and hand-tuning all the pipeline stuff.

PhantomGremlin · on Feb 21, 2022

the MIPS R3000 processor ... raises an exception for signed integer overflow, unlike many other processors which silently wrap to negative values.

Too bad programmer laziness won and most current hardware doesn't support this.

As a teenager I remember getting hit by this all the time in assembly language programming for the IBM S/360. (FORTRAN turned it off).

   S0C8 Fixed-point overflow exception

When you're a kid you just do things quickly. This was the machine's way of slapping you upside your head and saying: "are you sure about that?"

masklinn · on Feb 21, 2022

> Too bad programmer laziness won and most current hardware doesn't support this.

There were discussions around this a few years back when Regher brought up the subject, and one of the issues commonly brought up is if you want to handle (or force handling of) overflow, traps are pretty shit, because it means you have to update the trap handler before each instruction which can overflow, because a global interrupt handler won't help you as it will just be a slower overflow flag (at which point you might as well just use an overflow flag). Traps are fine if you can set up a single trap handler then go through the entire program, but that's not how high-level languages deal with these issues.

32b x86 had INTO, and compilers didn't bother using it.

Findecanor · on Feb 22, 2022

Modern programming language exception handler implementations use tables with entries describing the code at each call-site instead of having costly setjmp()/longjmp() calls. I think you could do something similar with trap-sites, but the tables would probably be larger.

BTW. The Mill architecture handles exceptions from integer code like floating point NaN: setting a meta-data flag called NaR - Not a Result. It gets carried through calculations just like NaNs do: setting every result to NaR if used as an operand... Up until it gets used as operand to an actual trapping instruction, such as a store. And of course you could also test for NaR instead of trapping.

monocasa · on Feb 21, 2022

In practice the vast majority of MIPS code uses addu, the non trapping variant.

And in x86 land there's the into instruction, interrupt if overflow bit set, so you're left with the same options.

spc476 · on Feb 21, 2022

Which has to be done after every instruction (http://boston.conman.org/2015/09/05.2) but it quite slow. Using a conditional jump after each instruction is faster than using INTO (http://boston.conman.org/2015/09/07.1).

colejohnson66 · on Feb 21, 2022

My guess would be a pipelining issue where `INTO` isn't treated as a `Jcc`, but as an `INT` (mainly because it is an interrupt). Agner Fog's instruction tables[0] show (for the Pentium 4) `Jcc` takes one uOP with a throughput of 2-4. `INTO`, OTOH, when not taken uses four uOPs with a throughput of 18! Zen 3 is much better with a throughput of 2, but that's still worse than `JO raiseINTO`.

[0]: https://www.agner.org/optimize/instruction_tables.pdf

monocasa · on Feb 21, 2022

It's more complicated than shows up in micro benchmarks like that. Since when you do it, it's pretty much every add, you end up polluting your branch predictor by using jo instructions everywhere and it can lead to worse overall perf.

flohofwoe · on Feb 21, 2022

Modulo wraparound is just as much a feature in some situations as it is a bug in others. And signed vs unsigned are just different views on the same bag of bits (assuming two's complement numbers), most operations on two's complement numbers are 'sign agnostic' - I guess from a hardware designer's pov, that's the whole point :)

The question is rather: was it really a good idea to bake 'signedness' into the type system? ;)

pornel · on Feb 21, 2022

That's why Rust has separate operations for wrapping and non-wrapping arithmetic. When wrapping matters (e.g. you're writing a hash function), you make it explicit you want wrapping. Otherwise arithmetic can check for overflow (and does by default in debug builds).

masklinn · on Feb 22, 2022

> Modulo wraparound is just as much a feature in some situations as it is a bug in others.

That’s an extremely disingenuous line of reasoning, the situations where it’s a feature is a microscopic fraction of total code: most code is neither interested in nor built to handle modular arithmetics, and most of the code which is interested in modular arithmetics needs custom modulos (e.g. hashmaps), in which case register-size-modulo is useless.

That leaves a few cryptographic routines built specifically to leverage hardware modular arithmetics, which could trivially be opted in, because the developers of those specific routines know very well what they want.

> signed vs unsigned are just different views on the same bag of bits […] The question is rather: was it really a good idea to bake 'signedness' into the type system? ;)

The entire point of a type system is to interpret bytes in different ways, so that’s no different from asking whether it’s really a good idea to have a type system.

As to your final question, Java removed signedness from the type system (by making everything signed). It’s a pain in the ass.

flohofwoe · on Feb 22, 2022

> Java removed signedness from the type system (by making everything signed)

That's not removing signedness. Removing signedness would be treating integers as sign-less "bags of bits", and just map a signed or unsigned 'view' over those bits when actually needed (for instance when converting to a human-readable string). Essentially Schroedinger's Cat integers.

There'd need to be a handful 'signed operations' (for instance widening with sign extension vs filling with zero bits, or arithmetic vs logic right-shift), but most operations would be "sign-agnostic". It would be a more explicit way of doing integer math, closer to assembly, but it would also be a lot less confusing.

zozbot234 · on Feb 21, 2022

Modulo wraparound is convenient in non-trivial expressions involving addition, subtraction and multiplication because it will always give a correct in-range result if one exists. "Checking for overflow" in such cases is necessarily more complex than a simple check per operation; it must be designed case by case.

zozbot234 · on Feb 21, 2022

Overflow checks are trivial, there's no need for special hardware support. It's pretty much exclusively a language-level concern.

addaon · on Feb 21, 2022

Overflow checks can be very expensive without hardware support. Even on platforms with lightweight support (e.g. x86 'INTO'), you're replacing one of the fastest instructions out there -- think of how many execution units can handle a basic add -- with a sequence of two dependent instructions.

zozbot234 · on Feb 21, 2022

A vast majority of the cost is missed optimization due to having to compute partial states in connection to overflow errors. The checks themselves are trivially predicted, and that's when the compiler can't optimize them out.

laumars · on Feb 21, 2022

> When you're a kid you just do things quickly.

I don’t think this is a age problem. Plenty of adults are lazy and plenty of kids aren’t.

rsecora · on Feb 21, 2022

It still amazes me how the PDP-11 has the NUXI [1] problem at nibble level and how the PDP-11 was bytesexual [2].

[1] http://catb.org/jargon/html/N/NUXI-problem.html

[2] http://catb.org/jargon/html/B/bytesexual.html

gus_massa · on Feb 21, 2022

[If you remove the spaces at the beginning of the line, HN will make the links clicky. You probably need to add an enter in between to get the correct formatting.]

rsecora · on Feb 21, 2022

Done, thank you

gwern · on Feb 21, 2022

Note: "weird machines" here has nothing to do with the well-known security concept, just referring to unusual or obscure computers.

anonymousiam · on Feb 22, 2022

Good article.

I've done C on half of those platforms. 8051 is actually more complex than described. There are two different kinds of RAM with two different addressing modes. IIRC there are 128 bytes of "zero page" RAM and, depending on the specific variant, somewhere between 256 bytes and a few kilobytes of "normal" RAM. Both RAM types can be present, and the addresses can both be the same value, but point to different memory, so the context of the RAM type is critical. The variants usually have a lot more ROM than RAM so coding styles may need to be adjusted, such as using a lot of (ROM initialized data) constants instead of computing things at run time.

6502 has a similar "zero page" addressing mode to the 8051.

I never encountered any alignment exceptions on 68k (Aztec C). Either I was oblivious and lucky, or just naturally wrote good code.

I do remember something about PDP-11 where there was a maximum code segment size (32k words?).

C on the VAX (where I first learned it) was a superset of the (not yet) ANSI standard. I vaguely remember some cases where the compiler/environment would allow some lazy notation with regard to initialized data structures.

They left out some interesting platforms (such as 6809/OS9, TI MSP430 and PPC), which have their own quirks.

nivertech · on Feb 21, 2022

The author forgot to mention that 8051 has a bit-addressable lower part of RAM.

PDP-11 had a weird RAM overlay scheme of squeezing 256KB RAM into a 64KB 16-bit address space.

IBM System/360 also had a weird addressing scheme with base register and up to 4KB offsets.

https://en.wikipedia.org/wiki/IBM_System/360_architecture#Ad...

ChuckMcM · on Feb 21, 2022

I scored 7. (have written C code on six of the architectures mentioned (PDP 11, i86, VAX, 68K, IBM 360, AT&T 3B2, and DG Eclipse) I have also written C code on the DEC KL-10 (36 bit machine) which isn't covered. And while I have a PDP-8, I only have FOCAL and FORTRAN for it rather than C. I'm sure there is a C compiler out there somewhere :-).

With the imminent C23 spec I'm really amazed at how well C has held up over the last half century. A lot of things in computers are very 'temporal' (in that there are a lot of things that are all available at a certain point in time that are required for the system to work) but C has managed to dodge much of that.

kwertyoowiyop · on Feb 21, 2022

Given C’s origin on the PDP-11, it’s amazing it ended up so portable to all these crazy architectures. Even as an old-timer, the 8051 section made me say “WTF”!

jnwatson · on Feb 22, 2022

I love these weird machines. I'll give an example of another.

The Texas Instruments' C40 was a DSP made in the late 90's. It had 32-bit words, but inefficient byte manipulation. The compiler/ABI writer's solution was simple: make char 32 bits. So sizeof(char)==sizeof(short)==sizeof(int)==sizeof(long).

I remember writing routines for "packing" and "unpacking" native strings to byte strings and back.

adonovan · on Feb 22, 2022

I remember using that machine and wondering why my program used 4x the memory I expected. The hardware manual used the term "byte" in the old-fashioned way, to mean "minimum addressable unit": 32 bits. Today "byte" only ever means "octet".

kazinator · on Feb 22, 2022

Theoretical portability is not very useful. If you've not tested on a Unisys with 36 bit integers and 8 word function pointers, the theoretical port is garbage.

It may be easier to fix the code than if you didn't think about the Unisys, but that effort has to be weighed against the incredibly vanishing odds of it ever being required.

ziml77 · on Feb 22, 2022

While skimming did I miss an example of code that is portable between most of these systems? I'd love to see that because I'm having a very hard time believing that's possible. Or maybe you can make something that will technically compile and function on any of those, but if it doesn't perform reasonably then it's really hard to call the portability aspect a success.

Also you're very limited as to what you can actually do in ANSI C. You're going to need to start poking directly at the hardware which is not going to be portable. Hell, even stuff like checking if a letter is within a certain range in the English alphabet might not work between machines. Letters might not be contiguous in the machine's native character encoding.

gnufx · on Feb 22, 2022

I thought there'd be mention of CHERI as an up-to-the-minute architecture (as Arm Morello). I don't remember whether it requires modifications to standard C, but there's a C programming guide [1].

People who say everything is little endian presumably don't maintain packages for the major GNU/Linux systems which support s390x. I don't remember how many big endian targets I could count in GCC's list when I looked fairly recently. The place to look for "weird" features there is presumably the embedded systems processors.

1. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf

viddi · on Feb 21, 2022

Haven't read the article yet, but I have noticed that the tab keeps loading even after 10 minutes. Aborting the loading process leads to broken media.

I am no expert in HTML video delivery and haven't tried it out, but maybe setting the preload attribute to "none" or "metadata" might help?

rjsw · on Feb 21, 2022

There are C compilers for the PDP-10, it must count as fairly weird.

AnimalMuppet · on Feb 21, 2022

Overall a good article. I was a bit amused and/or disgruntled to see a TRS80 in the "Motorola 68000" section, though...

rjsw · on Feb 21, 2022

Why disgruntled? I never saw a Model 16 [1] but they did exist.

[1] https://en.wikipedia.org/wiki/TRS-80_Model_II#model16

AnimalMuppet · on Feb 21, 2022

Well, OK, but the vast majority were the Z80-only versions.

I can't read the model number on the frame of video it statically gives me, and I'm not going to play the video to find out...

ForOldHack · on Feb 22, 2022

No mention of the Interdata 8/32 or the (cough cough) IBM RT?

jecel · on Feb 22, 2022

Indeed, the effort to port C and Unix to the Interdata forced C to evolve quite a bit (adding unions, typedefs and so on):

https://www.bell-labs.com/usr/dmr/www/portpap.html

enjoyyourlife · on Feb 22, 2022

Why don't C programmers always use stdint.h?