
Please don't use "word" to mean 16 bits. In an era when machine words are generally 64 bits, we're not talking about an anachronism from the previous generation, but the one before that - in a language that is completely insulated from the actual machine architecture.

One thing I love about Rust is that it uses u16, u32, u64 etc for unsigned and i16, i32, i64 etc for signed, which is about perfect - clear, concise and future-proof. That would be perfect for this library.

https://en.wikipedia.org/wiki/Word_(computer_architecture)




>> One thing I love about Rust is that it uses u16, u32, u64 etc for unsigned and i16, i32, i64 etc for signed, which is about perfect

Yes, even good old C has uint16_t and int16_t for this. I use these exclusively for embedded work because we care about the size of everything. Also agree that Rust gets it right by using a single character with the size: u16, i16.
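For example, the sort of thing I mean (a sketch; the struct and field names are made up):

  #include <stdint.h>

  /* Every field has an exact width, so each field is the same size
     under any conforming C99 compiler (struct padding can still vary,
     but the field widths can't). */
  struct sensor_reading {
    uint16_t raw_adc;       /* exactly 16 bits, 0..65535 */
    int16_t  temp_c_x10;    /* exactly 16 bits, signed tenths of a degree */
    uint32_t timestamp_ms;  /* exactly 32 bits */
  };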

It's funny because C opted to leave the number of bits machine dependent in the name of portability, but that turns out to have the opposite effect.


> It's funny because C opted to leave the number of bits machine dependent in the name of portability, but that turns out to have the opposite effect.

That depends on what you consider the portable part. In the era of proprietary mainframes operating on proprietary datasets, data portability probably didn't matter as much as code portability to perform the same sort of operation on another machine.

C's size-less `int`, `float`, etc allows the exact same code to compile on the native (high speed) sizes of the platform without any editing, `ifdef`s, etc.
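For contrast, without size-less types you end up maintaining per-platform headers like this sketch (the machine macros and typedef names here are invented):

  /* Hypothetical pre-stdint portability header: each port picks its
     own typedefs by hand. */
  #define MACHINE_VAX 1      /* pretend we're building for a VAX */

  #if defined(MACHINE_PDP11)
  typedef int   int16;       /* int is 16 bits on the PDP-11 */
  typedef long  int32;
  #elif defined(MACHINE_VAX)
  typedef short int16;       /* short is 16 bits on the VAX */
  typedef int   int32;       /* int is 32 bits on the VAX */
  #else
  #error "port me: define int16/int32 for this machine"
  #endif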

(Side note: That's what bothers me a lot about the language wars -- the features of a language are based on the trade-offs between legibility, performance, and the environment from the era they were intended to be used in. Often both sides of those spats fail to remember that.)


> good old C has uint16_t

Firstly, no, good old C doesn't. These things are a rather new addition (C99). In 1999 there were already decades of good old C which didn't have int16_t.

It is implementation-defined whether there is an int16_t; so C doesn't really have int16_t in the sense that it has int.

> It's funny because C opted to leave the number of bits machine dependent in the name of portability, but that turns out to have the opposite effect.

Is that so? This code will work nicely on an ancient Unix box with 16 bit int, or on a machine with 32 or even 64 bit int:

  #include <stdio.h>
  int main(void)
  {
    int i;
    char a[] = "abc";
    for (i = 0; i < (int) sizeof a - 1; i++)  /* stop before the trailing '\0' */
      putchar(a[i]);
    putchar('\n');
    return 0;
  }
Write a convincing argument that we should change both ints here to int32_t or whatever for improved portability.


>> Firstly, no, good old C doesn't.

Yes, it does. The C99 standard has been around for 20 years. I started giving up on compilers that don't support it 10 years ago. I consider it a given for C.

>> It is implementation-defined whether there is an int16_t

No, it's not. That type is part of the C99 standard.

>> Is that so? This code will work nicely on an ancient Unix box with 16 bit int, or on a machine with 32 or even 64 bit int:

That's cool, but your example only needs to count to 3, so any size of integer will do. The problems arise when you need to count to 100,000 and that old 16-bit machine overflows past 32,767 (or wraps at 65,536 if unsigned) while the newer ones don't. Other times someone (me) actually wants things to roll over at the 16-bit boundary, and then we need to spell the size out as int16_t rather than figure out whether int, short or char happens to be that size on each architecture. (And yes, I know signed rollover is undefined behavior.)

>> Write a convincing argument that we should change both ints here to int32_t or whatever for improved portability.

In your example it doesn't matter. I'd argue that at least giving the size of your integers some thought every time is a good habit to get into so you don't write non-portable code in the cases where it does matter. You are free to argue that it's too much effort or something, but I'd invite you to argue that "i16" is more effort to type than "int".


> The problems arise when you go to 100,000 and that old 16bit machine rolls over at 65536 but the newer ones don't

There we run into the question of whether, on that system, we can even define such an array at all, if it has 100,000 characters.

> That type is part of the C99 standard.

Unfortunately, the standard isn't what translates and executes your code; that would be the implementation. The standard allows implementations not to provide int16_t, if they have no type for which it can be a typedef name. A maximally portable program can use int16_t only if it has detected that it's present. If an int16_t typedef name is provided, then <stdint.h> will also define the macro INT16_MAX. We can thus make code conditional on the existence of int16_t via #ifdef. (Since we know it has to be nonzero if provided, we can use #if also.)
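In sketch form (counter_t is just an illustrative name):

  #include <stdint.h>

  #ifdef INT16_MAX            /* defined only when the implementation provides int16_t */
  typedef int16_t counter_t;  /* exact 16-bit type is available */
  #else
  typedef int counter_t;      /* fall back to a type at least 16 bits wide */
  #endif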

> Other times someone (me) may want things to roll over at the 16 bit boundary and we need to specify the size as int16_t rather than figure out if int or short or char is that size for each architecture. (and yes I know rollover is undefined behavior)

Someone who knows C knows that unsigned short is 16 bits wide or wider, as is unsigned int. A maximally portable wrap-around of a 16-bit counter, using either of those types, is achieved using: cntr = (cntr + 1) & 0xFFFF. That will work on a machine that has no 16-bit word, and whose compiler doesn't simulate the existence of one.

We can do it less portably if we rely on there being a uint16_t; then we can drop the & 0xFFFF. (It isn't undefined behavior if we use the unsigned type.) The existence of uint16_t in the standard encourages programmers to write less portable code; they reach for that instead of the portable code with the & 0xFFFF. Code relying on uint16_t, ironically, is not portable to the PDP-7 machines that Thompson and Ritchie originally worked with on Unix; those machines have 18-bit words.
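Spelled out (cntr and the starting value are just for illustration):

  #include <stdio.h>

  int main(void)
  {
    unsigned int cntr = 65534;       /* unsigned int is at least 16 bits wide */
    int i;
    for (i = 0; i < 4; i++) {
      printf("%u\n", cntr);
      cntr = (cntr + 1) & 0xFFFF;    /* portable 16-bit wrap: 65535 -> 0 */
    }
    return 0;                        /* prints 65534, 65535, 0, 1 */
  }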


I've always wondered if someone wrote a close-to-the-metal C compiler for the CDC 6400.

60 bit words, 18 and 60 bit registers, 6 bit characters.


I think that technically isn't allowed, since the C standard requires single-byte representations of 26 uppercase + 26 lowercase + 10 numeric + 29 symbol + 1 space = 92 characters (and assorted (most of them useless) control characters), which don't fit in 64 values of a 6-bit byte.

It would be interesting to see a CDC 6400 implementation of "like C, but we ignore the bit about character set and also the part where the compiler is an evil genie trying to screw you over", though.


"word" for "16 bits" is not so much an anachronism as an Intel-Microsoftism.

In computing, "word" is understood to be machine dependent: "what is that machine's word size?"


I suspect the reason for this unfortunate naming has something to do with the fact that strings are encoded as UTF-16 in JavaScript. Having said that, I completely agree with you.



