On leading underscores and names reserved by the C and C++ languages

PreInternet01 · on Jan 11, 2023

> It may come as a surprise that the C language reserves identifiers like strong, island, and together, but it does

OK, I know it's only an offhand remark in a blog post, but now I'm going to have to spend significant energy on:

1. Finding where these identifiers are reserved, exactly, because obvious sources like https://pubs.opengroup.org/onlinepubs/9699919799/functions/V... don't seem to include them?

2. Trying to come up with the intended usage for these identifiers (`island` I imagine, might be a scope: only accessible to other identifiers on the same island? oh my...)

ameliaquining · on Jan 11, 2023

That's POSIX, not the C language standard. https://www.iso-9899.info/n1570.html#7.31 lists the library API names reserved for future use; section 7.1.3 in the same document states explicitly that they are reserved.

It's not that the word "island" has any particular significance; it's that all names starting with "is" followed by a lowercase letter are reserved (in external linkage and in the global scope of files that #include <ctype.h>), so that future versions of the language can add more standard library functions along the lines of isalpha, isdigit, etc., without making the current standards committee guess in advance which specific names following that pattern their successors might want to add in the future, and without breaking existing code. (Unless that existing code ignores these rules, which seems to be fairly common in practice.)

SAI_Peregrinus · on Jan 11, 2023

The C is of course a pirate's favorite language. It might end up with "water" and "land" types in the standard some day. So `bool iswater()` and `bool island()` functions might be needed! You might need to check if the island structure you got is really land, so `island(&possible_island)` would obviously help and not cause any confusion to readers of the code.

ketralnis · on Jan 12, 2023

Aye, any pirate's first love be the C

sokoloff · on Jan 12, 2023

You might want to know if you have a small piece of land surrounded by water, so we need

  isisland(x);

Or to know if an element is a member of a particular terror group (or an Egyptian goddess):

  isisis(y);

cozzyd · on Jan 12, 2023

And if you want to know if some territory is claimed by a particular terror group (or Egyptian goddess, or post-metal band) there is the venerable

  isisisland(z);

In this case you'd typically then construct a memory fence.

cozzyd · on Jan 12, 2023

You just have to pass the pirate flag to your compiler

0xADD1E · on Jan 13, 2023

I think you’ll find it’s actually called an arrgument

Gibbon1 · on Jan 13, 2023

Of course, how the locale is set will affect the results of island() and iswater()

pcwalton · on Jan 11, 2023

I saw someone point out that these identifier restrictions mean that a valid C compiler optimization would be to change all instances of "x = to[a-z]+(y);" to "if (!is[a-z]+(y)) x = to[a-z]+(y);" Not that any compiler would actually be absurd enough to implement this, of course :)

Naturally, this would imply the validity of the optimization:

x = toilet(y); -> if (!isilet(y)) x = toilet(y);

caf · on Jan 12, 2023

Only if you've included <ctype.h> or <wctype.h> though.

xigoi · on Jan 11, 2023

They could just generate a random string, making it extremely unlikely that it will conflict with an existing name. And it won't be any less descriptive than most existing names in the C standard library.

mananaysiempre · on Jan 11, 2023

Note that the original post is not up to date with the C23 developments on the subject, wherein the standards people appear to have noticed that nobody has ever cared about overbroad reservations like str*, is*, to*, and E*, and introduced a notion of “potentially reserved identifier” that I have so far been unable to understand. The relevant paper, “What we think we reserve”[1], has been folded into the draft standard.

[1] https://www.open-std.org/jtc1/sc22/WG14/www/docs/n2625.pdf (I hope that’s the right version)

Joker_vD · on Jan 11, 2023

Basically, the wording is so that they can advise implementations to complain on potentially invalid-in-the-future uses of "potentially reserved identifiers", but invalid uses of actually reserved identifiers are still "requires no diagnostics". They feel that allows them to extend the already way too vast amount of reserved identifiers even further without any reservations.

Yes, that means that an implementation will warn you that your code might break in the future, but on the day when it actually breaks, it will just stop complaining. Why they could not just mandate detecting invalid uses of reserved identifiers is beyond me.

peff · on Jan 11, 2023

On the page you linked, you can see that `ctype.h` reserves all names with the prefixes `is[a-z]` and `to[a-z]`, and `string.h` reserves `str[a-z]`. These come from the C standard (in C99, it's 7.26 "Future Library Directions"), though I don't think they use the word "reserved" there.

aw1621107 · on Jan 11, 2023

The "reserved" bit is from section 7.1.3:

> Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

AdamH12113 · on Jan 11, 2023

He says where they come from -- section 7.31 of the C11 standard[1]. For example, in 7.31.12:

> Function names that begin with str and a lowercase letter may be added to the declarations in the <stdlib.h> header.

Or 7.31.13:

> Function names that begin with str, mem, or wcs and a lowercase letter may be added to the declarations in the <string.h> header.

So according to the standard, if your code #includes string.h, it may conflict with a future version of the standard if you use names like "strong" or "memorize". This is extraordinarily unlikely, though.

[1] https://port70.net/~nsz/c/c11/n1570.html#7.31
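To make the prefix rules quoted in this thread concrete, here is a minimal sketch of a checker that flags names matching the future-library-directions patterns mentioned so far (`is`/`to` for `<ctype.h>`, `str`/`mem`/`wcs` for `<string.h>`). The table, the struct, and the function name `future_library_clash` are made up for illustration, the table is nowhere near the standard's full list, and the ASCII range test for "lowercase letter" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per prefix discussed in the thread; deliberately incomplete. */
struct reserved_prefix {
    const char *prefix;
    const char *header;
};

static const struct reserved_prefix reserved_prefixes[] = {
    { "is",  "<ctype.h>"  },
    { "to",  "<ctype.h>"  },
    { "str", "<string.h>" },  /* 7.31.12 lists str for <stdlib.h> as well */
    { "mem", "<string.h>" },
    { "wcs", "<string.h>" },
};

/* Returns the header whose pattern the name matches, or NULL if none does.
 * A match is: prefix, then one lowercase letter (ASCII range assumed). */
static const char *future_library_clash(const char *name)
{
    size_t count = sizeof reserved_prefixes / sizeof reserved_prefixes[0];

    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(reserved_prefixes[i].prefix);
        if (strncmp(name, reserved_prefixes[i].prefix, n) != 0)
            continue;
        if (name[n] >= 'a' && name[n] <= 'z')
            return reserved_prefixes[i].header;
    }
    return NULL;
}
```

With this, `future_library_clash("strong")` reports `<string.h>`, while `future_library_clash("isNull")` returns `NULL` because the character after the prefix is uppercase.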

isaacg · on Jan 11, 2023

These strings are included in the regexes given just above that line as reserved for potential future use by certain header files. "island" is covered by the ctype.h and wctype.h headers, which reserve every name starting with "is" followed by a lowercase letter.

miohtama · on Jan 11, 2023

One of those situations where “I don’t make the rules” is a fitting one-liner.

kzrdude · on Jan 11, 2023

Then after looking into the C standard, POSIX further defines C identifiers for itself. That includes `_t` in type names! It's a rule so far from enforcement in reality that it is hard to take seriously.

msla · on Jan 11, 2023

It's "enforced" to the extent that future versions of the standard will feel free to define new types which end with _t without worrying about breaking existing code. If you defined your own uint32_t type and the new standard's uint32_t type stomps on it, well, you were (theoretically) warned.

ryanianian · on Jan 11, 2023

The trailing _t is reserved, and I didn't know that for many years. I learned of it only after an interview candidate pointed it out as a gotcha in my question prompt. They got bonus points, and I learned something. I wonder why compilers or static analyzers don't complain about these things more often?

flohofwoe · on Jan 11, 2023

> I wonder why compilers or static analyzers don't complain about these things more often?

Because it's not the C compiler's business to care about POSIX.

POSIX and the C standard are different things. If you're writing code against the Windows APIs, POSIX isn't relevant for instance, and you need to be more concerned about colliding with type names or defines from the Windows API headers, and those are all over the place anyway.

...and besides: that rule is entirely pointless in reality, either your type names collide with POSIX types from headers you are including (in that case you're getting a compiler error anyway), or they don't collide, and in that case all is good.

taeric · on Jan 11, 2023

There is one more case, for your list at the end. The names could start colliding when you get to add interaction with a posix system.

That is, most of the point of that was for some standards to give a "follow these rules, and you will have an easier time integrating with this standard if that is in your plans."

Obviously, if you have no plans to head to posix, they accomplish nothing for you. Similarly, not following the rules doesn't prevent that direction, just adds some extra work. Potentially.

never_inline · on Jan 11, 2023

> besides: that rule is entirely pointless in reality

I think it means: a library author cannot name some type epoch_t, because POSIX may introduce an epoch_t in the next revision, and suddenly some code may fail to compile.

flohofwoe · on Jan 11, 2023

Yes, but this sort of problem exists with other 'standard APIs' too which change much more frequently than POSIX.

saagarjha · on Jan 11, 2023

> Because it's not the C compiler's business to care about POSIX.

It is when people write POSIX-compliant code and their compiler doesn’t let them.

pjmlp · on Jan 12, 2023

ISO C compliance doesn't have any POSIX compliance requirements, as ISO C compilers target more platforms than UNIX-like clones.

It is the POSIX platform owner's business to ensure POSIX compliance on their own C compiler.

saagarjha · on Jan 13, 2023

The world’s most popular compilers target POSIX platforms. Being compliant with that is very important to them.

pjmlp · on Jan 13, 2023

Except they seem quite happy to only care about what ISO requires from them.

saagarjha · on Jan 14, 2023

To be clear we are talking about the same GCC and Clang here, right?

pengaru · on Jan 11, 2023

The _t suffix is so ergonomic for typedefs this is one area of POSIX I simply ignore, because my types are usually prefixed with a namespace POSIX won't ever collide with anyways.
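A sketch of that style, using a made-up `geo_` prefix. On paper the `_t` name still sits in the space POSIX reserves; the prefix only makes a real collision unlikely. The second typedef shows the suffix-free spelling for code that wants to follow the rule to the letter. All names here are hypothetical.

```c
#include <assert.h>

/* "geo_" is a hypothetical library prefix chosen for this example. */
typedef struct geo_point {
    double lat;
    double lon;
} geo_point_t;  /* prefixed, but the _t suffix is still reserved by POSIX */

/* Same type without the suffix. */
typedef struct geo_point geo_point;

static geo_point_t geo_point_make(double lat, double lon)
{
    geo_point_t p;

    assert(lat >= -90.0 && lat <= 90.0);
    p.lat = lat;
    p.lon = lon;
    return p;
}
```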

kzrdude · on Jan 11, 2023

Gotcha is a good description of the whole thing and I would suggest it is not a concern for real code.

layer8 · on Jan 11, 2023

Effectively, every C library needs to do that, usually reserving some prefix.

It’s just a way to state “if you’re using this API/library, be prepared that future versions may introduce additional symbols matching these patterns, and/or that the current version may define undocumented symbols matching these patterns”. You then have the choice to take care to not define symbols conflicting with that.

The alternative would be for libraries to just add arbitrary new symbols, with no way for client code to proactively prevent conflicts.

tragomaskhalos · on Jan 11, 2023

Yes _t is very attractive to programmers as a suffix to indicate a type, and I've flagged in the past to people that they shouldn't technically do that - usually to be met with utter indifference

jhoechtl · on Jan 11, 2023

> Windows header files have historically not been conscientious about avoiding these reserved names. We’re trying to do better for new headers, but not everyone has gotten the memo.

Love that tone

pjmlp · on Jan 11, 2023

Windows 2000 introduced the concept of application manifests, where executables get a mini XML file for application specific settings instead of storing them into the registry.

They also allow for registration free COM components.

To this day many Microsoft teams still haven't gotten the memo.

TeMPOraL · on Jan 11, 2023

I'm not sure if they provided adequate tooling for the former. They definitely didn't provide adequate documentation for the latter.

Or maybe they did - back when MSDN was actually well-organized and somewhat complete-ish (and shipped on optical disks or otherwise downloadable!), I was too young to make much use of it. Now, that wealth of knowledge is mostly gone, and external links to it don't resolve.

pjmlp · on Jan 12, 2023

Naturally it is documented, otherwise how would I know about it without ever being a Microsoft employee?

These are the current locations for them,

https://learn.microsoft.com/en-us/windows/win32/sbscs/creati...

https://learn.microsoft.com/en-us/windows/win32/sbscs/manife...

https://learn.microsoft.com/en-us/windows/win32/sbscs/isolat...

https://learn.microsoft.com/en-us/windows/win32/sbscs/author... (see last bullet points regarding not to use the registry as best practice)

The problem as usual, is lack of education on the matter, and willingness to change how people work.

I stand corrected on one point though, it has been a long time and actually it was only introduced on Windows XP and Server 2003, not Windows 2000.

mastax · on Jan 12, 2023

I keep seeing new tools written by Microsoft that store settings in dot files in my user directory or my documents rather than AppData.

pjmlp · on Jan 12, 2023

Most likely from new interns educated on UNIX without proper work review.

At least that's my guess, given recent trends.

Scubabear68 · on Jan 11, 2023

On a tangent, but this is what makes Python look so ugly to me, the required use of double underscores for things like equals is extremely jarring.

I have very slowly come to appreciate it as a language, but visually it will always be ugly to me because of clashes like this with other languages.

Genbox · on Jan 11, 2023

I get the same feeling when looking at SIMD code.

> const __m256i in = _mm256_loadu_si256((const __m256i*)ptr);

Narishma · on Jan 11, 2023

It's the reason I prefer to write inline assembly, or even external assembly, for SIMD code rather than use intrinsics. Much easier to read.

AshamedCaptain · on Jan 11, 2023

you are supposed to use a typedef, not to do that.

It's like complaining you have to use _Bool all over the code. Include stdbool...

Aardwolf · on Jan 11, 2023

> We’re trying to do better for new headers, but not everyone has gotten the memo.

And memo appears to be one of the disallowed variable names in C11, given the mem[a-z].* pattern

gernb · on Jan 11, 2023

I know it's a tradeoff but I think I fall on the side of I wish the language I was writing in required `this` or `self` for members/properties.

C++

    foo = bar * 2;

Are foo and bar local variables or members of some instance?

vs

Python

    self.foo = self.bar * 2

100% clear. No naming convention needed.

I bring this up because `_foo` for members is a naming convention that wouldn't be needed in a language that required `self` or `this`

That said, I get that maybe refactoring some code from standalone function to class method is easier if you don't have to change the code as much but I'd be curious how often that's a net win.

mhh__ · on Jan 11, 2023

As a language design point it's also saying that (this simple modification to C++)

    void add(const this, int x)

Is more readable than (IMO)

    void add(int x) const

TeMPOraL · on Jan 11, 2023

Might as well just always spell out the hidden "this" argument. Then all methods would look just like regular functions, allowing to simplify the language syntax and the standard, making it more concise and consistent, without sacrificing any functionality.

kccqzy · on Jan 11, 2023

You may like the "deducing this" proposal: http://wg21.link/P0847

(Scroll down to the "proposed syntax" section.)

pronlover723 · on Jan 12, 2023

That can go both ways?

I agree that I like when functions are not special so

   instance.add(10)

Is just sugar for

   add(instance, 10)

and you pass anything that fits as the first argument.

But, following the "syntactic sugar" is okay rule

    class Foo {
      add(int v);
    }

Is just syntactic sugar for

    void Foo.add(Foo this, int v);

... or something along those lines... ?
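For comparison, plain C already works this way: there is no hidden `this`, so the receiver is always spelled out as the first parameter, and constness is written on that parameter. A minimal sketch with a made-up `counter` type:

```c
#include <assert.h>
#include <stddef.h>

typedef struct counter {
    int value;
} counter;

/* Roughly what a C++ `void add(int x);` member would look like. */
static void counter_add(counter *self, int x)
{
    assert(self != NULL);
    self->value += x;
}

/* Roughly `int get() const;`: the const moves onto the receiver. */
static int counter_get(const counter *self)
{
    assert(self != NULL);
    return self->value;
}
```

A call then reads `counter_add(&c, 10)`, which is the desugared `add(instance, 10)` form from the comment above.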

TillE · on Jan 11, 2023

When writing idiomatic C++, you typically end up with mostly stuff like impl->foo anyway.

The pimpl pattern is sort of a weird artifact of how the compiler works, but it generally works out as a smart way to structure your code.

BenFrantzDale · on Jan 12, 2023

It really depends. I used PImpl today but it’s the rare exception. Between small inlinable classes and abstract base classes, there are lots of ways to work. But that’s the beauty of C++: there are lots of ways to solve problems!

ldh0011 · on Jan 12, 2023

I worked with C++ at my last job and now again for a side project with a partner. People not using `this` is one of my biggest pet peeves.

ihatepython · on Jan 12, 2023

This.

int_19h · on Jan 13, 2023

In C++ especially, respecting the rules on underscores is important because so much of stdlib is header-only, meaning lots of inline code that's exposed to shenanigans with identifiers used in it. For example, if you #define foo 123, and some inline function has a local named "foo", there's going to be a problem. This is why, when you look at standard C++ headers in most implementations, all identifiers, even locals, tend to be named _Like __this. And stdlib may also need to define its own internal macros (as there's a lot of repeated verbiage), which also follow this name pattern.

So if you disregard the fact that those are reserved, it might build, but even a minor update of your implementation that e.g. adds a new local to some obscure standard function can break it, never mind a new version of the C++ standard.

chjj · on Jan 11, 2023

I tried to decipher all of the rules surrounding reserved identifiers once. It was surprisingly tricky.

https://gist.github.com/chjj/d0c1218e473bbb6d8f9e2224c583e2d...

bloak · on Jan 11, 2023

"It may come as a surprise that the C language reserves identifiers like strong, island, and together, but it does."

Yes, I would guess that a large proportion of C programmers are not familiar with those rules. Is there a way of getting GCC or LLVM to warn about it?

beached_whale · on Jan 11, 2023

clang recently added `-Wreserved-identifier`, in v13 I believe. I may be slightly off on the exact spelling, but it’s there. Not sure about gcc.

rwmj · on Jan 11, 2023

The GCC bug's only been around for 11 years: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51437

coldpie · on Jan 11, 2023

Strong evidence that these rules don't actually matter and can be disregarded.

rwmj · on Jan 11, 2023

For sure! I've been writing C for about 40 years and didn't know that such innocent identifiers as "total" were reserved.

jerf · on Jan 11, 2023

I mean, de facto, if Windows isn't good at it, at the very least compilers will have a "never mind ignore this" option if they even do start enforcing them. And while Raymond Chen knows Windows and can speak to it, I'd bet the Linux kernel has plenty of violations of this, and it's the same thing; no future C compilers are going to invalidate that no matter what some C committee somewhere says.

raverbashing · on Jan 11, 2023

It's more like "disregard them at your own risk"

The issue here is not GCC/Clang. It's developers

You risk having a name collision if you define a strfoom in your code and then later the C committee decides to add that to string.h

Aardwolf · on Jan 11, 2023

The issue is not developers, but disallowing the patterns is[a-z].*, to[a-z].*, str[a-z].* and others. Why would anyone do this to a programming language?

It's a stupid decision to reserve those and it ruins the usability of the language by arbitrarily disallowing many common words that have nothing to do with the intended feature. Even if you don't export names like this directly, having the name of your library or company (as prefix for all exported names) start with those common combinations is likely as well, so now you got to avoid certain library or company names.

If this is only since C11, then I'm even more baffled, since I could somewhat see how such thing could happen in the 1970s when there were limits and the language was brand new and not known to be popular, but in 2011 doing this makes no sense whatsoever.

I hope this is limited to C11 and will never happen in C++.

zokier · on Jan 11, 2023

Root problem is the lack of namespaces in C, making collisions inevitable and painful to deal with

flohofwoe · on Jan 11, 2023

C doesn't even need namespaces for this, a simple stdc_ prefix for stdlib functions and types would also work (see C23 stdbit.h, I hope this won't be an outlier).

im3w1l · on Jan 11, 2023

With proper namespaces you can import stuff so you don't have to prefix with the namespace every time.

flohofwoe · on Jan 12, 2023

Yes, but we can't simply port C++ namespaces over because this requires symbol name mangling. C could be using 'namespace prefixes' instead with something like 'usingprefix stdc_', this wouldn't require changes to the symbol format, and would provide most of the features of C++ namespaces.

int_19h · on Jan 13, 2023

That gets very ugly very quick for something as basic as e.g. memcpy or strcmp.

kevin_thibedeau · on Jan 11, 2023

C23 technically has to support namespaces for attributes. It'll be interesting to see if they get adopted into the core language at some point.

layer8 · on Jan 11, 2023

What else should they reserve in order to be able to introduce new identifiers? The alternative would be that new functions would have to look like __std_c__strfoo().

Aardwolf · on Jan 11, 2023

They could:

- use more obscure letter combinations than "is", "to" and "str" which are very common beginnings of words. How do "is", "to" and "str" help anyway if they want to add functionality that has no sensible name starting with those?

- use two underscores at the beginning since as per the article that is already reserved

- use stdc_ as prefix

codeflo · on Jan 11, 2023

Or introduce them as __newfunction for the linker, but have a new optional header file that #defines the short name. That’s the route that was chosen for C99’s _Bool/bool (there it was a typedef), and it worked well AFAIK.

layer8 · on Jan 11, 2023

The drawback of the macro solution is that it also disables those names for static symbols and struct members etc. when the header is included. For bool this was okay as it was intended as an opt-in keyword and the header was a new one, but doing that for all functions and in existing headers would potentially break more code.

int_19h · on Jan 13, 2023

Stuff like "isalpha", "tolower", and "strlen" long predates standardization of C, so they had to work with what they already had.

pmontra · on Jan 11, 2023

I really hate leading underscores, in every language.

stdc_ or similar prefixes are the way to go.

flohofwoe · on Jan 11, 2023

TBH it would make a lot more sense if the C committee would start adding prefixes for new additions to the C stdlib (like the functions in C23 stdbit.h: stdc_popcount, stdc_bit_width, etc...)

coldpie · on Jan 11, 2023

> It's more like "disregard them at your own risk"

I'd argue that a compiler update that breaks existing code for something as innocuous as using the "wrong" variable name is a compiler bug, even if the code technically violates the spec. It is simply too late now for the spec to begin using the keywords they reserved 40 years ago.

asveikau · on Jan 11, 2023

In truth they've been careful about introducing names and not breaking existing code. For example, when they finally did a bool type, they called it _Bool to prevent incompatibilities with nonstandard bool types. You needed to include <stdbool.h> to get a friendly, usable name of "bool".

There was a similar thing for _Complex.
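A small sketch of how the two spellings coexist. `_Bool` is available with no header at all; `bool` only appears once `<stdbool.h>` is included (in C99 through C17; later revisions may treat it differently). The function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Uses only the keyword; this line would compile without <stdbool.h>.
 * Converting an int to _Bool yields 0 for zero and 1 for anything else. */
static _Bool raw_nonzero(int v)
{
    return v;
}

/* Uses the friendly name that <stdbool.h> provides. */
static bool friendly_nonzero(int v)
{
    return v != 0;
}

/* Both spellings name the same type. */
static_assert(sizeof(bool) == sizeof(_Bool), "bool and _Bool differ");
```

Old code with its own `typedef int bool;` keeps compiling as long as it never includes the new header, which is the compatibility point the comment makes.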

Denvercoder9 · on Jan 11, 2023

Following that reasoning, the C standard library would _never_ be able to add another function (or macro, or struct, or...).

coldpie · on Jan 11, 2023

Within reason, yeah. Like if they chose to add a new "island" function and my geographics software broke, that's clearly the spec's fault, not mine, even though technically I'm violating spec. I'd even argue any "is.*" function is probably too late to add to the spec at this point in time.

torstenvl · on Jan 12, 2023

> I'd even argue any "is.*" function is probably too late to add to the spec at this point in time.

I'd like to see isualpha() isudigit() isupunct() isuspace() isugraph() etc. plus isucombining(). They'd supplement iswxxx() but (a) taking a `signed unicode int` (for C2x defined as at least 32 bits wide) and (b) being locale-independent.

kps · on Jan 11, 2023

Also, clang-tidy has `bugprone-reserved-identifier` https://clang.llvm.org/extra/clang-tidy/checks/bugprone/rese...

planede · on Jan 11, 2023

It only checks for the ones starting with underscore.

> This check does not (yet) check for other reserved names, e.g. macro names identical to language keywords, and names specifically reserved by language standards, e.g. C++ ‘zombie names’ and C future library directions.
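The leading-underscore rules that such a check targets can be sketched as a small classifier. This follows the C rules in section 7.1.3 (underscore plus uppercase letter or second underscore: always reserved; any other leading underscore: reserved at file scope), not the actual clang-tidy implementation. The enum and function names are invented for the example, and the uppercase test assumes ASCII.

```c
#include <assert.h>
#include <stddef.h>

enum underscore_class {
    NAME_OK,                      /* no leading underscore */
    NAME_RESERVED_AT_FILE_SCOPE,  /* e.g. _name: fine as a local or member */
    NAME_ALWAYS_RESERVED          /* e.g. _Name or __name */
};

static enum underscore_class classify_leading_underscore(const char *name)
{
    assert(name != NULL);
    if (name[0] != '_')
        return NAME_OK;
    if (name[1] == '_' || (name[1] >= 'A' && name[1] <= 'Z'))
        return NAME_ALWAYS_RESERVED;
    return NAME_RESERVED_AT_FILE_SCOPE;
}
```

C++ additionally reserves names containing a double underscore anywhere, which this sketch does not model.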

peterfirefly · on Jan 12, 2023

And it complains about this:

    #define _POSIX_SOURCE

which is sometimes needed before the #include's to get access to nice, modern library functions that are, say, thread-safe.
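A sketch of that pattern follows. It uses `_POSIX_C_SOURCE`, the newer feature-test macro, rather than `_POSIX_SOURCE`, and `strtok_r` as the thread-safe function; both choices are assumptions about what the comment has in mind, and the helper `count_fields` is made up. The macro starts with an underscore and an uppercase letter, which is why an underscore-based check flags it even though defining it is the documented way to request these declarations.

```c
#define _POSIX_C_SOURCE 200809L  /* must come before the first #include */

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts non-empty comma-separated fields, modifying `line` in place.
 * strtok_r keeps its position in `save` instead of hidden static state,
 * so concurrent callers do not interfere with each other. */
static int count_fields(char *line)
{
    int count = 0;
    char *save = NULL;
    char *field;

    assert(line != NULL);
    for (field = strtok_r(line, ",", &save);
         field != NULL;
         field = strtok_r(NULL, ",", &save)) {
        count++;
    }
    return count;
}
```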

marssaxman · on Jan 11, 2023

After thirty-some-odd years writing C, this is the first I have ever heard of it. I wonder how many times I have inadvertently broken those rules.

coreyp_1 · on Jan 11, 2023

I got bit by this last night (C++), and had no idea that it was a thing. What a timely post!

chappar · on Jan 11, 2023

It would be interesting to hear more about this!

omgmajk · on Jan 11, 2023

I hear Dave (Dave's Garage) talk about Raymond in a lot of his episodes on youtube and I get the feeling this guy is a legend.

Kwpolska · on Jan 11, 2023

Yes, Raymond is a legend, and it’s really worth it to dig into the blog archives.

(Dave, however, has a really annoying presentation style, and is so overrated.)

zabzonk · on Jan 11, 2023

really, avoid underscores altogether - they are very horrible. even people that know that there are exclusion rules for them don't really know what the rules are. and if you get things wrong, the error messages your compiler gives you will be incomprehensible. so, just don't use them - why would you?

cesaref · on Jan 11, 2023

I've never seen leading underscores in classes. I've seen trailing underscores, and the old m_ prefix, but never a plain _.

I was surprised by C++11 having additional prefixes it has reserved. I'm now also wondering whether there is a clang/gcc option to warn about such things, as although I know we don't currently have any issues in our code base (as in, it compiles and works) I don't really want to publish a public API and have to revisit it because of such a conflict in C++29 or whatever

ryanianian · on Jan 11, 2023

Leading underscores help you disambiguate `_name` the field from `name` the member-function. Similar thing in Python. How else do you solve this without it being even more confusing? Is `m_` really preferable? (Honest question.)

saurik · on Jan 11, 2023

As offered by the comment you responded to: trailing underscore. And yes: since leading underscore is reserved, and something needed to be reserved for reasonably-good reasons, m_ is preferable to _ if for some reason you simply refuse to use a trailing modifier.

Dylan16807 · on Jan 11, 2023

> if for some reason you simply refuse to use a trailing modifier

I'm not going to make a big deal about it, but it makes more sense to me to put the scope of a variable at the front. The front is where you put foo. and foo-> and foo[], after all.

int_19h · on Jan 13, 2023

For a class member, a leading underscore is only reserved if the following character is an uppercase letter or another underscore. And it's fairly common for C++ code to have a coding style that mandates that fields start with a lowercase letter anyway.

flohofwoe · on Jan 11, 2023

Some coding styles use capitalisation for this (e.g. member 'name' vs getter 'Name()') - personally I prefer snake_case myself though.

dlivingston · on Jan 11, 2023

We use `m_` at work and I've come to really appreciate it. It makes reading code very easy - any variable you see prefixed with `m_` is a class member field. Anything else is either a function argument, a locally scoped variable, or has some other prefix (`k`, for example, referring to static constants).

Since we read code far more than we write it, sprinkling little "usage hints" like this across symbol names removes a lot more cognitive overhead than I would have thought.

alex_suzuki · on Jan 11, 2023

I don’t think that using m_ or s_ as prefixes is bad, but I think it should be the IDE’s job to provide visual hints (color, bold, italic, etc.) with respect to the identifier's storage.

zabzonk · on Jan 11, 2023

some underscores have been reserved in certain situations in user code from ansi c and onwards - there is simply no need to use them. trailing underscores will work but why bother? they are difficult to read and difficult to type.

colanderman · on Jan 11, 2023

Trailing underscores are my go to for resolving conflicts with keywords. It's a much more consistent and trivial to remember rule than random misspellings or synonyms of keywords (casts side-eye to `klazz`).

zabzonk · on Jan 11, 2023

don't understand this at all - why would you want to use a keyword as a name, and the last bit of the comment makes no sense at all.

flohofwoe · on Jan 11, 2023

If you're implementing some sort of persistency or reflection system in C++, 'Class* clazz' isn't all that unusual.

seritools · on Jan 11, 2023

maybe you build a game and the character has a... class?

zabzonk · on Jan 11, 2023

what is so special about the underscore that makes it differentiable?

danybittel · on Jan 11, 2023

So C did pre-emptively reserve certain words? Like island, strong, together..

I wonder if that paid off in some cases? Or to ask differently, if one would design a new programming language (now), would you consider reserving words in advance?

gavinhoward · on Jan 11, 2023

Programming language designer here.

I'm currently making a language, and yes, I'm reserving words for it because I want an easy C ABI. But I'm also making them easier to avoid.

Here's my list of reserved words:

* Anything that begins with `y_`.

* Anything that begins with `yc_`.

* Anything that begins with `YC_`.

* Anything with three or more consecutive underscores.

* Edit: Anything that begins with an underscore. This is because my language will be able to transpile to C if necessary.

The first is for types and items in the standard library (the language's name is Yao, so a `y` makes sense). The second and third are for the C ABI (hence, `yc`) and for historical reasons. `YC_` in particular is for C macros.

The last is for name "mangling." I put it in quotes because my language's standard name mangling (it will be the same across every implementation) will not really mangle the names. Instead, it will concatenate them, using five underscores between package names, four underscores between packages and items in a package, and three underscores between an item and its suffix. (For overloaded functions, the programmer has to define a suffix for each one. That suffix is how their names will be different in the C ABI.)
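The concatenation scheme described above can be sketched in C roughly as follows. The function name `join_c_name`, its signature, and the sample package names are assumptions for illustration; nothing here comes from an actual Yao implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends first+second to out at offset *used; -1 if it would not fit. */
static int append_pair(char *out, size_t cap, size_t *used,
                       const char *first, const char *second)
{
    int n = snprintf(out + *used, cap - *used, "%s%s", first, second);

    if (n < 0 || (size_t)n >= cap - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Builds pkg_____pkg____item___suffix: five underscores between packages,
 * four between the last package and the item, three before the suffix.
 * Pass suffix == NULL for an item without one. Returns 0 or -1. */
static int join_c_name(char *out, size_t cap,
                       const char *const *packages, size_t package_count,
                       const char *item, const char *suffix)
{
    size_t used = 0;

    assert(out != NULL && cap > 0 && item != NULL);
    out[0] = '\0';
    for (size_t i = 0; i < package_count; i++) {
        const char *sep = (i + 1 < package_count) ? "_____" : "____";
        if (append_pair(out, cap, &used, packages[i], sep) != 0)
            return -1;
    }
    if (append_pair(out, cap, &used, item, "") != 0)
        return -1;
    if (suffix != NULL && append_pair(out, cap, &used, "___", suffix) != 0)
        return -1;
    return 0;
}
```

Because user names may not contain three or more consecutive underscores, the separators can be told apart by length alone, so the result can be split back into its parts without a mangling table.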

My hope is that these rules will not be too onerous. I don't think the reserved prefixes are used much, and I haven't seen anyone use more than one consecutive underscore, though I'm allowing one more, just in case.

pengaru · on Jan 11, 2023

It's unfortunate POSIX didn't adopt a similar convention for its reservations, like a psx_ prefix for its namespace. But I guess with POSIX it was more ratifying established *NIX things as a standard they could all easily agree on without too much disruption.

cyber_kinetist · on Jan 12, 2023

‘y_’ is too strict in my opinion. You declare lots of variables like y_1 or y_ans when you write any sort of numerical calculation code (like in games or physics simulations).

gavinhoward · on Jan 12, 2023

It seems too strict, but I should clarify: these rules only apply to the C ABI.

Declaring variables with those names actually will not conflict if they are done in pure Yao. There's a special way to access the C symbols of functions and types, and it's deliberately different, for this very purpose.

cyber_kinetist · on Jan 13, 2023

Ah, if that’s only for the C ABI then that’s much better. I can still see someone writing numerical algorithms using y_* as one of the arguments, though the probability is a lot more slim.

gavinhoward · on Jan 13, 2023

That's true, so I should clarify even further.

The restriction isn't on `y[c]_`. It's actually on `y[c]___`.

This is because `y` is the package name for the standard library, and the standard library name will always be separated from the rest of the name by 3 or more underscores.

Really, the restriction is on 3 or more consecutive underscores in C names, and you can't have a package name that is either `y`, `yc`, or `YC`.

I apologize for the confusion. I was in a hurry and on mobile with the original post.
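The clarified rule can be sketched as two small checks. This is only an illustration of the rule as stated in this comment; the function names are hypothetical and nothing here comes from an actual Yao toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: a C-ABI name is off limits if it contains
 * three or more consecutive underscores anywhere. */
static bool yao_reserved_c_name(const char *name)
{
    assert(name != NULL);
    return strstr(name, "___") != NULL;
}

/* Hypothetical check: the package names the comment says you can't use. */
static bool yao_reserved_package(const char *pkg)
{
    assert(pkg != NULL);
    return strcmp(pkg, "y") == 0
        || strcmp(pkg, "yc") == 0
        || strcmp(pkg, "YC") == 0;
}
```

Read this way, names like `y_1` and `y_ans` pass the first check, while something shaped like `y___print` does not.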

layer8 · on Jan 11, 2023

New languages usually support namespaces, which prevents the problem.

As for payoff, new versions of the C standard usually introduce new functions and macros with those name patterns, without breaking client code that respects the reserved-name rules.

kzrdude · on Jan 11, 2023

If you reserve something in a new language, you'd probably make it an error to use it.

godshatter · on Jan 11, 2023

I wouldn't hesitate to use island, strong, or together as a variable name, even though they are technically "reserved". Maybe avoid "strong", it's on the edge of being made into a reserved word. I'd stay away from "isnull", "strsplit", or "toint" or similar variable names though. Although "isNull", "strSplit", and "toInt" are still fine.

I think the idea of reserving keywords in advance is a good one, but not something that interferes with vocabulary words so easily. The idea is to be able to add a new keyword in the future without breaking code that uses that keyword itself currently.

klodolph · on Jan 11, 2023

ES5 had "future reserved words" including class, const, export, import, and let.

WorldMaker · on Jan 11, 2023

The ES5 case is interesting because most of that list of "future reserved words" was a list of reserved words in ES4 "The Lost Version". That list also included fun things still not used like private, public, abstract, package, byte, int, volatile, synchronized.

WorldMaker · on Jan 11, 2023

One additional fun thing to note: despite "private" already being a reserved word, when TC39 added private fields to classes a few years back, they decided on "hash names" over the private keyword.

    class Example {
      #privateFieldName
    }

Rather than:

    class Example {
      private privateFieldName
    }

The debate on that was pretty interesting.

livrem · on Jan 11, 2023

Java has goto and const as reserved keywords. Source: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/_k...

kps · on Jan 11, 2023

C originally had `entry` as a reserved word (from Fortran 77, where a function could have multiple entry points).

D-Coder · on Jan 11, 2023

It's pre-emptively reserving certain prefixes. "is" is for any function that returns true/false (which then collides with "island", "israel", "isaac"). "str" for any string function (which then collides with "strong", "strepthroat", "strange").
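As a rough illustration of the pattern (not a complete list), the sketch below flags names that start with `is`, `to`, or `str` followed by a lowercase letter, which are the prefixes behind the island/together/strong examples. The helper names are made up for this example, the standard reserves other prefixes that are left out here, and the reservation applies to external-linkage names and to file scope when the relevant header is included, which this check does not model.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if name starts with prefix and the next character is a lowercase letter. */
static bool prefix_then_lower(const char *name, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(name, prefix, n) == 0 && islower((unsigned char)name[n]);
}

/* Hypothetical check covering only the three prefixes discussed in the thread. */
static bool has_reserved_prefix(const char *name)
{
    assert(name != NULL);
    return prefix_then_lower(name, "is")    /* future <ctype.h> functions */
        || prefix_then_lower(name, "to")    /* future <ctype.h> functions */
        || prefix_then_lower(name, "str");  /* future <string.h>/<stdlib.h> functions */
}
```

This matches `island`, `together`, and `strong`, but not `isNull`, `toInt`, or `strSplit`, since an uppercase letter after the prefix falls outside the reserved pattern.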

moloch-hai · on Jan 11, 2023

He fails to note that a pair of underscores appearing anywhere in a name is reserved, not just in leading position.

Also, a leading underscore followed by a capital letter, or other reserved name, is absolutely allowed in implementation headers -- even in non-standard headers. It is bad practice only because somebody else's compiler (e.g. Clang) might be obliged to read those headers someday, and have invented its own meaning for it.

mmoll · on Jan 11, 2023

Afaik, underscore-underscore anywhere in a name is reserved in C++ only. C only reserves names starting with underscore-underscore. And yes, you can get away with using these identifiers, but you could almost never be certain that you did.
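The difference can be sketched as two predicates for the "reserved for any use" rules. The function names are hypothetical, and the sketch deliberately ignores the separate rule that any name with a leading underscore is reserved at file scope (C) or in the global namespace (C++).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* C: reserved for any use if the name starts with "__" or with '_' plus an
 * uppercase letter. */
static bool reserved_any_use_c(const char *id)
{
    assert(id != NULL);
    return id[0] == '_' && (id[1] == '_' || isupper((unsigned char)id[1]));
}

/* C++: reserved for any use if the name contains "__" anywhere, or starts
 * with '_' plus an uppercase letter. */
static bool reserved_any_use_cpp(const char *id)
{
    assert(id != NULL);
    return strstr(id, "__") != NULL
        || (id[0] == '_' && isupper((unsigned char)id[1]));
}
```

So `my__name` is fine in C but reserved in C++, while `__foo` and `_Foo` are reserved in both.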

moloch-hai · on Jan 11, 2023

Microsoft could be absolutely, utterly, undeniably certain it got away with using names reserved to the implementation in its implementation. Arguably some headers are not part of the implementation, because they don't come with MSVS. But that is a matter of packaging, which the Standard does not cover.

And, only C++ is of any interest, here. Microsoft never gave a damn about C, and any name reserved in C is also reserved in C++.

McKayDavis · on Jan 11, 2023

ninkendo · on Jan 11, 2023

Raymond Chen is worth his salary at Microsoft if the only thing he does is blog. His blog has been so consistently good, for so long, posting so often, it gives me imposter syndrome thinking about how he could possibly have a day job on top of it.

wheybags · on Jan 11, 2023

I wish Microsoft had more Raymond Chens, and less... whoever it is that makes decisions like "let's preinstall candy crush", and "let's break the start menu AGAIN". There's a real elegant core to windows, it's such a shame that a lot of the higher level stuff piled on top is so crappy.

rayiner · on Jan 11, 2023

NT is a nicely designed OS. I wonder how many people still at Microsoft really even grok it anymore. I hear Apple is having trouble finding enough developers who can do kernel work.

CoolGuySteve · on Jan 11, 2023

When I worked on video systems at Apple I got dinged on a performance review for negatively comparing their kernel to Linux too many times. I’m pretty sure they still have a dogshit IO scheduler but I haven’t used a Mac in ages.

I suspect it’s not that people can’t do the work and more that Apple’s silos mean their already employed engineers are not allowed to try or even talk about it.

coldpie · on Jan 11, 2023

> dinged on a performance review for negatively comparing their kernel... engineers are not allowed to try or even talk about it

I wonder if it was possibly not that you compared the kernels but rather how you chose to express the comparison. For example this bit makes me think you possibly expressed your opinion combatively:

> a dogshit IO scheduler

Perhaps you were more polite in your day-to-day, I don't know. But your language here makes me wonder.

CoolGuySteve · on Jan 11, 2023

You seem to be extrapolating a lot from a single turn of phrase, but the fact remains that while Apple's kernel is satisfactory, it's not really considered best of breed in anything other than maybe dtrace support.

So I'm sorry I offended you with my salty language but the evidence suggests I'm far from the only one that went ignored at Apple.

burnished · on Jan 11, 2023

Some people seem unaware that others take issue with how they communicate and not what they communicate.

The way you are responding to someone who asked you a pretty mild question (assuming they were personally offended rather than asking about a potential communication issue) suggests there is some validity here.

CoolGuySteve · on Jan 12, 2023

I don't think it was mild, they more or less implied I was an asshole at work because they didn't like a single adjective I used years after the fact.

Sure the phrasing was gentle, but the actual statement was quite offensive, presumptuous, and prudish.

coldpie · on Jan 11, 2023

> I'm sorry I offended you

You didn't.

BeetleB · on Jan 11, 2023

Just World Fallacy.

https://en.wikipedia.org/wiki/Just-world_hypothesis

coldpie · on Jan 11, 2023

More like, One Side of the Story :)

https://www.collinsdictionary.com/us/dictionary/english/some...

BeetleB · on Jan 11, 2023

To be fair, all anecdotes are one side of a story :-)

AndriyKunitsyn · on Jan 11, 2023

That's interesting, are there some benchmarks that prove the inferiority of the Mac scheduler? (Without getting you into an NDA trouble, of course.)

CoolGuySteve · on Jan 11, 2023

I don't think it matters as much any more now that we don't have spinning disks with slow seek times.

For things like Final Cut/iMovie with lots of video/audio/misc tracks, it was trivial to saturate the disk with dumb seeks when reading otherwise linear data streams due to a lack of knobs.

bitwize · on Jan 11, 2023

AFAIK Dave Cutler is still there but he's been moved off the Windows team -- first to Xbox of all places and then to cloud, maybe?

pjmlp · on Jan 11, 2023

Most well known names that are still around have moved either into Azure or DevDiv (which I think now is under Azure as well).

Which is probably the root cause of the GUI civil war happening between all desktop frameworks.

deadso · on Jan 11, 2023

"Been moved" strongly implies that it wasn't his choice. Cutler worked on some really cool Virtualization problems in Xbox and is bringing that expertise to Azure as well.

keltor · on Jan 11, 2023

There's lots of them but long long gone are the days when the guy who's working on Network blah blah for Azure is allowed to do anything with Windows. Silos are good except when they become Ivory Towers which is what they have become. Sadly it seems some sort of modern managerial style since it infects almost every corporation these days.

jjtheblunt · on Jan 11, 2023

I had this book when it first came out and it was fantastic, coming from a pretty hard core Unix background.

Inside Windows NT from Microsoft Press

https://a.co/d/1lxwnQt

chowells · on Jan 11, 2023

I don't think you understand the position Microsoft is in.

First, Candy Crush was never preinstalled. A link to purchase it in the Microsoft store was preinstalled, and it took all of two clicks to get rid of it if you wanted to.

Second, preinstalling that link reduced malware infections of windows systems by a visible percentage worldwide.

Microsoft has a duty to protect users who need it. Finding a compromise where the worst case for other users is that they need to click twice is pretty good.

sakras · on Jan 11, 2023

How did a candy crush link reduce malware infections?

Also as an aside, that link _kept coming back_ on my laptop (but not my desktop oddly).

Topgamer7 · on Jan 11, 2023

> Second, preinstalling that link reduced malware infections of windows systems by a visible percentage worldwide.

Why? Because people would pirate it? Or just download virus-laden games in general?

giaour · on Jan 11, 2023

If I wanted to set up a botnet, bundling malware in a dumb game and then putting the exe on the internet for free seems like a decent start

fomine3 · on Jan 12, 2023

Source? Why don't they also bundle Fortnite and LoL?

cjbgkagh · on Jan 11, 2023

Without that pile so many people would have nothing to do, no impact to list in their reasons for a bonus.

agumonkey · on Jan 11, 2023

He also strikes that great balance between fun to read, low level and precise. Thankful for these articles, always a reflex click :)

nikanj · on Jan 11, 2023

And every year or so Microsoft wrecks all links to his blog, so linking to old articles never works properly. It’s highly ironic that the blog talking about backwards compatibility moves URLs constantly.

throwaway9870 · on Jan 11, 2023

In the 90s he would answer Windows programming questions on usenet.

alex_suzuki · on Jan 11, 2023

Raymond Chen is the OG of Windows development.

actionfromafar · on Jan 11, 2023

To me that is Charles Petzold:

http://www.charlespetzold.com/pw5/

int_19h · on Jan 13, 2023

There was also Michael Kaplan, who had a similarly informative and in-depth blog "Sorting it all out" about localization, text encodings, keyboard layouts etc in Windows. In fact, Raymond Chen specifically recommended him on these topics:

https://devblogs.microsoft.com/oldnewthing/20041217-00/?p=36...

Unfortunately, Microsoft didn't exactly treat him kindly [1] in the end. And to add insult to injury, his blog was completely wiped. There are some archives around, but I haven't seen one that retained the images, which often makes the posts incomprehensible, unfortunately.

[1] https://vsubhash.wordpress.com/2017/04/17/rip-michael-j-kapl...

mc32 · on Jan 11, 2023

Him and Russinovich --before he joined MS.

richsu-ca · on Jan 11, 2023

"before" :)

ilyt · on Jan 11, 2023

Well, I'd imagine the day job is why the blog is so good.

shultays · on Jan 11, 2023

Also, whenever I have a question, it feels like he has the answer, no matter how specific the question is.