That's POSIX, not the C language standard. https://www.iso-9899.info/n1570.html#7.31 lists the library API names reserved for future use; section 7.1.3 in the same document states explicitly that they are reserved.
It's not that the word "island" has any particular significance; it's that all names starting with "is" followed by a lowercase letter are reserved (in external linkage and in the global scope of files that #include <ctype.h>), so that future versions of the language can add more standard library functions along the lines of isalpha, isdigit, etc., without making the current standards committee guess in advance which specific names following that pattern their successors might want to add in the future, and without breaking existing code. (Unless that existing code ignores these rules, which seems to be fairly common in practice.)
The C is of course a pirate's favorite language. It might end up with "water" and "land" types in the standard some day. So `bool iswater()` and `bool island()` functions might be needed! You might need to check if the island structure you got is really land, so `island(&possible_island)` would obviously help and not cause any confusion to readers of the code.
I saw someone point out that these identifier restrictions mean that a valid C compiler optimization would be to change all instances of "x = to[a-z]+(y);" to "if (!is[a-z]+(y)) x = to[a-z]+(y);" Not that any compiler would actually be absurd enough to implement this, of course :)
Naturally, this would imply the validity of the optimization:
They could just generate a random string, making it extremely unlikely that it will conflict with an existing name. And it won't be any less descriptive than most existing names in the C standard library.
Note that the original post is not up to date with the C23 developments on the subject, wherein the standards people appear to have noticed that nobody has ever cared about overbroad reservations like str*, is*, to*, and E*, and introduced a notion of “potentially reserved identifier” that I have so far been unable to understand. The relevant paper, “What we think we reserve”[1], has been folded into the draft standard.
Basically, the wording is so that they can advise implementations to complain on potentially invalid-in-the-future uses of "potentially reserved identifiers", but invalid uses of actually reserved identifiers are still "requires no diagnostics". They feel that allows them to extend the already way too vast amount of reserved identifiers even further without any reservations.
Yes, that means that an implementation will warn you that your code might break in the future, but on the day when it actually breaks, it will just stop complaining. Why they could not just mandate detecting invalid uses of reserved identifiers is beyond me.
On the page you linked, you can see that `ctype.h` reserves all prefixes of `is[a-z]` and `to[a-z]`, and `string.h` reserves `str[a-z]`. These come from the C standard (in C99, it's 7.26 "Future Library Directions"), though I don't think they use the word "reserved" there.
> Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.
He says where they come from -- section 7.31 of the C11 standard[1]. For example, in 7.31.12:
> Function names that begin with str and a lowercase letter may be added to the declarations in the <stdlib.h> header.
Or 7.31.13:
> Function names that begin with str, mem, or wcs and a lowercase letter may be added to the declarations in the <string.h> header.
So according to the standard, if your code #includes string.h, it may conflict with a future version of the standard if you use names like "strong" or "memorize". This is extraordinarily unlikely, though.
These strings are matched by the regexes given just above that line, which are reserved for potential future use by certain header files. "island" is covered by the ctype.h and wctype.h headers, which reserve every name starting with "is" followed by a lowercase letter.
Then after looking into the C standard, POSIX further defines C identifiers for itself. That includes `_t` in type names! It's a rule so far from enforcement in reality that it is hard to take seriously.
It's "enforced" to the extent that future versions of the standard will feel free to define new types which end with _t without worrying about breaking existing code. If you defined your own uint32_t type and the new standard's uint32_t type stomps on it, well, you were (theoretically) warned.
The trailing _t is reserved, and I didn't know that for many years. I learned of it only after an interview candidate pointed it out as a gotcha in my question prompt. They got bonus points, and I learned something. I wonder why compilers or static analyzers don't complain about these things more often?
> I wonder why compilers or static analyzers don't complain about these things more often?
Because it's not the C compiler's business to care about POSIX.
POSIX and the C standard are different things. If you're writing code against the Windows APIs, POSIX isn't relevant for instance, and you need to be more concerned about colliding with type names or defines from the Windows API headers, and those are all over the place anyway.
...and besides: that rule is entirely pointless in reality, either your type names collide with POSIX types from headers you are including (in that case you're getting a compiler error anyway), or they don't collide, and in that case all is good.
There is one more case, for your list at the end. The names could start colliding when you get to add interaction with a posix system.
That is, most of the point of that was for some standards to give a "follow these rules, and you will have an easier time integrating with this standard if that is in your plans."
Obviously, if you have no plans to head to posix, they accomplish nothing for you. Similarly, not following the rules doesn't prevent that direction, just adds some extra work. Potentially.
> besides: that rule is entirely pointless in reality
I think it means: a library author cannot name some type epoch_t, because POSIX may introduce an epoch_t in the next revision, and suddenly some code may fail to compile.
The _t suffix is so ergonomic for typedefs this is one area of POSIX I simply ignore, because my types are usually prefixed with a namespace POSIX won't ever collide with anyways.
Effectively, every C library needs to do that, usually reserving some prefix.
It’s just a way to state “if you’re using this API/library, be prepared that future versions may introduce additional symbols matching these patterns, and/or that the current version may define undocumented symbols matching these patterns”. You then have the choice to take care to not define symbols conflicting with that.
The alternative would be for libraries to just add arbitrary new symbols, with no way for client code to proactively prevent conflicts.
Yes _t is very attractive to programmers as a suffix to indicate a type, and I've flagged in the past to people that they shouldn't technically do that - usually to be met with utter indifference
> Windows header files have historically not been conscientious about avoiding these reserved names. We’re trying to do better for new headers, but not everyone has gotten the memo.
Windows 2000 introduced the concept of application manifests, where executables get a mini XML file for application specific settings instead of storing them into the registry.
They also allow for registration free COM components.
To this day many Microsoft teams still haven't gotten the memo.
I'm not sure if they provided adequate tooling for the former. They definitely didn't provide adequate documentation for the latter.
Or maybe they did - back when MSDN was actually well-organized and somewhat complete-ish (and shipped on optical disks or otherwise downloadable!), I was too young to make much use of it. Now, that wealth of knowledge is mostly gone, and external links to it don't resolve.
I know it's a tradeoff but I think I fall on the side of I wish the language I was writing in required `this` or `self` for members/properties.
C++
foo = bar * 2;
Are foo and bar local variables or members of some instance?
vs
Python
self.foo = self.bar * 2
100% clear. No naming convention needed.
I bring this up because `_foo` for members is a naming convention that wouldn't be needed in a language that required `self` or `this`
That said, I get that maybe refactoring some code from standalone function to class method is easier if you don't have to change the code as much but I'd be curious how often that's a net win.
Might as well just always spell out the hidden "this" argument. Then all methods would look just like regular functions, allowing to simplify the language syntax and the standard, making it more concise and consistent, without sacrificing any functionality.
It really depends. I used PImpl today but it’s the rare exception. Between small inlinable classes and abstract base classes, there are lots of ways that work. But that’s the beauty of C++: there are lots of ways to solve problems!
In C++ especially, respecting the rules on underscores is important because so much of stdlib is header-only, meaning lots of inline code that's exposed to shenanigans with identifiers used in it. For example, if you #define foo 123, and some inline function has a local named "foo", there's going to be a problem. This is why, when you look at standard C++ headers in most implementations, all identifiers, even locals, tend to be named _Like __this. And stdlib may also need to define its own internal macros (as there's a lot of repeated verbiage), which also follow this name pattern.
So if you disregard the fact that those are reserved, it might build, but even a minor update of your implementation that e.g. adds a new local to some obscure standard function can break it, never mind a new version of the C++ standard.
I mean, de facto, if Windows isn't good at it, at the very least compilers will have a "never mind ignore this" option if they even do start enforcing them. And while Raymond Chen knows Windows and can speak to it, I'd bet the Linux kernel has plenty of violations of this, and it's the same thing; no future C compilers are going to invalidate that no matter what some C committee somewhere says.
The issue is not developers, but disallowing the patterns is[a-z].*, to[a-z].*, str[a-z].* and others. Why would anyone do this to a programming language?
It's a stupid decision to reserve those and it ruins the usability of the language by arbitrarily disallowing many common words that have nothing to do with the intended feature. Even if you don't export names like this directly, having the name of your library or company (as prefix for all exported names) start with those common combinations is likely as well, so now you got to avoid certain library or company names.
If this is only since C11, then I'm even more baffled, since I could somewhat see how such thing could happen in the 1970s when there were limits and the language was brand new and not known to be popular, but in 2011 doing this makes no sense whatsoever.
I hope this is limited to C11 and will never happen in C++.
C doesn't even need namespaces for this, a simple stdc_ prefix for stdlib functions and types would also work (see C23 stdbit.h, I hope this won't be an outlier).
Yes, but we can't simply port C++ namespaces over because this requires symbol name mangling. C could be using 'namespace prefixes' instead with something like 'usingprefix stdc_', this wouldn't require changes to the symbol format, and would provide most of the features of C++ namespaces.
What else should they reserve in order to be able to introduce new identifiers? The alternative would be that new functions would have to look like __std_c__strfoo().
-use more obscure letter combinations than "is", "to" and "str" which are very common beginnings of words. How do "is", "to" and "str" help anyway if they want to add functionality that has no sensible name starting with those?
-use two underscores at the beginning since as per the article that is already reserved
Or introduce them as __newfunction for the linker, but have a new optional header file that #defines the short name. That’s the route that was chosen for C99’s _Bool/bool (there it was a typedef), and it worked well AFAIK.
The drawback of the macro solution is that it also disables those names for static symbols and struct members etc. when the header is included. For bool this was okay as it was intended as an opt-in keyword and the header was a new one, but doing that for all functions and in existing headers would potentially break more code.
TBH it would make a lot more sense if the C committee would start adding prefixes for new additions to the C stdlib (like the functions in C23 stdbit.h: stdc_popcount, stdc_bit_width, etc...)
> It's more like "disregard them at your own risk"
I'd argue that a compiler update that breaks existing code for something as innocuous as using the "wrong" variable name is a compiler bug, even if the code technically violates the spec. It is simply too late now for the spec to begin using the keywords they reserved 40 years ago.
In truth they've been careful about introducing names and not breaking existing code. For example, when they finally did a bool type, they called it _Bool to prevent incompatibilities with nonstandard bool types. You needed to include <stdbool.h> to get a friendly, usable name of "bool".
Within reason, yeah. Like if they chose to add a new "island" function and my geographics software broke, that's clearly the spec's fault, not mine, even though technically I'm violating spec. I'd even argue any "is.*" function is probably too late to add to the spec at this point in time.
> I'd even argue any "is.*" function is probably too late to add to the spec at this point in time.
I'd like to see isualpha() isudigit() isupunct() isuspace() isugraph() etc. plus isucombining(). They'd supplement iswxxx() but (a) taking a `signed unicode int` (for C2x defined as at least 32 bits wide) and (b) being locale-independent.
It only checks for the ones starting with underscore.
> This check does not (yet) check for other reserved names, e.g. macro names identical to language keywords, and names specifically reserved by language standards, e.g. C++ ‘zombie names’ and C future library directions.
really, avoid underscores altogether - they are very horrible. even people that know that there are exclusion rules for them don't really know what the rules are. and if you get things wrong, the error messages your compiler gives you will be incomprehensible. so, just don't use them - why would you?
I've never seen leading underscores in classes. I've seen trailing underscores, and the old m_ prefix, but never a plain _.
I was surprised by C++11 having additional prefixes it has reserved. I'm now also wondering whether there is a clang/gcc option to warn about such things, as although I know we don't currently have any issues in our code base (as in, it compiles and works) I don't really want to publish a public API and have to revisit it because of such a conflict in C++29 or whatever
Leading underscores help you disambiguate `_name` the field from `name` the member-function. Similar thing in Python. How else do you solve this without it being even more confusing? Is `m_` really preferable? (Honest question.)
As offered by the comment you responded to: trailing underscore. And yes: since leading underscore is reserved, and something needed to be reserved for reasonably good reasons, m_ is preferable to _ if for some reason you simply refuse to use a trailing modifier.
> if for some reason you simply refuse to use a trailing modifier
I'm not going to make a big deal about it, but it makes more sense to me to put the scope of a variable at the front. The front is where you put foo. and foo-> and foo[], after all.
Leading underscore is only reserved if the following letter is uppercase. And it's fairly common for C++ code to have coding style that mandates that fields start with a lowercase letter anyway.
We use `m_` at work and I've come to really appreciate it. It makes reading code very easy - any variable you see prefixed with `m_` is a class member field. Anything else is either a function argument, a locally scoped variable, or has some other prefix (`k`, for example, referring to static constants).
Since we read code far more than we write it, sprinkling little "usage hints" like this across symbol names removes a lot more cognitive overhead than I would have thought.
I don't think that using m_ or s_ as prefixes is bad, but I think it should be the IDE's job to provide visual hints (color, bold, italic, etc.) with respect to the identifier's storage.
some underscores have been reserved in certain situations in user code from ansi c and onwards - there is simply no need to use them. trailing underscores will work but why bother? they are difficult to read and difficult to type.
Trailing underscores are my go to for resolving conflicts with keywords. It's a much more consistent and trivial to remember rule than random misspellings or synonyms of keywords (casts side-eye to `klazz`).
So C did pre-emptively reserve certain words? Like island, strong, together..
I wonder if that paid off in some cases?
Or to ask differently, if one would design a new programming language (now), would you consider reserving words in advance?
I'm currently making a language, and yes, I'm reserving words for it because I want an easy C ABI. But I'm also making them easier to avoid.
Here's my list of reserved words:
* Anything that begins with `y_`.
* Anything that begins with `yc_`.
* Anything that begins with `YC_`.
* Anything with three or more consecutive underscores.
* Edit: Anything that begins with an underscore. This is because my language will be able to transpile to C if necessary.
The first is for types and items in the standard library (the language's name is Yao, so a `y` makes sense). The second and third are for the C ABI (hence, `yc`) and for historical reasons. `YC_` in particular is for C macros.
The last is for name "mangling." I put it in quotes because my language's standard name mangling (it will be the same across every implementation) will not really mangle the names. Instead, it will concatenate them, using five underscores between package names, four underscores between packages and items in a package, and three underscores between an item and its suffix. (For overloaded functions, the programmer has to define a suffix for each one. That suffix is how their names will be different in the C ABI.)
My hope is that these rules will not be too onerous. I don't think the reserved prefixes are used much, and I haven't seen anyone use more than one consecutive underscore, though I'm allowing one more, just in case.
It's unfortunate POSIX didn't adopt a similar convention for its reservations, like a psx_ prefix for its namespace. But I guess with POSIX it was more ratifying established *NIX things as a standard they could all easily agree on without too much disruption.
‘y_’ is too strict in my opinion. You declare lots of variables like y_1 or y_ans when you write any sort of numerical calculation code (like in games or physics simulations).
It seems too strict, but I should clarify: these rules only apply to the C ABI.
Declaring variables with those names actually will not conflict if they are done in pure Yao. There's a special way to access the C symbols of functions and types, and it's deliberately different, for this very purpose.
Ah, if that’s only for the C ABI then that’s much better. I can still see someone writing numerical algorithms using y_* as one of the arguments, though the probability is a lot slimmer.
The restriction isn't on `y[c]_`. It's actually on `y[c]___`.
This is because `y` is the package name for the standard library, and the standard library name will always be separated from the rest of the name by 3 or more underscores.
Really, the restriction is on 3 or more consecutive underscores in C names, and you can't have a package name that is either `y`, `yc`, or `YC`.
I apologize for the confusion. I was in a hurry and on mobile with the original post.
New languages usually support namespaces, which prevents the problem.
As for payoff, new versions of the C standard usually introduce new functions and macros with those name patterns, without breaking client code that respects the reserved-name rules.
I wouldn't hesitate to use island, strong, or together as a variable name, even though they are technically "reserved". Maybe avoid "strong", it's on the edge of being made into a reserved word. I'd stay away from "isnull", "strsplit", or "toint" or similar variable names though. Although "isNull", "strSplit", and "toInt" are still fine.
I think the idea of reserving keywords in advance is a good one, but not something that interferes with vocabulary words so easily. The idea is to be able to add a new keyword in the future without breaking code that uses that keyword itself currently.
The ES5 case is interesting because most of that list of "future reserved words" was a list of reserved words in ES4 "The Lost Version". That list also included fun things still not used like private, public, abstract, package, byte, int, volatile, synchronized.
One additional fun thing to note: even though "private" was already reserved as a keyword, when TC-39 added private fields to classes a few years back, they decided on "hash names" over the private keyword.
It's pre-emptively reserving certain prefixes. "is" is for any function that returns true/false (which then collides with "island", "israel", "isaac"). "str" for any string function (which then collides with "strong", "strepthroat", "strange").
He fails to note that a pair of underscores appearing anywhere in a name is reserved, not just in leading position.
Also, a leading underscore followed by a capital letter, or other reserved name, is absolutely allowed in implementation headers -- even in non-standard headers. It is bad practice only because somebody else's compiler (e.g. Clang) might be obliged to read those headers someday, and have invented its own meaning for it.
Afaik, underscore-underscore anywhere in a name is reserved in C++ only. C only reserves names starting with underscore-underscore. And yes, you can get away with using these identifiers, but you could almost never be certain that you did.
Microsoft could be absolutely, utterly, undeniably certain it got away with using names reserved to the implementation in its implementation. Arguably some headers are not part of the implementation, because they don't come with MSVS. But that is a matter of packaging, which the Standard does not cover.
And, only C++ is of any interest, here. Microsoft never gave a damn about C, and any name reserved in C is also reserved in C++.
Raymond Chen is worth his salary at Microsoft if the only thing he does is blog. His blog has been so consistently good, for so long, posting so often, it gives me imposter syndrome thinking about how he could possibly have a day job on top of it.
I wish Microsoft had more Raymond Chens, and less... whoever it is that makes decisions like "let's preinstall candy crush", and "let's break the start menu AGAIN". There's a real elegant core to windows, it's such a shame that a lot of the higher level stuff piled on top is so crappy.
NT is a nicely designed OS. I wonder how many people still at Microsoft really even grok it anymore. I hear Apple is having trouble finding enough developers who can do kernel work.
When I worked on video systems at Apple I got dinged on a performance review for negatively comparing their kernel to Linux too many times. I’m pretty sure they still have a dogshit IO scheduler but I haven’t used a Mac in ages.
I suspect it’s not that people can’t do the work and more that Apple’s silos mean their already employed engineers are not allowed to try or even talk about it.
> dinged on a performance review for negatively comparing their kernel... engineers are not allowed to try or even talk about it
I wonder if it was possibly not that you compared the kernels but rather how you chose to express the comparison. For example this bit makes me think you possibly expressed your opinion combatively:
> a dogshit IO scheduler
Perhaps you were more polite in your day-to-day, I don't know. But your language here makes me wonder.
You seem to be extrapolating a lot from a single turn of phrase, but the fact remains that while Apple's kernel is satisfactory, it's not really considered best of breed in anything other than maybe dtrace support.
So I'm sorry I offended you with my salty language but the evidence suggests I'm far from the only one that went ignored at Apple.
Some people seem unaware that others take issue with how they communicate and not what they communicate.
The way you are responding to someone who asked you a pretty mild question (assuming they were personally offended rather than asking about a potential communication issue) suggests there is some validity here.
I don't think it matters as much any more now that we don't have spinning disks with slow seek times.
For things like Final Cut/iMovie with lots of video/audio/misc tracks, it was trivial to saturate the disk with dumb seeks when reading otherwise linear data streams due to a lack of knobs.
"Been moved" strongly implies that it wasn't his choice. Cutler worked on some really cool Virtualization problems in Xbox and is bringing that expertise to Azure as well.
There's lots of them but long long gone are the days when the guy who's working on Network blah blah for Azure is allowed to do anything with Windows. Silos are good except when they become Ivory Towers which is what they have become. Sadly it seems some sort of modern managerial style since it infects almost every corporation these days.
I don't think you understand the position Microsoft is in.
First, Candy Crush was never preinstalled. A link to purchase it in the Microsoft store was preinstalled, and it took all of two clicks to get rid of it if you wanted to.
Second, preinstalling that link reduced malware infections of windows systems by a visible percentage worldwide.
Microsoft has a duty to protect users who need it. Finding a compromise where the worst case for other users is that they need to click twice is pretty good.
And every year or so Microsoft wrecks all links to his blog, so linking to old articles never works properly. It’s highly ironic that the blog talking about backwards compatibility moves URLs constantly.
There was also Michael Kaplan, who had a similarly informative and in-depth blog "Sorting it all out" about localization, text encodings, keyboard layouts etc in Windows. In fact, Raymond Chen specifically recommended him on these topics:
Unfortunately, Microsoft didn't exactly treat him kindly [1] in the end. And to add insult to injury, his blog was completely wiped. There are some archives around, but I haven't seen one that retained the images, which often makes the posts incomprehensible, unfortunately.
OK, I know it's only an offhand remark in a blog post, but now I'm going to have to spend significant energy on:
1. Finding where these identifiers are reserved, exactly, because obvious sources like https://pubs.opengroup.org/onlinepubs/9699919799/functions/V... don't seem to include them?
2. Trying to come up with the intended usage for these identifiers (`island` I imagine, might be a scope: only accessible to other identifiers on the same island? oh my...)