Maybe I'm being dense, but I fail to see the point. The code has the exact same sequence of tokens in all cases, just with different names for the constant. The fact that a Ctrl+F doesn't find all related uses seems to be the only argument and it doesn't convince me.
The article does not provide an alternative solution, or a discussion of the upsides of different naming conventions (e.g. knowing that ALL_CAPS is almost surely a compile-time constant). "How many man years are wasted..." - compared to what?
I think he is complaining about not being able to copy paste GL_MAX_TEXTURE_SIZE to limits.GL_MAX_TEXTURE_SIZE? Otherwise I am failing to see the point as well
The title seems to imply that this problem does not exist in languages not using Latin scripts. That's not true. Take Japanese for example: when you have hiragana, katakana, and kanji to express the same idea (not to mention that a kanji typically has more than one way of being pronounced), the problem is gonna be orders of magnitude more complex.
> Take Japanese for an example, when you have hiragana, katakana, and kanji to express the same idea
I'm sure you know this, but for readers who aren't familiar with Japanese, it might be worth saying that hiragana, katakana and kanji aren't strictly interchangeable. Sure you can use hiragana and katakana as a fallback when you cannot or don't want to use the kanji for various reasons, but normally in every situation there's a recommended script to use in order to write correct Japanese.
I think the idea is to think about what would have happened if programming started in another language. If it were Japanese, it’s easy to imagine that you’d have the same issue with different styles in different contexts.
Some contexts might call for all hiragana variables, some all katakana, and some using the most likely/appropriate form of the word, including kanji. And in the third case, you’re still going to end up with discrepancies since not everyone agrees on when to use kanji.
And don't forget script number four in Japanese: Latin letters! Used for all sorts of odds and ends not covered by the other scripts; mostly (abbreviations of) names like 「IBM」. These conveniently come in full-width versions for use with kanji, hiragana, and katakana, but this of course won't stop anyone from using the basic letters we're using here. So in the hypothetical case where Unicode is a thing, but the lingua franca of software engineering is Japanese instead of English, you might end up with variables named:
変数
へんすう
ヘンスウ (or even ﾍﾝｽｳ)
ｖａｒｉａｂｌｅ
ＶＡＲＩＡＢＬＥ
variable
VARIABLE
…and all the additional snakes and camels varying the notations further.
> Someone had to translate GL_MAX_TEXTURE_SIZE into mMaxTextureSize.
I fail to see how the fact that the Latin alphabet has upper and lower case is to blame for this. This is more about the coding standards the author has to abide by than anything else.
If your coding standard required you to name the property MMAXTEXTURESIZE instead of mMaxTextureSize, would there be any less busy work?
Even if his coding standard allowed GL_MAX_TEXTURE_SIZE, I still don't see the point. The original GL_MAX_TEXTURE_SIZE is a global constant and his limits.GL_MAX_TEXTURE_SIZE is just a member. You will still need "conversion" code for every constant there is.
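For reference, the kind of "conversion" code being talked about looks roughly like this (a sketch only; it assumes the GL headers and an active GL context, and the member naming just follows the article's style):

    #include <GL/gl.h>

    // One query-and-rename per constant: the ALL_CAPS GL macro on one side,
    // the mCamelCase member on the other.
    class Limits {
    public:
        void query() {
            glGetIntegerv(GL_MAX_TEXTURE_SIZE, &mMaxTextureSize);
            // ...and another line like this for every other limit the engine cares about
        }

        GLint mMaxTextureSize = 0;
    };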
Notice all this busy work
And the code that comes after that is the same amount of work, except perhaps typing mMaxTextureSize instead of copy-pasting GL_MAX_TEXTURE_SIZE. Is he really complaining about this?
Also, is the author pushing for everyone to follow one standard? It's hard to say what he is asking for.
I mean, they're already using Hungarian notation wrong[1]. If you want painless translations, use a language that allows for reflection, and use smarter naming conventions.
Japanese has two syllabaries, hiragana and katakana. They represent the same sets of sounds. I think this is very similar to uppercase and lowercase. Of course, the conventions around their usage are different.
If I write a sentence in ALL CAPS, I’M YELLING AT YOU. if i write a sentence in lowercase i am maybe laid back or aloof.
A sentence written in all hiragana might represent a toddler speaking. A sentence written in all katakana might represent a foreigner speaking broken Japanese (ouch!).
Usually, hiragana is used for grammatical purposes, such as particles and conjugations. But for some reason, laws and contracts use katakana instead. In a hypothetical Japanese programming language, which one should they pick? In university I had to learn this strange language called Doolittle[1] which uses Japanese but dodges the grammar particle issue by using spaces and symbols, which I found to be terribly confusing. I would love to see an attempt at a Japanese programming language that is closer to how the spoken language works.
This only scratches the surface. There are even more ways to represent arbitrary syllables, such as with arbitrary kanji (ateji) or even more complex systems like Kanbun. And of course there are many levels of simplification and variation within Chinese characters.
My point is that I don’t think this is inherently a Western thing, even if it ended up that way. It’s interesting to ponder what a non-Western language’s programming conventions could look like. Sometimes it’s even a real issue: see languages like Go where uppercase is semantically meaningful and therefore it is impossible to export Eastern symbols.
The different casing conventions are a signal indicating what kind of value an identifier represents (macro, local variable, type, etc.); they're actually doing work and aren't useless embellishment. If case didn't exist to signal those distinctions, other means would have been used (like the m_ prefix for members in some C++ styles). So the existence of upper case and lower case certainly isn't to blame; programmers would be free to use just one if it sufficed for them (and there was a time when a lot of programs consisted of upper case only).
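A quick sketch of that signalling at work (names invented for illustration; the exact conventions of course vary by style guide):

    #define MAX_RETRIES 3              // ALL_CAPS: macro / compile-time constant

    class ConnectionPool {             // UpperCamelCase: a type
    public:
        void retryAll();               // lowerCamelCase: a member function
    private:
        int m_activeConnections = 0;   // m_ prefix: a data member
    };

    void ConnectionPool::retryAll() {
        int attemptCount = 0;          // plain local variable
        while (attemptCount < MAX_RETRIES && m_activeConnections > 0) {
            --m_activeConnections;
            ++attemptCount;
        }
    }

    int main() { ConnectionPool pool; pool.retryAll(); }

Each style tells the reader what kind of thing an identifier is before they've looked at its definition.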
If we programmed in Chinese, one would need to know thousands of characters to write any decent-sized program; if we programmed in Japanese, one would need to master several alphabets; if we were to program in Hindi, the very shape of our characters would depend on the context...
All in all, the upper/lower case problem does not seem to be that much of a waste of man-years.
> If we programmed in Chinese, one would need to know thousands of characters to write any decent-sized program; if we programmed in Japanese, one would need to master several alphabets
Right, because that's the difficult part. Memorizing 100 alphabets is significantly easier than memorizing the relevant Chinese characters. The syllabaries are zero percent of the difficulty of learning to read Japanese.
>if we were to program in Hindi, the very shape of our characters would depend of the context...
Eh? What context-dependent-shape-changing are you referring to? Hindi has a very normal alphabet. There are vowels, consonants, and a bunch of special ligatures for some combinations of those that are technically optional and could be done without. Nothing changes according to "context".
Okay, guess we disagree about what "context" means.
You can't write adam, you have to write Adam. This is because the letters don't carry enough information; you need to know the context that names are capitalized, which is information that is extrinsic to the text itself.
Similarly, you can't render a unified Han glyph without knowing the context of which language the text is in. This information is again extrinsic to the text itself - it is metadata about the text.
In comparison, when you can't write Hindi ru as r and u as you usually would but instead you have to write the letter ru as its own glyph, there is no extrinsic information needed. The letters are right there, so all the information for knowing how to render the letters is right there. There is no context needed.
But yes, if you define context to include surrounding letters, then yes you need "context" to render Hindi letters. But since the original context of this was the difficulty of supporting this context for programming identifiers, I'd say that this kind of "adjacent codepoints context" would not be something special to worry about anyway, because text renderers today already have to deal with this kind of "adjacent codepoints context" for emojis, etc.
Especially considering that English ASCII fits into 128 characters including punctuation ... I do not even want to start thinking about how Chinese or Japanese would fit into that.
In reality the ask is maybe: let us use the 26 characters of English without uppercase? Which we had in BASIC and dozens of old-style programming languages, though typically in upper case only.
In a universe where Chinese or Japanese were the prominent languages for program identifiers, there wouldn't be a requirement to fit into 128 characters in the first place. The dominant charset would probably be GB2312 / Shift-JIS which are multi-byte.
What exactly is the reason people started to have upper and lower case letters? I found a few articles about this appearing in the middle ages, but there doesn't seem to be a reason that makes sense. Seems to add nothing to the language, wtf is the point of the first letter being a capital anyway, and rules about various words that have to have caps?
Like many things that evolved slowly over time, there isn’t a single clear reason for the upper/lower case distinction.
The distinction began as a mixing of different writing styles (called “hands” in calligraphy), something similar to how we today will mix fonts in a document.
“Lower case” was the newer, more common font. (What we’d today call uncial or Carolingian)
“Upper case” was a font based on older, Roman-era designs. This is why Roman monuments like the Trajan column appear to us as being in all capitals.
Using the older “font” at the beginning of sentences seems to have begun as stylistic choice, but perhaps with some purpose as a reading aid to identify sentence starts, somewhat in the same way we’ll use different fonts for headings. Note that punctuation (periods, question marks, commas, etc) was also evolving at around the same time and didn’t exist in Roman times.
Bold and italic also evolved from the mixing of different “fonts” within one text.
Wish I could find a better source, but this is what I’ve learned as a calligrapher over the years.
As the other comment says: readability. ALGOL-68 (infamously) allows spaces in identifiers, which makes the code look a bit more natural. C and its descendants don't, so a logical recourse is using underscores, or --if that's too much typing-- switching capitalization. Pascal was at the forefront of this convention (weirdly enough, because the language is case insensitive, I think).
Other conventions were added: e.g., a capital for a type name, m as a prefix for class members, and of course a leading _ as an indication of not being public (which comes from the unix linker, IIRC).
Anyway, I think the answer to OP's question is a negative number.
Legibility. It is a significant improvement, and it still bothers me that English underuses it, even after years of it being my primary reading language.
I'd dread to think of what naming conventions would look like if they were modeled after Japanese. You can write the same exact word as ひとごみ, 人ごみ, 人込み, 人混み, 人込, or even ヒトゴミ if you're feeling bold.
On the other hand, maybe we'd be forced to actually confront those equivalences and make languages that accommodate them.
Research has shown that the lowest hanging fruit in programming language design would be to make names case-insensitive - allowing you to have two variables that differ only in case causes far more problems than it solves. But, programming being a pop culture, no-one dares.
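A contrived but legal example of the kind of case-only distinction meant here (C++, but any case-sensitive language will accept it):

    // Two distinct variables whose names differ only in case: the compiler is
    // happy, future readers less so.
    int main() {
        int total = 0;
        int Total = 0;   // a different variable
        total += 1;
        Total += 2;      // easy to touch the wrong one in a longer function
        return total + Total;
    }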
This article is a load of bollocks. If another language would be the main programming one, there would be other competing conventions resulting in similar problems.
If this is really a major pain point in some project, there are probably solutions - macros, reflection, code generation, custom build steps and probably more. Or just change your naming convention.
You need to define the _translation_ between conventions in a way that does not introduce issues. Not just the one convention, but how to get from one to the other and back.
HASHTABLE maps to Hashtable, but HASH_TABLE maps to HashTable.
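To make that concrete, here is a minimal sketch (a hypothetical helper, not from the article) of a SCREAMING_SNAKE-to-CamelCase translator; the underscore is what carries the word-boundary information, and going back the other way is lossy:

    #include <cctype>
    #include <iostream>
    #include <string>

    // Naive ALL_CAPS -> CamelCase conversion: every '_' starts a new word.
    std::string toCamel(const std::string& name) {
        std::string out;
        bool startOfWord = true;
        for (unsigned char c : name) {
            if (c == '_') { startOfWord = true; continue; }
            out += static_cast<char>(startOfWord ? std::toupper(c) : std::tolower(c));
            startOfWord = false;
        }
        return out;
    }

    int main() {
        std::cout << toCamel("HASHTABLE") << "\n";   // Hashtable
        std::cout << toCamel("HASH_TABLE") << "\n";  // HashTable
    }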
That's why we have conventions: to avoid wasting time arguing about which style to use here and there. I wasted more time reading this article than I will ever waste with lower/upper case naming issues.
By the way, Arabic has 4 letter forms, Japanese has 3 sets of characters, Chinese has 2 character forms (traditional and simplified), and Vietnamese and some other romanized Asian languages have many diacritics (for the tones) which change the meaning of words... The Latin alphabet as it is used in English is by far the simplest of all (to use, teach or learn).
A lot of SWEs in China use Pinyin (a romanization system for Mandarin Chinese) for naming variables/functions/etc.
Basic vocabulary such as time/test/thread/doc/… might be used along with Pinyin as well, so you will see things like dingdan_time or testJieguo. As a Chinese speaker myself, I find this really hard to understand without broader context, since Chinese is a tonal language.
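A made-up illustration of what such mixed names can look like (dingdan is Pinyin for 订单 "order", jieguo for 结果 "result"; without the characters or tones you can only guess that from context):

    #include <ctime>
    #include <string>

    // Hypothetical mixed Pinyin/English identifiers, as described above.
    struct Dingdan {                 // 订单 = order
        std::time_t dingdan_time;    // order timestamp
        std::string testJieguo;      // test result (结果)
    };

    int main() { Dingdan d{}; (void)d; }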
It's not a problem of Western languages; it's a problem of conventions made to preserve context while assuming the worst, along with some cult stuff.
You could avoid stuff like get/set, but that's pretty low-hanging fruit compared to the explosion of things when you have 3 components referencing from top to bottom, each with an 8+ character variable. Preserving context is expensive in writing and reading. Assuming the reader has a shared context is required to shorten many things, however. This can backfire and make things worse than writing things out fully.
Example: you could shorten mMaxTextureSize to maxSize or maxTexture or something. It entirely depends on what the reader knows coming in and what other variables exist or may exist in the future. In the other example, "getCurrentContext()->getLimits()" could be shortened to "Context()->Limits()", assuming the only context available is the current one. In both examples, I have to make assumptions about what the reader knows and about both the existing and future code, which, as mentioned before, can backfire completely.
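A compact sketch of that tradeoff (all names hypothetical, loosely modelled on the getCurrentContext()->getLimits() example; the terse alias only works if everyone shares the assumption that there is exactly one context):

    struct Limits { int maxTextureSize = 4096; };

    struct GraphicsContext {
        Limits limits;
        const Limits& getLimits() const { return limits; }
    };

    // Single global context standing in for "the current context is the only one".
    static GraphicsContext gContext;

    GraphicsContext& getCurrentContext() { return gContext; }  // fully spelled out
    GraphicsContext& Context()           { return gContext; }  // terse alias

    int main() {
        // Verbose: no shared context assumed, every step is explicit.
        int a = getCurrentContext().getLimits().maxTextureSize;
        // Terse: relies on the reader knowing there is only one context.
        int b = Context().limits.maxTextureSize;
        return a == b ? 0 : 1;
    }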
Programming is for humans; that's why we have programming languages instead of writing machine code... Humans have different concepts of writing. I think it's perfectly natural to have conventions on style of writing.
Koreans use syllabic blocks; in the West we mostly use letters, with a handful of variations per country and two main alphabets (how many remember that in Europe we have various letters, not only in Cyrillic and its national variations? Like þ, Ð, ȝ, ...); Japanese even has three scripts (hiragana, katakana and kanji), often mixed, with kana as reading aids for kanji... We have invented the concept of International Auxiliary Languages, with their own alphabets; the most well-known is Esperanto, but they never took off, because pushing your own language gives an advantage to the successful pusher, and so, one war at a time, we see winners and losers...
This is just like the tabs vs spaces holy war. If you can type rather fast, you can type any variable without autocomplete, no matter how long it is. If I understood correctly, the man-years are wasted only on extra typing.
The way Rust solves this inconsistency issue is by having a bunch of builtin style lints that 95% of Rust code follows. The key word here is builtin, as in, they have existed since 1.0 and are enabled in the compiler by default. You still have to deal with different naming conventions, but they only exist when you interface with some existing component/API.
I was really hoping this was going to turn into a rant about visibility in Go.
As much as I like Go, I really dislike the use of upper case to make things public. About the only thing I miss about Java is the convention of using capitalised words for types and lowercase for fields, methods and variables.
I must hit a name conflict in Go every second week.
Isn't the real problem here that people unfamiliar with the Latin alphabet will have to learn the Latin alphabet?
I suppose they could design programming languages using Chinese characters (maybe they already have? I wouldn't know) that use different conventions[0].
Chinese kids already learn the Latin alphabet, even before they learn how to write Chinese characters.
The Latin alphabet is used as a stepping stone towards learning how to write, and as an input method for computers and phones (you type the pronunciation of the character you want to write, and then select from a list of characters that are homophones).
My answer: not as many by far as are/were wasted with various character encoding standards and the problems involved in converting between them. Fortunately Unicode is slowly making that a thing of the past, but brings with it new issues (combining characters anyone?)
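As a small illustration of the combining-character issue (C++; assumes a UTF-8 execution character set, which is the default for gcc/clang):

    #include <iostream>
    #include <string>

    int main() {
        // Both of these render as "é": precomposed U+00E9 vs. 'e' plus the
        // combining acute accent U+0301. Byte-wise they are different strings.
        std::string precomposed = "\u00E9";
        std::string combining   = "e\u0301";
        std::cout << std::boolalpha << (precomposed == combining) << "\n";  // false
    }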
As others said, Japanese isn't remotely that simple, and besides, many non-Western languages use the Latin script. Anyway, it's hard to take someone complaining about notation seriously when they're simultaneously fine with a syntax as complicated and nonsensical as C++.
What is missing is some evidence (by which I mean ANY shred of evidence, not necessarily full-scale research; it could be circumstantial) that coders whose native languages lack the concept of upper/lower case ACTUALLY struggle more with this aspect of programming.
For what it's worth, we've had quite a few languages that only used upper case. Off the top of my head, FORTRAN or COBOL, but I believe it was pretty much the standard up until a point. Was it beneficial? Was there a detectable drop in productivity among programmers of Japanese/Korean/... origin once this trend reverted, and case-sensitive languages took over?
The article offers some food for thought, but objectively it does a better job riding the wave of current intellectual climate (inadvertently, perhaps) than producing a convincing argument in favor of the hypothesis.
Maybe I'm misinterpreting this but the way I read it this guy is complaining about type checking and then calling it racist? It sounds to me like what the zoomers would call a "game dev moment."