As someone who has worked on Windows for a long time, I found the title entirely unsurprising. Widening or narrowing always uses the current codepage.
If a name contains values beyond ASCII — technically out of spec
I'm not sure what spec it's referring to, but this is normal and expected for files in non-English systems.
Such tools often incorrectly assume UTF-8, which is what motivated this article.
Those tools are likely to be from the *nix world, where UTF-8 is far more common for the multibyte encoding --- but even there, you can have different codepages; and I have worked on Linux systems using CP1252 and 932 before.
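To make the mismatch concrete, here is a minimal sketch in C. The function name `is_valid_utf8` is made up for illustration and is not taken from any of the tools discussed; it only checks whether a run of bytes is well-formed UTF-8, which is roughly the assumption such tools make about names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s[0..n) is well-formed UTF-8, else 0.
   Rejects overlong forms, surrogates and values above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    assert(s != NULL || n == 0);
    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t len, k;

        if (c < 0x80) { i++; continue; }              /* plain ASCII byte */
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1Fu; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0Fu; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07u; }
        else return 0;              /* stray continuation or invalid lead byte */

        if (n - i < len) return 0;  /* sequence is cut off */
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3Fu);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return 0;     /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
        i += len;
    }
    return 1;
}
```

With the bytes 63 61 66 E9 ("café" in CP1252) this returns 0, while 63 61 66 C3 A9 (the same word in UTF-8) returns 1. The reverse does not hold: bytes that happen to be valid UTF-8 can still have been written in some other codepage.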
> I'm not sure what spec [the prohibition on non-ASCII names is] referring to, but this is normal and expected for files in non-English systems.
The description of import directories[1,2] in the PE/COFF spec explicitly (if somewhat glibly) restricts imported DLLs to being referenced using ASCII only:
> Name RVA - The address of an ASCII string that contains the name of the DLL.
That is, for lack of a better term, a "Microsoft-ism"; in MS documentation, "ASCII" consistently means "single-byte or MBCS" and is interpreted relative to the current codepage, as opposed to "Unicode" which means "UCS-2 or UTF-16". You can also see examples of "Unicode" in the docs you link to.
> That is, for lack of a better term, a "Microsoft-ism"; in MS documentation, "ASCII" consistently means "single-byte or MBCS" and is interpreted relative to the current codepage
I've read my share of MS docs and I do not recall ever seeing this to be the case. Like the parent says, I've seen "ANSI" used to refer to that, not ASCII. Do you have any examples of where they say "ASCII" where the intention is obviously something broader than 0-127? It makes me wonder how I've missed this if that's the case.
Not sure if I've ever seen a Microsoft doc do it, but many other places, including articles by MS "MVP"s, use ASCII and ANSI interchangeably.
In MS output, "ASCII" in my experience consistently means standard 7-bit ASCII.
Things they routinely do oddly are using ANSI to refer specifically to the Windows-1252 code page (a superset of ISO 8859-1, otherwise referred to as CP1252), when the institute of that name neither defined nor dictated use of the codepage, and including (or requiring for correct interpretation) the BOM sequence in UTF-8 encodings, when the standard allows but does not recommend a BOM in this context.
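For anyone who has to read files produced that way, a small sketch in C of how the BOM can be tolerated. The helper name `utf8_bom_length` is hypothetical; the only fact relied on is that U+FEFF encodes in UTF-8 as the three bytes EF BB BF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of leading bytes to skip if the buffer
   starts with the UTF-8 encoding of U+FEFF (EF BB BF), otherwise 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```

A reader would start parsing at `buf + utf8_bom_length(buf, n)`, so files with and without the BOM are handled the same way.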
It's the import table of an EXE/DLL where non-ASCII is out of spec. Meanwhile LoadLibraryW is happy to load any filename. (But don't you dare try to call LoadLibraryW from within DllMain; that runs under the loader lock.)
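If you take "ASCII" in the strict 0-127 sense (which is exactly what is being argued about in this thread), a tool that writes import tables could check names with something like the sketch below. `is_ascii_name` is a made-up name, not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if every byte of the NUL-terminated
   name is in the 7-bit range 0..127, else 0. */
int is_ascii_name(const char *name)
{
    const unsigned char *p;
    assert(name != NULL);
    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;   /* byte outside 7-bit ASCII */
    }
    return 1;
}
```

A name like "KERNEL32.dll" passes, while any name containing a byte of 0x80 or above is flagged, regardless of which codepage that byte was meant to be in.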
Sadly, it's not unusual at all to see a Windows app crash and burn when paths contain non-ASCII characters. That's just how it is for non-English computer users.
I couldn't build some files in Android Studio on Windows because the path to Gradle contained non-English characters. It's a common problem for a lot of Linux-based tools. Sometimes even spaces in directory names are enough to break those tools.
Oh yeah. You have to re-create the user account with an ASCII username if that happens.
And clearly those things are why there has been C:\ProgramData\ (that's under 8+3 characters!) since the late XP era. Even "C:\Program\ Files\" must have been too much sometimes, and having that folder is useful harm reduction: the developer response to an app not working under Program Files is to make random top-level directories under C:\ rather than taking the time to clear tech debt.
More interestingly, you can use this trick to create code where some user-specified word appears both as a string and as the name of a function. Exercise: write a macro M(x) such that compiling the code
It turns a macro argument into a string of that argument. Let's say you have a mydebug(expression) macro that should print both "expression" and the value of expression.
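A minimal sketch of that in C, assuming the expression is an `int`. The macro and function names here (`STRINGIFY`, `MYDEBUG`, `MYDEBUG_TO`, `mydebug_demo`) are made up for the example; only the `#` operator itself is the trick being described.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The # operator turns the macro argument into a string literal. */
#define STRINGIFY(x) #x

/* Hypothetical mydebug-style macro: writes "<expression> = <value>" into
   buf. This sketch only handles expressions of type int. */
#define MYDEBUG_TO(buf, size, expr) \
    snprintf((buf), (size), "%s = %d", #expr, (expr))

/* Prints the same text to stderr, as a mydebug(expression) might. */
#define MYDEBUG(expr) fprintf(stderr, "%s = %d\n", #expr, (expr))

/* Small self-check; returns the number of characters formatted. */
int mydebug_demo(void)
{
    char buf[32];
    int a = 2, b = 3;
    int len = MYDEBUG_TO(buf, sizeof buf, a + b);

    assert(strcmp(buf, "a + b = 5") == 0);
    assert(strcmp(STRINGIFY(a + b), "a + b") == 0);
    MYDEBUG(a * b);   /* prints: a * b = 6 */
    return len;
}
```

The argument is stringified before any macro expansion, so the text you get is what was typed at the call site, not its value.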
I got the chance to evaluate vendors for a huge enterprise because I was assisting their CTO. I vividly remember the sales guy who flew from Redmond to pitch the shiny new Hyper-V virtual machine platform Microsoft had just developed to compete head-to-head with VMware.
“I tried the beta and it couldn’t install successfully if I set my regional options to en-AU.”
“Umm… that’s just a cosmetic issue.”
“It’s a hypervisor kernel, it is going to host tens of thousands of our most critical applications and it crashes if I change one of only three things it asks during setup. My confidence is not super high right now.”
Etc…
I got the impression that Microsoft is used to selling to PHBs based on the look of shock on the guy’s face when I told him that I not only installed the product, but benchmarked it too for good measure.
I absolutely hate that they've reduced it to a single "regional settings". Just because I don't want Norwegian text everywhere does not mean I want dates and time to be displayed in some weird way. However I also utterly despise the Norwegian official way of writing decimal numbers with , rather than . as the decimal separator.
We've had fine-grained control over this for ages, apps can handle it fine, just let us get it the way we want it.
"English (Ireland)" seems to be the closest thing to a sane locale out of the box (could have wished for ISO-8601 dates, but I guess you can't have everything).
That reminds me of the Spotlight bar on a Mac that could also do maths, except that if you used . instead of the regionally correct decimal separator ",", it would ignore the dot, so e.g. 123.7 + 4.50 became 1687.
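I don't know how Spotlight actually parses its input, but here is one guess, sketched in C, that reproduces the 1687: the "wrong" separator is treated as a grouping separator and silently dropped. `parse_decimal` is a hypothetical function written for this comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser: reads an unsigned decimal number where decimal_sep
   marks the fraction and any other '.' or ',' is treated as a grouping
   separator and skipped. Stops at the first other non-digit. */
double parse_decimal(const char *s, char decimal_sep)
{
    double value = 0.0, scale = 1.0;
    int in_fraction = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            if (in_fraction) {
                scale /= 10.0;
                value += (*s - '0') * scale;
            } else {
                value = value * 10.0 + (*s - '0');
            }
        } else if (*s == decimal_sep && !in_fraction) {
            in_fraction = 1;
        } else if (*s == '.' || *s == ',') {
            continue;   /* grouping separator: silently dropped */
        } else {
            break;
        }
    }
    return value;
}
```

With `,` as the decimal separator, "123.7" parses as 1237 and "4.50" as 450, which sum to 1687. With `.` as the separator the same inputs give roughly 128.2.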
It is on Windows 10 which I run at home. On Windows 11 which I use at work they changed it to the way you say.
But even then it didn't work to simply have English locale and Norwegian and US keyboard layouts under it. I can't recall what they messed up right now, but I fought Windows 11 for quite some time.
Finally settled on English locale with US keyboard layout, and Norwegian locale with Norwegian keyboard layout, which mostly works in terms of keyboard layout, but now my Weather app is showing Fahrenheit instead of Celsius, despite my regional settings being Norwegian.
Like, how hard can it be to just not fuck it up? They had it working fine for decades!
The shortcuts next to the menu items in LibreOffice apps show me French translations, e.g. "Ctrl+Maj+Espace" for "Ctrl+Shift+Space", even though my language is set to Dutch everywhere (and other texts in LibreOffice are properly translated to Dutch). Apparently it has to do with me using an AZERTY keyboard layout. I use Belgian AZERTY, not French AZERTY. Yes, it's confusing.
> “I tried the beta and it couldn’t complete the installer if I set my regional options to en-AU.”
> “Umm… that’s just a cosmetic issue.”
> “It’s a hypervisor kernel, it is going to host tens of thousands of our most critical applications and it crashes if I change one of only three things it asks during setup. My confidence is not super high right now.”
No offense, but to me, the way it's written, it reflects badly on you rather than on him. Obviously the rep wouldn't answer with something like:
"Well, it said it is a beta, didn't it? The quality of the installer of a BETA has nothing to do with the quality of the hypervisor itself."
It was a month from release and it’s a product that’s “even more critical than the OS kernel” for reliability and availability. A failed hypervisor can take out dozens of servers at once.
I also managed to crash or lock it up several times; I just mentioned the keyboard thing as an insane bug. What possible dependency could a stripped-down kernel with hardly any user space have on a keyboard layout that’s identical? It is different from en-US in name only.
It’s not about the specifics of the issue, but about the overall impression of sloppiness. They didn’t make a hypervisor that’s purpose-designed for the requirements, they just stripped down Windows and deleted stuff haphazardly so that they were missing the keyboard but still had the installer option.
For reference, I did run it at scale a few years later and my misgivings were confirmed… and then some. It was much less stable than ESXi and the cluster operations were a disaster. Read-only operations could cause deadlocks that only a full cluster reboot could resolve. In-place upgrades weren’t available for several major versions! Meanwhile ESXi clusters could be live-upgraded, including disk format changes!
After enough decades of experience you get a sixth sense for these things. A single sentence or just one word can trigger an alarm bell in your brain.
With a beta, I expect it to at least be somewhat tested. If they didn't test with anything but the defaults in the installer, I wouldn't be particularly confident about the product either.
Is this what is meant when people say that Windows doesn't have stable APIs? I've not programmed windows/desktop software and have often been confused as to why if you're writing an assembly program, you have to use at least a little C if you want to make Windows system calls[1]. The compiler must be compiling the call into "something" so can't you just write that "something" directly in assembly?
[1]: The classic example being Chris Sawyer writing nearly all of Rollercoaster Tycoon in x86 assembly but requiring just enough C for the system calls.
Unlike Linux, Windows syscalls aren't documented and their IDs constantly change[1]. Instead, you're supposed to call wrapper functions provided by ntdll.dll. That said, most programs use even higher level functions from kernel32.dll and friends.
You don't need C/C++ to call a function from a DLL, but it makes things easier, especially for more complex APIs like DirectX.
> have often been confused as to why if you're writing an assembly program, you have to use at least a little C if you want to make Windows system calls
This isn’t true. The Win32 API is C-based. You can call a C-based API from hand-written assembly just fine. You just have to understand the ABI calling convention so you know how to do it. You can also write some C code to call the API, get the C compiler to output assembly, and then copy/paste the relevant portion into your hand-written assembly.
> The classic example being Chris Sawyer writing nearly all of Rollercoaster Tycoon in x86 assembly but requiring just enough C for the system calls.
I don’t think he had to do it that way. Maybe he just decided that was the path of least resistance for him.
Ultimately, he has to have assembly call C at some point, since his own assembly code has to call his C function which calls the Win32 API (technically not system calls, as other commenters have pointed out). Given that the Win32 API involves some rather complex data structures, numerous constant definitions, various helper macros, etc., he may have just found it easier to use some of that stuff from C instead of trying to replicate the complexity in hand-written assembly, or having to translate all the C header definitions he needed into corresponding assembler macros.
> > The classic example being Chris Sawyer writing nearly all of Rollercoaster Tycoon in x86 assembly but requiring just enough C for the system calls.
> I don’t think he had to do it that way. Maybe he just decided that was the path of least resistance for him.
From Chris Sawyer himself[0]:
What language was RollerCoaster Tycoon programmed in?
It's 99% written in x86 assembler/machine code (yes, really!), with a small amount of C code used to interface to MS Windows and DirectX.
You don't need to use C if you want to call Win32 functions, although the calling convention is based on C. The stable API is not the user-kernel boundary but instead a set of DLLs (kernel32, user32, gdi32, etc.), which makes sense if you look at the history: Windows started out as a GUI on top of DOS. The actual system call mechanism is very different between the DOS-based lineage and the NT-based ones, but the Win32 API remains nearly the same.
Windows system call numbers aren't stable. It is done through their system libraries. So rather than invoking a syscall number as you do on Linux, if you want your code to be portable you import the DLL with that functionality and call the function. You can do this all from ASM without needing C, though the system libraries were likely written in a higher level language like C. You can also invoke system calls by number directly, it is just liable to not work on another version of Windows.
The worst thing that Windows has that's tied to your locale is case sensitivity for filenames. You can have two filenames be created under one locale, change your locale, then the filenames aren't considered different anymore.