It seems you are mixing two things: inner string representation and read/write e...

Dwedit · 2024-04-27T02:36:39 1714185399

Or possibly confusing it with JavaScript, which treats strings as sequences of UTF-16 code units?
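
To make the code-unit point concrete, here is a minimal Python sketch. The helper name `utf16_code_units` is made up for illustration. It counts how many 16-bit units a string would occupy in UTF-16, which is what JavaScript's `.length` reports, and compares that with the code-point count Python's `len()` returns.

```python
# Count UTF-16 code units for a string (hypothetical helper, for illustration).
def utf16_code_units(s: str) -> int:
    # Encode without a BOM; every UTF-16 code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2

text = "a\U0001F600"  # 'a' plus U+1F600, which lies outside the BMP

code_points = len(text)              # Python counts code points
code_units = utf16_code_units(text)  # U+1F600 needs a surrogate pair

print(code_points, code_units)  # 2 3
```

The character outside the BMP takes two code units (a surrogate pair), so the two counts diverge as soon as such characters appear.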

cryptonector · 2024-04-26T15:39:48 1714145988

Not even on Windows?

layer8 · 2024-04-26T16:16:03 1714148163

No, file I/O on Windows in general doesn’t use UTF-16, but the regional code page, or nowadays UTF-8 if the application decides so.

int_19h · 2024-04-26T21:34:48 1714167288

Depends on what you define as "file I/O", though. NTFS filenames are UTF-16 (or rather UCS-2). As for file contents, there isn't really a standard, but FWIW for a long time most Windows apps (Notepad being the canonical example), when asked to save anything as "Unicode", would save it as UTF-16.

layer8 · 2024-04-26T22:21:03 1714170063

I'm talking about the default behavior of Microsoft's C runtime (MSVCRT.DLL) that everyone is/was using.

UTF-16 text files are rather rare, as is using Notepad's UTF-16 options. The only semi-common use I know of is *.reg files saved from regedit. One issue with UTF-16 is that it has two different serializations (BE and LE), and hence generally requires a BOM to disambiguate.
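
A rough Python sketch of why the BOM matters. The function name `sniff_utf16_bom` is hypothetical, and the check deliberately ignores UTF-32, whose little-endian BOM starts with the same two bytes. The same text serializes to different byte sequences in LE and BE, and decoding with the wrong order usually yields garbage rather than an error.

```python
import codecs
from typing import Optional

def sniff_utf16_bom(data: bytes) -> Optional[str]:
    """Return the codec implied by a leading UTF-16 BOM, or None if absent.

    Simplification: UTF-32 is not considered here.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None

le = "hi".encode("utf-16-le")  # b'h\x00i\x00'
be = "hi".encode("utf-16-be")  # b'\x00h\x00i'

# Decoding with the wrong byte order does not raise here; it just
# produces two unrelated characters.
wrong = le.decode("utf-16-be")

with_bom = codecs.BOM_UTF16_LE + le
detected = sniff_utf16_bom(with_bom)
```

Without the BOM, `sniff_utf16_bom(le)` returns `None`, and a reader has to guess or be told the byte order out of band.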

TheCycoONE · 2024-04-27T02:43:43 1714185823

PowerShell used to output UTF-16 by default on Windows. It might still, but it's been a while since I needed to try.

int_19h · 2024-04-27T10:27:30 1714213650

Then you're talking about the C stdlib, which, yeah, is meant to use the locale-specific encoding on any platform, so it's not really a Windows thing specifically. But even then someone could use the CRT but call _wfopen() rather than fopen() etc - this was actually not uncommon for Windows software precisely because it let you handle Unicode without having to work with the Win32 API directly.

Microsoft's implementation of fopen() also supports "ccs=..." to open Unicode text files in Unicode, and interestingly "ccs=UNICODE" will get you UTF-16LE, not UTF-8 (but you can do "ccs=UTF-8"). .NET also has this weird naming quirk where Encoding.Unicode is UTF-16, although there at least UTF-8 is the default for all text I/O classes like StreamReader if you don't specify the encoding. Still, many people didn't know better, and so some early .NET software would use UTF-16 for text I/O for no reason other than its developers believing that Encoding.Unicode is obviously what they are supposed to be using to "support Unicode", and so explicitly passing it everywhere.
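
The cost of that choice is easy to see outside the CRT or .NET too. A minimal Python sketch, where `written_size` is a hypothetical helper and Python's `encoding=` argument stands in for the `ccs=` flag or the .NET `Encoding` parameter: it writes the same ASCII text once as UTF-8 and once as UTF-16LE and compares the sizes on disk.

```python
import os
import tempfile

text = "hello, world\n"  # 13 ASCII characters

def written_size(encoding: str) -> int:
    """Write `text` to a temp file in the given encoding; return its size in bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="" disables newline translation so sizes match on every OS.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        return os.path.getsize(path)
    finally:
        os.remove(path)

utf8_size = written_size("utf-8")       # one byte per ASCII character
utf16_size = written_size("utf-16-le")  # two bytes per character, no BOM

print(utf8_size, utf16_size)  # 13 26
```

For mostly-ASCII text, explicitly picking UTF-16 doubles the file size and adds the byte-order question, which is why defaulting to UTF-8 is usually the safer call.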