Isn't this one of those "100 things Programmers don't know about People's Names"...

xdfgh1112 · on Oct 20, 2021

I don't know, it's just a Unicode character? Not even a newer one, it's just 2 utf8 bytes. Pretty much everything should support that in 2021.
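
To make the "2 utf8 bytes" bit concrete, here is a quick Python check. It assumes the character under discussion is "ł" (U+0142), which is the letter named further down the thread; nothing else here comes from the original report.

```python
# Assumption: the character in question is "ł" (U+0142).
char = "ł"
code_point = ord(char)
utf8_bytes = char.encode("utf-8")

print(hex(code_point))    # 0x142
print(utf8_bytes.hex())   # c582
print(len(utf8_bytes))    # 2
```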

When I think of 100 things I think of stuff like "some people spell their name in all lowercase and get really funny if you change it"

numpad0 · on Oct 20, 2021

Yeah, so double-byte characters cost extra. I don’t know, a checkbox or something, default off. Always did, still does. Double width costs even more.

horsawlarway · on Oct 20, 2021

You're getting downvoted, but between TCHAR hiding wchar_t vs char... this literally could be someone toggling off the "UNICODE" checkbox in Visual Studio somewhere.

deathanatos · on Oct 21, 2021

> Pretty much everything should support that in 2021.

Yes, like IPv6.

selfhoster11 · on Oct 21, 2021

UTF-8 is much less to ask for than IPv6.

hprotagonist · on Oct 20, 2021

Windows probably defaults to Latin-1.

bryanrasmussen · on Oct 20, 2021

The default Windows encoding is UTF-16; a long time ago it was Windows-1252: https://en.wikipedia.org/wiki/Windows-1252

deathanatos · on Oct 21, 2021

Given the frequency with which Windows-12* mojibake occurs, either there are a number of holdouts still using Windows 98 SE, or there are a good number of paths in Windows that still use the non-Unicode encodings.
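
For anyone who hasn't seen how this kind of mojibake comes about, here is a rough Python sketch. It assumes the text was written as UTF-8 and then read back through a code path that uses the legacy Windows-1252 code page. The sample character "ł" is only an illustration borrowed from elsewhere in the thread, not data from the original report.

```python
# Illustration only: UTF-8 bytes misread through a legacy "ANSI" code page.
original = "ł"
raw = original.encode("utf-8")      # the bytes that were actually stored
mojibake = raw.decode("cp1252")     # what a cp1252 code path would display
print(mojibake)                     # Å‚

# The damage is reversible as long as no bytes were dropped in between.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)         # True
```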

GoblinSlayer · on Oct 21, 2021

Windows still supports the Windows 98 API, and it's more natural to use from some languages like C++. No change is planned there. The Windows 98 API is also closer to the Unix API, which can incentivize the programmer to use the same approach on Windows and Unix.

account42 · on Oct 21, 2021

All Windows needed to do was support setting that API to UTF-8. It's not like it doesn't already support multi-byte encodings. It's not like they even needed to assign an ID for UTF-8 or implement the conversions; those existed already. All they needed to do was allow programs to set their codepage to UTF-8. This finally became possible two years ago. Better late than never, I guess.

hprotagonist · on Oct 20, 2021

or CP-1251, in some locations.

deathanatos · on Oct 21, 2021

There are a good number of them, all depending on locale.

In this case, I'd guess CP-1250, since 0xb3, from the error, decodes to "ł", from the name, in that encoding. (But not in CP-1251, or '52.)

If you want to see how to arrive there: https://news.ycombinator.com/item?id=28939960
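
As a quick sanity check of the guess above, a small Python sketch that decodes the single byte 0xb3 under the three code pages mentioned. The codec names are Python's own aliases; the byte value is the only thing taken from the error.

```python
# Decode the byte from the error under each candidate Windows code page.
byte = b"\xb3"
decoded = {codec: byte.decode(codec) for codec in ("cp1250", "cp1251", "cp1252")}

for codec, char in decoded.items():
    # Print the code page, the resulting character, and its code point.
    print(codec, char, f"U+{ord(char):04X}")
```

Only cp1250 yields "ł" (U+0142); cp1251 gives the Cyrillic "і" (U+0456) and cp1252 gives "³" (U+00B3).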