
Isn't this one of those "100 things Programmers don't know about People's Names" things?

Like the poor, it will be with us always.




I don't know, it's just a Unicode character? Not even a newer one, it's just 2 utf8 bytes. Pretty much everything should support that in 2021.

When I think of 100 things I think of stuff like "some people spell their name in all lowercase and get really funny if you change it"


Yeah, so double-byte characters cost extra. I don’t know, it's a checkbox or something, default off. Always did, still does. Double-width costs even more.


You're getting downvoted, but with TCHAR hiding the choice between wchar_t and char, this literally could be someone toggling off the "UNICODE" checkbox in Visual Studio somewhere.


> Pretty much everything should support that in 2021.

Yes, like IPv6.


UTF-8 is much less to ask for than IPv6.


Windows probably defaults to Latin-1.


The default Windows encoding is UTF-16; a long time ago it was Windows-1252: https://en.wikipedia.org/wiki/Windows-1252


Given the frequency with which Windows-12* mojibake occurs, either there are a number of holdouts still using Windows 98 SE, or there are a good number of paths in Windows that still use the non-Unicode encodings.
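For anyone who hasn't seen how that kind of mojibake comes about, here is a minimal sketch in Python. The sample name "Michał" is a hypothetical stand-in (the thread only implies a name containing "ł"), and CP-1252 is just one of the Windows-12* code pages that could be involved.

```python
# Hypothetical sample name; the thread only implies a name containing "ł".
name = "Michał"

# "ł" (U+0142) takes two bytes in UTF-8: 0xC5 0x82.
l_stroke_utf8 = "ł".encode("utf-8")

# Mojibake: UTF-8 bytes misread as if they were Windows-1252 text.
utf8_bytes = name.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")

# As long as every byte survived the round trip, the damage is reversible:
# re-encode with the wrong code page, then decode with the right one.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(l_stroke_utf8.hex())
print(mojibake)
print(repaired)
```

The repair step only works when the wrong decoding happened to map every byte to some character; code pages with unassigned byte values can lose information along the way.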


Windows still supports the Windows 98 API, and it's more natural to use from some languages like C++. No change is planned there. The Windows 98 API is also closer to the Unix API, which can incentivize programmers to use the same approach on Windows and Unix.


All Windows needed to do was support setting that API to UTF-8. It's not like it didn't already support multi-byte encodings. It's not like they even needed to assign an ID for UTF-8 or implement the conversions: those existed already. All they needed to do was allow programs to set their codepage to UTF-8. This finally became possible two years ago. Better late than never, I guess.


or CP-1251, in some locations.


There are a good number of them, all depending on locale.

In this case, I'd guess CP-1250, since 0xb3, from the error, decodes to "ł", from the name, in that encoding. (But not in CP-1251, or '52.)

if you want to see how to arrive there: https://news.ycombinator.com/item?id=28939960
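A quick way to check the guess yourself is to decode the single byte 0xb3 under each candidate code page. This is a rough sketch using Python's built-in codecs; the helper name `decode_candidates` and the list of candidates are my own choices, not anything taken from the original error.

```python
# The byte mentioned in the comment above.
RAW = b"\xb3"

# Candidate encodings to try; this list is an assumption for illustration.
CANDIDATES = ["cp1250", "cp1251", "cp1252", "latin-1", "utf-8"]


def decode_candidates(raw, encodings):
    """Decode raw under each encoding; None means the bytes are invalid there."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


decoded = decode_candidates(RAW, CANDIDATES)
for enc, ch in decoded.items():
    if ch is None:
        print(f"{enc}: not decodable")
    else:
        print(f"{enc}: {ch!r} (U+{ord(ch):04X})")
```

Only CP-1250 yields "ł" (U+0142). CP-1251 gives the Cyrillic "і" (U+0456), CP-1252 and Latin-1 give "³" (U+00B3), and a lone 0xb3 is not valid UTF-8 at all.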



