
Const-me · on June 20, 2016

> There's a difference between what a program should accept as input and what it should generate as output.

I’m a Windows developer. In my world, a program should generate its output in whatever format the user wants.

When I press “File/Save as” in Visual Studio and click the down-arrow icon, I see a choice of more than 100 different encodings (including all flavors of Unicode, with and without the BOM) and an independent choice of 3 line endings (Windows, Mac, Unix).

> BOM for UTF-8 text files seems to be a Microsoft thing

Practically — maybe, most Microsoft apps tend to understand those BOMs, and most *nix tools don’t, even on input.

Officially — definitely no, we both saw the spec on unicode.org.

Avernar · on June 20, 2016

> I’m a Windows developer. In my world, a program should generate its output in whatever format the user wants.

When generating output for a user, letting them choose is a good idea. But for interop with other programs I leave it off unless the program needs it.

> Officially — definitely no, we both saw the spec on unicode.org.

The spec says the BOM is optional. Some Microsoft programs however require it.
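
For illustration, the "accept it on input, leave it off on output unless the consumer needs it" policy could look roughly like this in Python. The function names `read_text` and `write_text` are made up for this sketch; the `utf-8-sig` codec is the standard-library codec that drops a leading BOM when decoding and prepends one when encoding.

```python
import codecs


def read_text(data: bytes) -> str:
    """Decode UTF-8 input, dropping a leading BOM if one is present."""
    # "utf-8-sig" accepts input both with and without the BOM.
    return data.decode("utf-8-sig")


def write_text(text: str, with_bom: bool = False) -> bytes:
    """Encode as UTF-8; emit the BOM only when the consumer requires it."""
    return text.encode("utf-8-sig" if with_bom else "utf-8")


# Both inputs decode to the same string.
assert read_text(codecs.BOM_UTF8 + b"abc") == read_text(b"abc") == "abc"
```

Whether `with_bom` should default to off is exactly the point being argued in this thread, so treat the default as an assumption.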

Const-me · on June 20, 2016

> for interop with other programs I leave it off unless the program needs it.

Plain text isn’t exactly a machine-friendly format.

If you want to interop with other programs, a good choice is e.g. XML, which has this encoding problem solved as part of the standard.

> The spec says the BOM is optional. Some Microsoft programs however require it.

Could you please name a Microsoft program that you think requires a BOM?

I’m asking because I have completely different experience. For me, Microsoft programs open text files just fine, with or without the BOM. But most *nix and osx programs show me garbage instead of BOM.
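
For anyone wondering what that garbage is: the UTF-8 BOM is the three bytes EF BB BF. A small Python sketch of what a tool sees depending on how it decodes them (the sample text `hello` is arbitrary):

```python
import codecs

# A UTF-8 file that starts with a BOM: EF BB BF, then the text.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# A tool that assumes a legacy 8-bit code page shows three stray characters.
as_cp1252 = raw.decode("cp1252")      # 'ï»¿hello'

# A tool that decodes UTF-8 but is not BOM-aware keeps U+FEFF in the text.
as_utf8 = raw.decode("utf-8")         # '\ufeffhello'

# A BOM-aware decoder drops it.
as_utf8_sig = raw.decode("utf-8-sig") # 'hello'

print(repr(as_cp1252), repr(as_utf8), repr(as_utf8_sig))
```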

Avernar · on June 20, 2016

> Plain text isn’t exactly a machine-friendly format.

Works fine for unix. :D

> Could you please name a Microsoft program that you think requires a BOM?

Visual C++ off the top of my head. It mangles UTF-8 string literals without the BOM in the source code.

> For me, Microsoft programs open text files just fine, with or without the BOM. But most *nix and osx programs show me garbage instead of BOM.

That's what I was trying to say about the BOM being prevalent on the Windows side of the fence. Some programs require it, some always generate it, so most programs now accept it.

On the unix/osx side everyone switched to UTF-8, so the BOM is redundant. Everything is UTF-8, so the silliness of "this needs a BOM, that doesn't need a BOM" doesn't exist. It's a good example of what the "UTF-8 Everywhere" site is trying to promote.

Personally I really wish Microsoft would eventually fix their UTF-8 codepage. Would be so nice not having to convert to/from UTF-16 at the Win32 API boundary.
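
To make the boundary conversion concrete, here is a rough Python sketch of the round trip. The helper names `to_wide` and `from_wide` are hypothetical; in C or C++ on Windows the same step is typically done with `MultiByteToWideChar` and `WideCharToMultiByte` using `CP_UTF8`.

```python
def to_wide(text_utf8: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE, the layout the wide Win32 APIs expect."""
    return text_utf8.decode("utf-8").encode("utf-16-le")


def from_wide(text_utf16: bytes) -> bytes:
    """Convert UTF-16LE bytes coming back from the API into UTF-8."""
    return text_utf16.decode("utf-16-le").encode("utf-8")


name = "naïve.txt".encode("utf-8")   # 10 bytes: "ï" takes two bytes in UTF-8
wide = to_wide(name)                 # 18 bytes: nine 16-bit code units
assert from_wide(wide) == name       # the round trip is lossless
```

Every string crossing the API boundary pays for this conversion in both directions, which is the overhead a working UTF-8 codepage would remove.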

Const-me · on June 20, 2016

> Works fine for unix. :D

The trend towards higher-level data formats is universal across all OSes.

Even on Unix, users typically read HTML, write ODF or DOCX (both XML-based), print PostScript, etc.

Plain text is friendly towards developers. But it’s neither interop-friendly nor user friendly.

> Visual C++ off the top of my head

Only the C++ compiler. MS can’t change the compiler because backward compatibility. The IDE however works fine with such files.

> "UTF-8 Everywhere" site is trying to promote.

The transition is going to be expensive, because most languages and frameworks (C++/MFC/ATL/Qt, .NET languages, JVM languages, Python, etc.) have used Unicode (UCS-2 or UTF-16) strings for decades already.

To justify the costs, the benefits of the transition must be substantial.

And there aren’t any.

Avernar · on June 20, 2016

> Plain text is friendly towards developers. But it’s neither interop-friendly nor user friendly.

Kind of got off track here. You can process a lot of formats as text (HTML, CSS, XML, etc.), so a BOM there is unnecessary and sometimes detrimental. On the unix side there are a lot of text utilities that do useful things with these formats. That's probably why BOMs are nonexistent there.
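
One way a BOM becomes detrimental with byte-oriented text tools, sketched in Python. The two "files" here are made-up in-memory samples, and the byte concatenation stands in for what a `cat`-style tool does.

```python
import codecs


def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM, as some editors save it."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


part_a = with_bom("<a/>\n")
part_b = with_bom("<b/>\n")

# A byte-level check for the first character fails because of the BOM.
starts_with_tag = part_a.startswith(b"<")   # False

# Concatenating the files leaves the second BOM in the middle of the data.
joined = part_a + part_b
decoded = joined.decode("utf-8-sig")        # strips only the leading BOM
stray = decoded.count("\ufeff")             # one U+FEFF remains mid-stream

print(starts_with_tag, repr(decoded), stray)
```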

> MS can’t change the compiler because backward compatibility.

You care to tell MS that? Every single time I've done a major VS upgrade my code had to be changed because something that was valid before stopped being valid.

> And there aren’t any.

If you can't see any benefit of using UTF-8 then I'm done debating with you.