> for interop with other programs I leave it off unless the program needs it.
Plain text isn’t exactly a machine-friendly format.
If you want to interop with other programs, a good choice is e.g. XML. It has this encoding problem solved as part of the standard.
> The spec says the BOM is optional. Some Microsoft programs however require it.
Could you please name a Microsoft program that you think requires a BOM?
I’m asking because I have completely different experience. For me, Microsoft programs open text files just fine, with or without the BOM. But most *nix and osx programs show me garbage instead of BOM.
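For readers following along, here is a minimal Python sketch of the "garbage instead of BOM" effect. It is only an illustration: the sample text and the choice of Windows-1252 as the legacy code page are assumptions, not something from the posts above.

```python
import codecs

# A UTF-8 file that starts with a BOM (bytes EF BB BF) followed by "hello".
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# A BOM-unaware UTF-8 reader keeps the BOM as an invisible U+FEFF character.
naive = raw.decode("utf-8")        # '\ufeffhello'

# Assumption: a viewer that treats the bytes as Windows-1252 shows visible junk.
legacy = raw.decode("cp1252")      # 'ï»¿hello'

# A BOM-aware reader ("utf-8-sig") strips the BOM when it is present.
aware = raw.decode("utf-8-sig")    # 'hello'
```

The `utf-8-sig` codec also accepts input without a BOM, so it is a reasonable default when reading files of unknown origin.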
> Plain text isn’t exactly a machine-friendly format.
Works fine for unix. :D
> Could you please name a Microsoft program that you think requires a BOM?
Visual C++ off the top of my head. It mangles UTF-8 string literals without the BOM in the source code.
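A rough Python sketch of what that mangling looks like. The assumption here is that a BOM-less file gets read in the system ANSI code page, taken to be Windows-1252 for this example; the sample string is made up.

```python
# The bytes of a string literal as stored in a BOM-less UTF-8 source file.
source_bytes = "naïve café".encode("utf-8")

# Assumption: with no BOM, the bytes are interpreted as Windows-1252,
# so each two-byte UTF-8 sequence turns into two separate characters.
misread = source_bytes.decode("cp1252")   # 'naÃ¯ve cafÃ©'

# With the encoding known (for example signalled by a BOM), the text survives.
correct = source_bytes.decode("utf-8")    # 'naïve café'
```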
> For me, Microsoft programs open text files just fine, with or without the BOM. But most *nix and osx programs show me garbage instead of BOM.
That's what I was trying to say about the BOM being prevalent on the Windows side of the fence. Some programs require it, some always generate it, so most programs now accept it.
On the unix/osx side everyone switched to UTF-8, so the BOM is redundant. Everything is UTF-8, so the silliness of "this one needs a BOM, that one doesn't" doesn't exist. Good example of what the "UTF-8 Everywhere" site is trying to promote.
Personally I really wish Microsoft would eventually fix their UTF-8 codepage. Would be so nice not having to convert to/from UTF-16 at the Win32 API boundary.
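To make the boundary conversion concrete, here is a small Python sketch of the round trip a UTF-8 program performs before and after calling a wide-character API. The helper names `to_wide` and `from_wide` are hypothetical, and the sketch only models the byte layout (NUL-terminated UTF-16LE); it does not call any Win32 function.

```python
def to_wide(text: str) -> bytes:
    """Hypothetical helper: encode text as a NUL-terminated UTF-16LE buffer."""
    return text.encode("utf-16-le") + b"\x00\x00"


def from_wide(buf: bytes) -> str:
    """Hypothetical helper: decode UTF-16LE and stop at the first NUL."""
    return buf.decode("utf-16-le").split("\x00", 1)[0]


# The program keeps its strings as UTF-8 internally...
utf8_bytes = "héllo €".encode("utf-8")    # 10 bytes

# ...and converts on the way into and out of the API.
wide = to_wide(utf8_bytes.decode("utf-8"))  # 7 code units + terminator = 16 bytes
back = from_wide(wide).encode("utf-8")
```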
The trend towards higher-level data formats is universal across all OSes.
Even on Unix, users typically read html, write odf or docx (both being xml), print PostScript, etc.
Plain text is friendly towards developers. But it’s neither interop-friendly nor user friendly.
> Visual C++ off the top of my head
Only the C++ compiler. MS can’t change the compiler because backward compatibility. The IDE however works fine with such files.
> "UTF-8 Everywhere" site is trying to promote.
The transition is going to be expensive, because most languages and frameworks (C++/MFC/ATL/Qt, .NET languages, JVM languages, Python, etc) have used Unicode (UCS-2 or UTF-16) strings for decades already.
To justify the costs, the benefits of the transition must be substantial.
> Plain text is friendly towards developers. But it’s neither interop-friendly nor user friendly.
Kind of got off track here. You can process a lot of formats as text (html, css, xml, etc). So a BOM there is unnecessary and sometimes detrimental. On the unix side there are a lot of text utilities that do useful things with these formats. That's probably why BOMs are nonexistent there.
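One way a BOM becomes detrimental with byte-oriented text tools, sketched in Python. The two "files" are invented sample data, and the plain byte concatenation stands in for what a tool like `cat` does.

```python
import codecs

# Two small files, each saved with a UTF-8 BOM.
part1 = codecs.BOM_UTF8 + b"first line\n"
part2 = codecs.BOM_UTF8 + b"second line\n"

# Byte-wise concatenation, as a tool like `cat` would do it.
joined = part1 + part2

# Even a BOM-aware decoder only strips the BOM at the very start,
# so the second BOM survives as a stray U+FEFF in the middle of the text.
lines = joined.decode("utf-8-sig").splitlines()

# An exact match on the second line now fails because of the hidden character.
matches = lines[1] == "second line"   # False
```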
> MS can’t change the compiler because backward compatibility.
You care to tell MS that? Every single time I've done a major VS upgrade my code had to be changed because something that was valid before stopped being valid.
> And there aren’t any.
If you can't see any benefit of using UTF-8 then I'm done debating with you.
When generating output for a user, letting them choose is a good idea. But for interop with other programs I leave it off unless the program needs it.
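A minimal sketch of that policy in Python: no BOM by default, with an opt-in for consumers that need it. The function name `write_text` and its `bom` parameter are made up for this example.

```python
import os
import tempfile


def write_text(path: str, text: str, bom: bool = False) -> None:
    """Write text as UTF-8; prepend a BOM only when the caller asks for one."""
    # "utf-8-sig" emits the BOM on write, plain "utf-8" never does.
    encoding = "utf-8-sig" if bom else "utf-8"
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.txt")
    marked = os.path.join(tmp, "marked.txt")
    write_text(plain, "hi")             # default for interop: no BOM
    write_text(marked, "hi", bom=True)  # opt-in for programs that need it
    with open(plain, "rb") as f:
        plain_bytes = f.read()          # b'hi'
    with open(marked, "rb") as f:
        marked_bytes = f.read()         # b'\xef\xbb\xbfhi'
```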
> Officially — definitely no, we both saw the spec on unicode.org.
The spec says the BOM is optional. Some Microsoft programs however require it.