Hi Stefan,

On Monday, 9 April 2012, 11:14:43, bugzilla-daemon at bugzilla.wpkg.org wrote:
> http://bugzilla.wpkg.org/show_bug.cgi?id=269
> In general Notepad++ is happy with "ANSI as UTF-8" encoded files, but this
> seems not to be the case for this file.
>
> The BOM doesn't hurt and makes sure that any editor, not just only
> specialized ones, display and edit the text correctly.

Please don't get me wrong, all of what you say in this bug is correct :)

However, there is IMHO a subtle problem with BOMs in any file that is supposed to be machine-readable. Without a BOM, any UTF-8 encoded file will be correctly parsed by anything that digests 8-bit, ASCII-compatible text files. A BOM can break this in ways that are sometimes hard to debug, as it's usually not visible.

Imagine a simple key=value based config file:

    message="Içh ßiñ €in Täst ☺"

Without a BOM, this will be correctly handled by even the most stupid parser. As long as the code consuming the data from that parser is aware of the UTF-8 encoding, all is fine. But when you add a BOM, the parser will see "<BOM>message" instead of "message" as the key, fail to match it, and fail miserably.

Any XML tool must definitely cope with the presence of a BOM. But then an XML file without an explicitly specified encoding and without a BOM must be UTF-8 encoded anyway.

So as you already said, the BOM helps non-specialized editors. Right, but personally I've had those invisible buggers bite me several times while they never did me any good ;)

{Gosh I do feel like ranting, it's not towards you, but shall emphasize why I consider adding a BOM a valid but unfortunate "fix":}

<rant>
I'd assert that BOMs are a kludge that should be used very sparingly. In fact, as the byte order is clear in UTF-8, the BOM that is customary and necessary with UTF-16/UCS-2 degenerates to merely flagging the text as Unicode. What an epic fail ;) Anything non-UTF-8 should be flagged as "Danger: obsolete encoding inside" instead, and those ISO-8859-*, WIN125* and whatnot should go die in a fire.
AFAIK among all current OSs, only Windows still doesn't default to UTF-8 in text files. Plus MS has this evil habit of assuming Unicode = UCS-2, which totally breaks ASCII compatibility, breaks all protocols that can't deal with <NUL> bytes in text streams etc. (I've seen Outlook Express send e-mails with UCS-2 content as text/plain, no charset given, no transfer encoding and all those lovely <NUL>s inside...)

Yeah, UCS-2 is a Unicode encoding and fine for internal processing (if all you need is the BMP). But when serializing to a file, UTF-8 is the way to go, no BOMs needed.
</rant>

Kind regards,
Malte
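
P.S. To make the key=value example concrete, here is a minimal Python sketch of such a "stupid" parser. The function name parse_config and its encoding parameter are made up for illustration and are not taken from any real tool. It only shows how a leading BOM ends up glued to the first key, and how a BOM-tolerant decode ("utf-8-sig" in Python) would sidestep that, assuming you control the parser at all.

```python
import codecs


def parse_config(raw: bytes, encoding: str = "utf-8") -> dict:
    """Hypothetical naive key=value parser: decode, split each line on the first '='."""
    result = {}
    for line in raw.decode(encoding).splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key] = value.strip('"')
    return result


text = 'message="Içh ßiñ €in Täst ☺"\n'
plain = text.encode("utf-8")          # file saved without a BOM
with_bom = codecs.BOM_UTF8 + plain    # same file saved with a UTF-8 BOM

cfg_plain = parse_config(plain)
cfg_bom = parse_config(with_bom)

# Without a BOM the key is found as expected.
print("message" in cfg_plain)

# With a BOM the first key silently becomes "\ufeffmessage", so the lookup fails.
print("message" in cfg_bom)
print([ascii(key) for key in cfg_bom])

# Decoding with "utf-8-sig" drops a leading BOM if one is present.
cfg_tolerant = parse_config(with_bom, encoding="utf-8-sig")
print("message" in cfg_tolerant)
```

Of course this only helps when the consumer can be changed. The point above stands for every parser that cannot.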