Encoding a string as UTF-8 with BOM in PHP

How can I force PHP to add the BOM when using utf8_encode?

Here's what I am trying to do:

$zip->addFromString($filename, utf8_encode($xml));

Unfortunately (for me), the result does not have the BOM at the beginning.

Eichman answered 8/4, 2011 at 23:54 Comment(0)

Have you tried adding one yourself?

The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, so you can prepend it to your string after conversion to UTF-8.

$utf8_with_bom = chr(239) . chr(187) . chr(191) . $utf8_string;
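
If the same string might pass through this step more than once, it may be worth guarding against a doubled BOM. The following is a minimal sketch, not part of the original answer: with_utf8_bom is a hypothetical helper name, and the sample XML is made up for illustration.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// prepend the UTF-8 BOM unless the string already starts with it.
function with_utf8_bom(string $utf8): string
{
    $bom = "\xEF\xBB\xBF"; // same bytes as chr(0xEF) . chr(0xBB) . chr(0xBF)
    if (strncmp($utf8, $bom, 3) === 0) {
        return $utf8; // already starts with a BOM; do not add a second one
    }
    return $bom . $utf8;
}

// Made-up sample input for illustration.
$xml = '<?xml version="1.0" encoding="UTF-8"?><root/>';
$withBom = with_utf8_bom($xml);

echo bin2hex(substr($withBom, 0, 3)), "\n"; // prints "efbbbf"
```

With a helper like this, the call from the question could become $zip->addFromString($filename, with_utf8_bom($xml)), assuming $xml is already UTF-8.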

Watch out, though. utf8_encode wants an ISO-8859-1 string. If you're working with XML, make sure that the XML isn't already UTF-8 encoded. The comments on the documentation suggest that the function is broken in a variety of fun ways, so you shouldn't throw it around unless you know that you need it.

Remember, PHP strings are simply dumb, unknowing bytes. They don't have a character set attached to them, so if the data in the string is already UTF-8, you don't need to run the conversion.
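
One way to avoid converting data that is already UTF-8 is to check the bytes first. This is only a sketch under assumptions: ensure_utf8 is a hypothetical name, the mbstring extension is assumed to be available, and any input that is not valid UTF-8 is assumed to be ISO-8859-1. The check is a heuristic, since a short ISO-8859-1 string can happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// convert to UTF-8 only when the bytes are not already valid UTF-8.
// Assumes the mbstring extension and that other input is ISO-8859-1.
function ensure_utf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8; leave it untouched
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8

var_dump(ensure_utf8($latin1) === $utf8); // bool(true): converted
var_dump(ensure_utf8($utf8) === $utf8);   // bool(true): left as-is
```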

Also, the linked Wikipedia article says this:

While Unicode standard allows BOM in UTF-8, it does not require or recommend it. Byte order has no meaning in UTF-8 so a BOM only serves to identify a text stream or file as UTF-8 or that it was converted from another format that has a BOM.

You probably don't need to bother with the BOM tapdance to begin with.

Boatload answered 9/4, 2011 at 0:26 Comment(5)
I had a problem where Excel wouldn't open my UTF-8 CSV correctly without the BOM, so it may not be required, but it certainly can make a difference. - Phonetic
You can make the numbers seem less "magical" by writing chr(0xEF).chr(0xBB).chr(0xBF). This way you can see that they are hex and recognize them more easily as the BOM. - Joselynjoseph
If you use an older editor such as EditPlus, its 'find in file' function can only search and recognize files containing foreign characters when they are encoded as UTF-8 with a BOM. - Nikolaos
Keep in mind that for the .CSV file to work in Excel for Mac, a UTF-8 BOM and UTF-8 encoding won't work. You need to convert your data to UTF-16LE and add a UTF-16LE BOM: https://mcmap.net/q/117574/-how-can-i-output-a-utf-8-csv-in-php-that-excel-will-read-properly - Cognoscenti
I pledge to you my firstborn child. Thank you. - Scrumptious
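
For the UTF-16LE route mentioned in the comments above, a conversion might look like the sketch below. It is an illustration only: to_utf16le_with_bom is a hypothetical name, the mbstring extension is assumed, and the input is assumed to be valid UTF-8. Whether a given Excel version then reads the file correctly is not something this snippet verifies.

```php
<?php
// Hypothetical helper (an assumption, not from the thread):
// convert a UTF-8 string to UTF-16LE and prepend the UTF-16LE BOM (0xFF 0xFE).
// Assumes the mbstring extension and valid UTF-8 input.
function to_utf16le_with_bom(string $utf8): string
{
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

echo bin2hex(to_utf16le_with_bom("A")), "\n"; // prints "fffe4100"
```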
