PHP: How to put out text/plain content in UTF-8 with BOM for download?

2

7

I need to offer a plain text file for download. The file must be UTF-8 encoded and must start with a BOM. I saved my PHP file as UTF-8 without BOM and send the following headers:

header('HTTP/1.1 200 OK');
header('Content-Type: text/plain; charset=utf-8');
header('Content-Disposition: attachment; filename="test.txt"');

I save the script without a BOM because it would interfere with sending the headers. So I tried emitting the BOM manually:

echo chr(239).chr(187).chr(191);

Then I output my text. Without the manual BOM, an editor like Notepad++ detects the file as ANSI encoded. With the manual BOM, the file is detected as UTF-8 but contains these characters:



at the start. So I assume it is detected to be UTF-8 by means of heuristics and my manual BOM is wrong.

How do I do it right?

EDIT: Hex contents as requested. I used the text "SOME TEXT" and got:

C3 AF C2 BB C2 BF 53 4F 4D 45 20 54 45 58 54

Saving "SOME TEXT" as UTF-8 with BOM yields:

EF BB BF 53 4F 4D 45 20 54 45 58 54
Tripitaka answered 24/10, 2012 at 20:5 Comment(6)
Why do you want a BOM? If a file is not recognized as UTF-8 (without a BOM), it's because the contents are not UTF-8. - Trumaine
Sounds well researched and tested. But please show a hexdump of the generated text file nonetheless. Please also upload the output to a file hoster. Have you viewed the file in different editors? Do you get the effect for plain browser output (sans Content-Disposition header) too? - Cutlip
@Cutlip I added the hexdump you requested. And I get the same result "SOME TEXT" without the Content-Disposition header - as output in Firefox for example. - Tripitaka
Yes, that's a UTF-8 encoded BOM. So something's saving EF as C3 AF, BB as C2 BB, and BF as C2 BF. My bet's still on the editor. - Cutlip
@Cutlip How so? The EF BB BF comes from echo chr(239).chr(187).chr(191); so how could it be the editor? If anything is to blame, it has to be the browser. I'll check that... - Tripitaka
It's not the browser either: same output in IE 9, FF 15 and Safari 5. - Tripitaka
1

What you're seeing is the result of interpreting the individual bytes of the BOM as ISO-8859-1 and then encoding the result in UTF-8. As for why this happens, I suspect the chr() function. Try using hex escape sequences in a string literal instead, i.e.

echo "\xEF\xBB\xBF";
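
The double encoding described above can be reproduced in isolation. The following is a minimal sketch, not something taken from the thread: latin1_to_utf8() is a hypothetical helper that treats each input byte as an ISO-8859-1 character and encodes it as UTF-8, so it needs no extension. Fed the three BOM bytes, it yields the same six bytes that appear at the start of the hex dump in the question.

```php
<?php
// Hypothetical helper (not from the thread): treat every input byte as an
// ISO-8859-1 character and encode that character as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII range: identical in both encodings.
            $out .= chr($b);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$bom = "\xEF\xBB\xBF";
echo strtoupper(bin2hex(latin1_to_utf8($bom))), "\n"; // C3AFC2BBC2BF
```

This only shows what kind of transformation produces those bytes; it does not identify which component in the request chain applies it.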
Mystery answered 24/10, 2012 at 20:57 Comment(1)
I did try that when Fivell suggested it in his answer. Same result. I also tried this on both a live LAMP server and the local WAMP testing environment. - Tripitaka
0

Check your mbstring extension's settings (it can be configured to re-encode output automatically):

; This directive specifies the regex pattern of content types for which mb_output_handler()
; is activated.
; Default: mbstring.http_output_conv_mimetype=^(text/|application/xhtml\+xml)
; mbstring.http_output_conv_mimetype=
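
To see what is actually in effect at runtime, the relevant values can be dumped from inside the script. This is a rough diagnostic sketch: output_encoding_report() is a hypothetical name, and the choice of directives is an assumption about what might be worth checking, since the thread does not establish the cause. It inspects PHP's own settings only, so it cannot rule out anything outside PHP.

```php
<?php
// Hypothetical helper (not from the thread): collect PHP settings that can
// cause output to be passed through a handler or re-encoded.
// ini_get() returns false for directives unknown to the running build.
function output_encoding_report(): array
{
    return [
        'output_handler'                => ini_get('output_handler'),
        'mbstring.http_output'          => ini_get('mbstring.http_output'),
        'mbstring.encoding_translation' => ini_get('mbstring.encoding_translation'),
        'default_charset'               => ini_get('default_charset'),
        // Names of the output buffer handlers currently active, if any.
        'active_output_handlers'        => ob_list_handlers(),
    ];
}

var_export(output_encoding_report());
echo "\n";
```

Run it from the same server environment as the download script, because CLI and web server configurations can differ.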

Both "\xEF\xBB\xBF" and chr(239).chr(187).chr(191) produce the BOM; you can verify this yourself by writing them to a file with file_put_contents().
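
A minimal sketch of that check, assuming a writable system temp directory (the variable names and the temp-file prefix are illustrative only):

```php
<?php
// The two spellings of the BOM from the thread.
$escaped = "\xEF\xBB\xBF";
$viaChr  = chr(239) . chr(187) . chr(191);

// Write BOM + text to a temporary file, read it back, then clean up.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, $viaChr . 'SOME TEXT');
$written = file_get_contents($path);
unlink($path);

var_dump($escaped === $viaChr);           // bool(true)
echo strtoupper(bin2hex($written)), "\n"; // EFBBBF534F4D452054455854
```

The bytes read back match the "saved as UTF-8 with BOM" hex dump in the question, which suggests the string itself is correct and the corruption happens later, on the way to the client.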

Aetiology answered 24/10, 2012 at 21:25 Comment(2)
I disabled the following, restarted Apache, and it's still the same: ;extension=php_mbstring.dll ;extension=php_exif.dll - Tripitaka
But yes, writing the BOM either way to a file on the server works. - Tripitaka
