How to output Byte Order Mark when writing to TextWriter?

i am writing text to a TextWriter. i want the UTF-16 Byte Order Mark (BOM) to appear in the output:

public void ProcessRequest(HttpContext context)
{
   context.Response.ContentEncoding = new UnicodeEncoding(true, true);
   WriteStuffToTextWriter(context.Response.Output);
}

Except the output doesn't contain a byte order mark:

HTTP/1.1 200 OK
Server: ASP.NET Development Server/10.0.0.0
Date: Thu, 06 Sep 2012 21:09:23 GMT
X-AspNet-Version: 4.0.30319
Content-Disposition: attachment; filename="Transactions_Calendar_20120906.csv"
Cache-Control: private
Content-Type: text/csv; filename="Transactions_Calendar_20120906.csv"; charset=utf-16BE
Content-Length: 95022
Connection: Close

JobName,ShiftName,6////09////2012 12::::00::::00 АΜ,...

How do i tell a TextWriter to write the encoding marker?

Note: The 2nd parameter in UnicodeEncoding:

   context.Response.ContentEncoding = new UnicodeEncoding(true, true);

byteOrderMark
Type: System.Boolean
true to specify that a Unicode byte order mark is provided; otherwise, false.

Gamba answered 6/9, 2012 at 21:14 Comment(4)
What exactly is WriteStuffToTextWriter? You probably have to specify the encoding there in your StreamWriter. - Washedout
What makes you say that it doesn't contain a BOM with the code you have? - Stereobate
I'm with @JonHanna. Also, have you tried creating a console app and writing the same stuff directly to a file to see what that looks like? After all, a lot of stuff happens between your web server and your browser. - Lilliamlillian
A console app should hide the BOM too; the whole point of the BOM is that it doesn't appear as part of the text, but gives data on how to decode it from octets into text. A hex view of the stream above, though, would show an FE and FF or an FF and FE (the order of those bytes being precisely what the Byte Order Mark is meant to reveal, as U+FFFE isn't a valid character so only one order can be correct). Fiddler has a hex view. - Stereobate

Short Version

String zwnbsp = "\xfeff"; //Zero-width non-breaking space

//The Zero-width non-breaking space character ***is*** the Byte-Order-Mark (BOM).
String s = zwnbsp+"The quick brown fox jumped over the lazy dog.";
writer.Write(s);

Long Version

At some point i realized how simple the solution is.

i used to think that the Unicode Byte-Order-Mark was some special signature. i used to think i had to carefully decide which byte sequence i wanted to output, in order to output the correct BOM:

  • 0xFE 0xFF
  • 0xFF 0xFE
  • 0xEF 0xBB 0xBF

But since then i realized that the Byte-Order-Mark is not some special byte sequence that you have to prepend to your file.

The BOM is just a Unicode character. You don't output any bytes; you only output character U+FEFF. When you write that character, the serializer converts it into the bytes of whatever encoding you're using.

The character U+FEFF (ZERO WIDTH NO-BREAK SPACE) was chosen for good reason. It's a space, so it has no meaning, and it is zero width, so you shouldn't even see it.
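To make the point concrete, here is a minimal sketch that encodes the single character U+FEFF with three of the built-in encodings and prints the resulting bytes. The class and method names (BomBytesDemo, EncodeBom) are made up for illustration, and "\uFEFF" is just another spelling of the "\xfeff" escape used in this answer.

```csharp
using System;
using System.Text;

static class BomBytesDemo
{
    // Hypothetical helper: encodes the one character U+FEFF and
    // returns its bytes as a hex string such as "FE-FF".
    public static string EncodeBom(Encoding encoding)
    {
        // GetBytes only encodes the characters it is given;
        // it does not add a preamble of its own.
        return BitConverter.ToString(encoding.GetBytes("\uFEFF"));
    }

    static void Main()
    {
        Console.WriteLine(EncodeBom(Encoding.BigEndianUnicode)); // FE-FF    (UTF-16BE)
        Console.WriteLine(EncodeBom(Encoding.Unicode));          // FF-FE    (UTF-16LE)
        Console.WriteLine(EncodeBom(Encoding.UTF8));             // EF-BB-BF (UTF-8)
    }
}
```

The same character yields each of the three byte sequences listed earlier, depending only on the encoding.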

That means that my question is fundamentally flawed. There is no such thing as "writing a byte-order-mark". You just make sure the first character you write out is U+FEFF. In my case i am writing to a TextWriter:

void WriteStuffToTextWriter(TextWriter writer)
{
   String csvExport = GetExportAsCSV();

   writer.Write("\xfeff"); //Output Unicode character U+FEFF as a byte order mark
   writer.Write(csvExport);
}

The TextWriter will handle converting the Unicode character U+FEFF into whatever byte encoding it has been configured to use.
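One way to check this outside ASP.NET is to point a TextWriter at a MemoryStream and look at the bytes. The sketch below is only an illustration under stated assumptions: WriterBomDemo and WriteWithBom are hypothetical names, and the StreamWriter is given a UnicodeEncoding with byteOrderMark set to false so that it does not emit a preamble of its own on top of the character written by hand.

```csharp
using System;
using System.IO;
using System.Text;

static class WriterBomDemo
{
    // Hypothetical helper: writes U+FEFF followed by the text through
    // a TextWriter and returns the raw bytes that reached the stream.
    public static byte[] WriteWithBom(string text)
    {
        // Assumption: big-endian UTF-16, byteOrderMark: false, so the
        // StreamWriter itself adds nothing at the start of the stream.
        Encoding encoding = new UnicodeEncoding(true, false);

        using (MemoryStream stream = new MemoryStream())
        {
            using (TextWriter writer = new StreamWriter(stream, encoding))
            {
                writer.Write('\uFEFF'); // encoded by the writer as FE FF
                writer.Write(text);
            }
            // ToArray is still usable after the stream has been closed.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(WriteWithBom("A"))); // FE-FF-00-41
    }
}
```

If the StreamWriter were instead constructed with an encoding whose preamble is enabled, it would write that preamble at the start of a fresh stream by itself, and the hand-written U+FEFF would then appear a second time.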

Note: Any code is released into the public domain. No attribution required.

Gamba answered 27/7, 2013 at 22:57 Comment(0)

Write out context.Response.ContentEncoding.GetPreamble(). Take a look at the question "Write text files without Byte Order Mark (BOM)?"
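GetPreamble() returns a byte array, so it has to be written to a byte stream rather than to a TextWriter. A rough sketch of that idea follows, using a MemoryStream as a stand-in for the response stream; PreambleDemo and WriteWithPreamble are hypothetical names, and how this would be wired into the handler from the question is not shown.

```csharp
using System;
using System.IO;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: writes the encoding's preamble (if any),
    // then the encoded text, and returns everything that was written.
    public static byte[] WriteWithPreamble(Encoding encoding, string text)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            // Empty array when the encoding was created with byteOrderMark: false.
            byte[] preamble = encoding.GetPreamble();
            stream.Write(preamble, 0, preamble.Length);

            // GetBytes encodes only the text; it never adds a BOM itself.
            byte[] body = encoding.GetBytes(text);
            stream.Write(body, 0, body.Length);

            return stream.ToArray();
        }
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(
            WriteWithPreamble(new UnicodeEncoding(true, true), "A")));  // FE-FF-00-41
        Console.WriteLine(BitConverter.ToString(
            WriteWithPreamble(new UnicodeEncoding(true, false), "A"))); // 00-41
    }
}
```

The second call illustrates that the byteOrderMark constructor argument controls what GetPreamble() returns; the encoded text itself is the same either way.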

Thermometry answered 6/9, 2012 at 21:19 Comment(1)
Careful though. I'm not sure that they aren't actually outputting a BOM already. A second U+FEFF would be interpreted as a zero-width no-break space at the start of the actual text, after the BOM. - Stereobate
