Add byte order mark to a string via StringBuilder

Asked 10/3, 2014 at 17:7 Answered 11/11, 2020 at 11:29

Solved c# utf-8 stringbuilder byte-order-mark

How can I add a byte order mark to a StringBuilder? (I have to pass a string to another method which will save it as a file, but I can't modify that method).

I tried this:

var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');

But when I view the resulting file in a hex editor, it contains the following sequence: C3 AF C2 BB C2 BF

The string is huge, so it would be good to do this without converting back and forth to a byte array.

Edit: Clarification after questions in comments. I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. I can't modify the other method.

Brakeman answered 10/3, 2014 at 17:7 Comment(3)

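Why? The byte order mark isn't needed until you write to a file... The issue you see is because EF, BB and BF are the bytes of the UTF-8-encoded BOM, not Unicode characters; appending them as chars adds U+00EF, U+00BB and U+00BF instead. – Prepotency 10/3, 2014 at 17:11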

I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. – Brakeman 10/3, 2014 at 17:38

For anyone that's using File.WriteAllText() - it supports setting the encoding to UTF8 which will add a BOM. See: learn.microsoft.com/en-us/dotnet/api/… – Zebulun 3/1 at 0:30

Two options:

1. Don't include the byte order mark in your text at all; instead, use an encoding which will automatically include it.

2. Include it as a character in your StringBuilder:

sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character

Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.
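To illustrate the second option, here is a minimal sketch comparing the bytes produced by the question's three `\x..` characters with those produced by a single `\uFEFF`. The helper name `Utf8Hex` is made up for this example, and it assumes the method that saves the string encodes it as UTF-8 without adding a BOM of its own.

```csharp
using System;
using System.Text;

// Returns the UTF-8 bytes of a string as a hex dump such as "EF BB BF".
static string Utf8Hex(string text) =>
    BitConverter.ToString(Encoding.UTF8.GetBytes(text)).Replace("-", " ");

// '\xEF', '\xBB' and '\xBF' are the characters U+00EF, U+00BB and U+00BF,
// each of which takes two bytes in UTF-8.
var wrong = new StringBuilder().Append('\xEF').Append('\xBB').Append('\xBF');
Console.WriteLine(Utf8Hex(wrong.ToString()));   // C3 AF C2 BB C2 BF

// '\uFEFF' is a single character whose UTF-8 encoding is the three BOM bytes.
var right = new StringBuilder().Append('\uFEFF').Append("abc");
Console.WriteLine(Utf8Hex(right.ToString()));   // EF BB BF 61 62 63
```

If the saving method already writes a BOM itself, the file would end up with two, so it is worth checking the output once in a hex editor.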

Decigram answered 10/3, 2014 at 17:13 Comment(1)

Thank you. Yes, you are right, I'd go for the first one normally, but I'm taking this approach because I have to pass the string to another method which takes a string and creates a file of it on Azure Blob Storage. – Brakeman 10/3, 2014 at 17:37

Byte-order marks inform readers of a file that the file uses a particular encoding. As such, you should only need the byte-order mark (BOM) in the actual file. If you want to include a BOM in a text file you're writing, simply use StreamWriter to write to the file. For example:

using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString());
}

If you don't want a BOM with UTF-8:

using(var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}

Or if you want a different BOM:

// Encoding.Unicode is UTF-16 little-endian; there is no Encoding.UTF16 property.
using(var writer = new StreamWriter(stream, System.Text.Encoding.Unicode))
{
    writer.Write(sb.ToString());
}
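The StreamWriter variants above can be checked without touching the file system. The following is a rough sketch that writes the letter "A" into a MemoryStream with each configuration and dumps the resulting bytes; the `Dump` helper is a name invented for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Writes "A" through the supplied StreamWriter and returns the bytes as hex.
static string Dump(Func<Stream, StreamWriter> createWriter)
{
    var stream = new MemoryStream();
    using (var writer = createWriter(stream))
    {
        writer.Write("A");
    }
    // ToArray still works after the writer has closed the MemoryStream.
    return BitConverter.ToString(stream.ToArray());
}

Console.WriteLine(Dump(s => new StreamWriter(s)));                      // 41
Console.WriteLine(Dump(s => new StreamWriter(s, new UTF8Encoding())));  // 41
Console.WriteLine(Dump(s => new StreamWriter(s, Encoding.UTF8)));       // EF-BB-BF-41
Console.WriteLine(Dump(s => new StreamWriter(s, Encoding.Unicode)));    // FF-FE-41-00
```

Only the encodings whose preamble is non-empty (here `Encoding.UTF8` and `Encoding.Unicode`) put a BOM in front of the text.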

Update:

If you wanted to be decoupled from the implementation detail of a BOM, or of the BOM of a particular encoding (e.g. because the encoding could change at runtime or after deployment), but still wanted to pass a BOM-marked string, you could do something like this (assumes .NET 4.5):

var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: configurize this, if necessary
using(var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(sb.ToString());
}
CantModifyButMustUseThis(encoding.GetString(stream.ToArray()));
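As a quick check of the round trip above, the sketch below wraps the same steps in a helper (the name `WithBom` is hypothetical) and confirms that the decoded string starts with the BOM character U+FEFF, since `GetString` does not strip it.

```csharp
using System;
using System.IO;
using System.Text;

// Encodes text through a StreamWriter, which prepends the encoding's preamble,
// and decodes the bytes back into a string with the same encoding.
static string WithBom(string text, Encoding encoding)
{
    var stream = new MemoryStream();
    using (var writer = new StreamWriter(stream, encoding, 1024, leaveOpen: true))
    {
        writer.Write(text);
    }
    return encoding.GetString(stream.ToArray());
}

string marked = WithBom("hello", Encoding.UTF8);
Console.WriteLine(marked[0] == '\uFEFF');   // True
Console.WriteLine(marked.Length);           // 6
```

Note that this holds the original string, the byte array and the decoded copy in memory at the same time, which may matter for a very large string.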

Prepotency answered 10/3, 2014 at 17:20 Comment(7)

I am aware what BOM is for. However, as I mentioned in my question, I must pass it to another method (which takes a string and creates a file of it on Azure Blob Storage), that's why I am taking this approach. – Brakeman 10/3, 2014 at 17:35

This is misleading. For example, with UTF-8 and StreamWriter, if you leave out the encoding constructor argument entirely or if you use new UTF8Encoding() as the argument, then UTF-8 without the byte-order-mark is produced. On the other hand, if you specify the argument as Encoding.UTF8 or as new UTF8Encoding(true) you get UTF-8 with BOM. This is a bit of a gotcha, actually. So your first example is wrong. – Jahvist 10/3, 2014 at 17:42

@JeppeStigNielsen Yes, you are correct. I've modified my answer. – Prepotency 10/3, 2014 at 17:47

@user2270404 The stream used by StreamWriter does not need to be a file stream. – Prepotency 10/3, 2014 at 18:5

@Peter Ritchie: I understand, but how can I pass the output of your solution to CantModifyButMustUseThis(string content) ? – Brakeman 10/3, 2014 at 19:50

@user2270404 I've added an update to the answer; but, I would push hard to use a stream in methods like this. e.g. what if the string was too large to fit in memory? – Prepotency 10/3, 2014 at 20:6

There is no Encoding.UTF16 in dotnet core, use Encoding.Unicode instead. – Ganesa 14/8, 2020 at 9:14

IIRC (and not certain that I do), the BOM gets added when you convert to bytes using one of the relevant Unicode encodings. I believe some of UnicodeEncoding's constructors take a bool that controls whether to add a BOM. For example, calling the constructor public UnicodeEncoding (bool bigEndian, bool byteOrderMark); and setting the argument byteOrderMark to true should cause a BOM to be emitted when a writer such as StreamWriter serializes your string with that encoding.
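A small sketch of what the byteOrderMark argument affects, assuming UTF-16 little-endian: it determines the preamble returned by `GetPreamble()`, which writers such as StreamWriter emit, while `GetBytes` on its own encodes only the characters it is given.

```csharp
using System;
using System.Text;

// bigEndian: false, byteOrderMark: true -> UTF-16LE whose preamble is FF FE.
var withBom = new UnicodeEncoding(false, true);
var withoutBom = new UnicodeEncoding(false, false);

Console.WriteLine(BitConverter.ToString(withBom.GetPreamble()));   // FF-FE
Console.WriteLine(withoutBom.GetPreamble().Length);                // 0

// GetBytes does not prepend the preamble, regardless of byteOrderMark.
Console.WriteLine(BitConverter.ToString(withBom.GetBytes("A")));   // 41-00
```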

Comma answered 10/3, 2014 at 17:12 Comment(2)

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – Healing 1/11, 2023 at 3:12

@Ouroborus, I made the edit -- that should address it, I hope? – Comma 2/11, 2023 at 16:38

I used this code in ASP.NET Core, and it works:

[HttpGet("GetCsv")]
public async Task<IActionResult> GetCsv()
{
    var cc = new CsvConfiguration(new System.Globalization.CultureInfo("en-US"));
    var entity = await _service.AdminPanelList();
    using (var ms = new MemoryStream())
    {
        // UTF8Encoding(true) makes the StreamWriter emit the BOM (EF BB BF) itself,
        // so writing it to the stream by hand as well would produce it twice.
        using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true)))
        using (var cw = new CsvWriter(sw, cc))
        {
            cw.WriteRecords(entity);
        }

        // ToArray still works after the writers have closed the MemoryStream.
        var finalArray = ms.ToArray();
        return File(finalArray, "text/csv", "PersonExport.csv");
    }
}

Chas answered 11/11, 2020 at 11:29 Comment(0)
