Add byte order mark to a string via StringBuilder

How can I add a byte order mark to a StringBuilder? (I have to pass a string to another method which will save it as a file, but I can't modify that method).

I tried this:

var sb = new StringBuilder();
sb.Append('\xEF');
sb.Append('\xBB');
sb.Append('\xBF');

But when I view the file with a hex editor, it contains the following sequence: C3 AF C2 BB C2 BF

The string is huge, so it would be good to do this without converting back and forth to a byte array.

Edit: Clarification after questions in comments. I have to pass the string to another method which takes a string and creates a file from it on Azure Blob Storage. I can't modify the other method.

Brakeman asked 10/3, 2014 at 17:07 Comment(3)
Why? The byte order mark isn't needed until you write to a file. The issue you see is because '\xEF', '\xBB' and '\xBF' are not the bytes of a byte order mark; they are the Unicode characters U+00EF, U+00BB and U+00BF, each of which UTF-8 encodes as two bytes. - Prepotency
I have to pass the string to another method which takes a string and creates a file from it on Azure Blob Storage. - Brakeman
For anyone using File.WriteAllText(): it has an overload that takes an encoding, and passing Encoding.UTF8 adds a BOM. See: learn.microsoft.com/en-us/dotnet/api/… - Zebulun

Two options:

  1. Don't include the byte order mark in your text at all; instead, use an encoding that adds it automatically when the text is written out
  2. Include it as a character in your StringBuilder:

    sb.Append('\uFEFF'); // U+FEFF is the byte-order mark character
    

Personally I'd go for the first approach normally, but the "I can't modify that method" suggests it may not be an option in your case.
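A minimal sketch (a top-level C# program, not part of the original answer) of why the question's attempt produced C3 AF C2 BB C2 BF and why '\uFEFF' gives the expected EF BB BF. It assumes the receiving method encodes the string as UTF-8 without adding its own preamble; if it writes through something like a StreamWriter with Encoding.UTF8, the file would likely end up with two BOMs. BitConverter.ToString is only used to print the bytes as hex.

```csharp
using System;
using System.Text;

// '\xEF' in C# is the character U+00EF ('ï'), not the raw byte 0xEF,
// so this string holds three ordinary Latin-1 range characters.
string wrong = new StringBuilder().Append('\xEF').Append('\xBB').Append('\xBF').ToString();

// U+FEFF is the byte order mark character itself.
string right = new StringBuilder().Append('\uFEFF').Append("hello").ToString();

// Encoding.GetBytes never adds a preamble, so these bytes are exactly
// what the characters in each string encode to in UTF-8.
byte[] wrongBytes = Encoding.UTF8.GetBytes(wrong);
byte[] rightBytes = Encoding.UTF8.GetBytes(right);

Console.WriteLine(BitConverter.ToString(wrongBytes));       // C3-AF-C2-BB-C2-BF
Console.WriteLine(BitConverter.ToString(rightBytes, 0, 3)); // EF-BB-BF
```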

Decigram answered 10/3, 2014 at 17:13 Comment(1)
Thank you. Yes, you are right, I'd go for the first one normally, but I'm taking this approach because I have to pass the string to another method which takes a string and creates a file from it on Azure Blob Storage. - Brakeman

Byte-order marks are to inform readers of a file that the file is of a particular encoding. As such, you should only need the byte-order marks (BOM) in the actual file. If you want to include BOM in a text file you're writing, simply use StreamWriter to write to the file. For example:

// Encoding.UTF8 makes the StreamWriter emit the EF BB BF preamble
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write(sb.ToString());
}

If you want UTF-8 without a BOM (the StreamWriter default):

using(var writer = new StreamWriter(stream))
{
    writer.Write(sb.ToString());
}

Or if you want a different BOM, e.g. UTF-16 little-endian (FF FE):

using(var writer = new StreamWriter(stream, System.Text.Encoding.Unicode))
{
    writer.Write(sb.ToString());
}
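A minimal sketch (my own check, not from the answer) of which StreamWriter setups emit a BOM, writing a single "A" into a MemoryStream instead of a file. The helper name WriteWith is made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes text through a StreamWriter built by makeWriter
// and returns the raw bytes that ended up in the underlying stream.
static byte[] WriteWith(Func<Stream, StreamWriter> makeWriter, string text)
{
    var ms = new MemoryStream();
    using (var writer = makeWriter(ms))
    {
        writer.Write(text);
    } // disposing flushes the writer; MemoryStream.ToArray still works after close
    return ms.ToArray();
}

byte[] noBom    = WriteWith(s => new StreamWriter(s), "A");                         // default: UTF-8, no BOM
byte[] utf8Bom  = WriteWith(s => new StreamWriter(s, Encoding.UTF8), "A");          // UTF-8 with BOM
byte[] utf8Off  = WriteWith(s => new StreamWriter(s, new UTF8Encoding(false)), "A"); // UTF-8, BOM disabled
byte[] utf16Bom = WriteWith(s => new StreamWriter(s, Encoding.Unicode), "A");       // UTF-16 LE with BOM

Console.WriteLine(BitConverter.ToString(noBom));    // 41
Console.WriteLine(BitConverter.ToString(utf8Bom));  // EF-BB-BF-41
Console.WriteLine(BitConverter.ToString(utf8Off));  // 41
Console.WriteLine(BitConverter.ToString(utf16Bom)); // FF-FE-41-00
```

This matches the gotcha raised in the comments: leaving out the encoding, or passing new UTF8Encoding(), gives UTF-8 without a BOM, while Encoding.UTF8 gives UTF-8 with one.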

Update:

If you want to be decoupled from the implementation detail of a BOM, or from the BOM of a particular encoding (i.e. the encoding could change at runtime or after deployment), but still need to pass a BOM-marked string, you could do something like this (assumes .NET 4.5 or later, for the leaveOpen parameter of the StreamWriter constructor):

var stream = new MemoryStream();
var encoding = Encoding.UTF8; // TODO: make this configurable, if necessary
using(var writer = new StreamWriter(stream, encoding, 1024, true)) // true = leave the stream open
{
    writer.Write(sb.ToString());
}
// GetString does not strip the preamble, so the result starts with U+FEFF
CantModifyButMustUseThis(encoding.GetString(stream.ToArray()));
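A minimal sketch (not from the answer) showing what the string produced by the update actually contains. As far as I can tell, decoding the written bytes yields the text with a leading U+FEFF character, the same as decoding only the preamble and prepending it; for any encoding whose preamble is a BOM, that character is U+FEFF. Which bytes finally reach the file then depends on the encoding the receiving method uses, and the round trip copies the whole string into a byte array and back, which the question hoped to avoid.

```csharp
using System;
using System.IO;
using System.Text;

string text = "hello";
var encoding = Encoding.UTF8;

// Same steps as the answer: let a StreamWriter emit the preamble into memory...
var stream = new MemoryStream();
using (var writer = new StreamWriter(stream, encoding, 1024, true))
{
    writer.Write(text);
}
// ...then decode the bytes back; the preamble survives as U+FEFF.
string viaStream = encoding.GetString(stream.ToArray());

// Shortcut: decode just the preamble (EF BB BF -> "\uFEFF") and prepend it.
string viaPreamble = encoding.GetString(encoding.GetPreamble()) + text;

Console.WriteLine(viaStream == viaPreamble); // True
Console.WriteLine((int)viaStream[0]);        // 65279 (0xFEFF)
```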
Prepotency answered 10/3, 2014 at 17:20 Comment(7)
I am aware what BOM is for. However, as I mentioned in my question, I must pass it to another method (which takes a string and creates a file from it on Azure Blob Storage); that's why I am taking this approach. - Brakeman
This is misleading. For example, with UTF-8 and StreamWriter, if you leave out the encoding constructor argument entirely or if you use new UTF8Encoding() as the argument, then UTF-8 without the byte-order-mark is produced. On the other hand, if you specify the argument as Encoding.UTF8 or as new UTF8Encoding(true) you get UTF-8 with BOM. This is a bit of a gotcha, actually. So your first example is wrong. - Jahvist
@JeppeStigNielsen Yes, you are correct. I've modified my answer. - Prepotency
@user2270404 The stream used by StreamWriter does not need to be a file stream. - Prepotency
@Peter Ritchie: I understand, but how can I pass the output of your solution to CantModifyButMustUseThis(string content)? - Brakeman
@user2270404 I've added an update to the answer; but I would push hard to use a stream in methods like this, e.g. what if the string was too large to fit in memory? - Prepotency
There is no Encoding.UTF16 property (in .NET Core or .NET Framework); use Encoding.Unicode instead. - Ganesa

The BOM is not added by Encoding.GetBytes itself; it comes from the encoding's preamble (GetPreamble()), which writers such as StreamWriter and File.WriteAllText emit at the start of the output. Some of UnicodeEncoding's constructors take a bool that controls whether there is a preamble. For example, with the constructor public UnicodeEncoding(bool bigEndian, bool byteOrderMark), setting byteOrderMark to true makes GetPreamble() return the BOM, so a StreamWriter using that encoding writes it before your text.
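A minimal sketch (my own illustration, using UTF-16 little-endian) of where the byteOrderMark flag actually shows up: in GetPreamble() and in the output of a StreamWriter, but not in GetBytes.

```csharp
using System;
using System.IO;
using System.Text;

// bigEndian: false (little-endian); byteOrderMark: true vs. false
var withBom = new UnicodeEncoding(false, true);
var withoutBom = new UnicodeEncoding(false, false);

// The flag only changes what GetPreamble() returns...
Console.WriteLine(BitConverter.ToString(withBom.GetPreamble())); // FF-FE
Console.WriteLine(withoutBom.GetPreamble().Length);              // 0

// ...GetBytes never includes the preamble, whatever the flag says.
Console.WriteLine(BitConverter.ToString(withBom.GetBytes("A"))); // 41-00

// A StreamWriter asks the encoding for its preamble and writes it first.
var ms = new MemoryStream();
using (var writer = new StreamWriter(ms, withBom))
{
    writer.Write("A");
}
Console.WriteLine(BitConverter.ToString(ms.ToArray()));          // FF-FE-41-00
```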

Comma answered 10/3, 2014 at 17:12 Comment(2)
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review - Healing
@Ouroborus, I made the edit; that should address it, I hope? - Comma

I used this code in ASP.NET Core to return a CSV file with a BOM, and it works:

[HttpGet("GetCsv")]
public async Task<IActionResult> GetCsv()
{
    var cc = new CsvConfiguration(new System.Globalization.CultureInfo("en-US"));
    var entity = await _service.AdminPanelList();
    using (var ms = new MemoryStream())
    {
        // new UTF8Encoding(true) makes the StreamWriter emit the BOM (EF BB BF)
        // on its first flush; also writing '\uFEFF' to ms by hand would
        // produce a second BOM.
        using (var sw = new StreamWriter(stream: ms, encoding: new UTF8Encoding(true)))
        using (var cw = new CsvWriter(sw, cc))
        {
            cw.WriteRecords(entity);
        } // disposing the writers flushes everything into ms

        var finalArray = ms.ToArray(); // ToArray works even after ms is closed
        return File(finalArray, "text/csv", "PersonExport.csv");
    }
}
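A minimal sketch (plain StreamWriter, no CsvHelper; not from the answer) of the double-BOM pitfall: a StreamWriter created while the stream is still at position 0 writes its own preamble on the first flush, so also writing U+FEFF to the stream by hand, as an earlier version of the code above did, appears to give two BOMs.

```csharp
using System;
using System.IO;
using System.Text;

// Manual BOM plus a BOM-emitting encoding: the writer was created at
// position 0, so it still emits its preamble when it first flushes.
var ms = new MemoryStream();
using (var sw = new StreamWriter(ms, new UTF8Encoding(true)))
{
    byte[] bom = Encoding.UTF8.GetBytes("\uFEFF");
    ms.Write(bom, 0, bom.Length); // goes straight to the stream, ahead of the writer's buffer
    sw.Write("a,b");
}
byte[] doubled = ms.ToArray();

// Relying on the encoding alone gives a single BOM.
var ms2 = new MemoryStream();
using (var sw2 = new StreamWriter(ms2, new UTF8Encoding(true)))
{
    sw2.Write("a,b");
}
byte[] single = ms2.ToArray();

Console.WriteLine(BitConverter.ToString(doubled)); // EF-BB-BF-EF-BB-BF-61-2C-62
Console.WriteLine(BitConverter.ToString(single));  // EF-BB-BF-61-2C-62
```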
Chas answered 11/11, 2020 at 11:29 Comment(0)
