Write text files without Byte Order Mark (BOM)?

I am trying to create a text file using VB.NET with UTF-8 encoding, without a BOM. Can anybody help me with how to do this?
I can write a file with UTF-8 encoding, but how do I remove the Byte Order Mark from it?

Edit 1: I have tried code like this:

    Dim utf8 As New UTF8Encoding()
    Dim utf8EmitBOM As New UTF8Encoding(True)
    Dim strW As New StreamWriter("c:\temp\bom\1.html", True, utf8EmitBOM)
    strW.Write(utf8EmitBOM.GetPreamble())
    strW.WriteLine("hi there")
    strW.Close()

    Dim strw2 As New StreamWriter("c:\temp\bom\2.html", True, utf8)
    strw2.Write(utf8.GetPreamble())
    strw2.WriteLine("hi there")
    strw2.Close()

1.html is created with UTF-8 encoding only, and 2.html is created with ANSI encoding.

Simplified approach - http://whatilearnttuday.blogspot.com/2011/10/write-text-files-without-byte-order.html

Dionysian answered 13/3, 2010 at 7:43 Comment(1)
If you don't want a BOM, why are you writing GetPreamble()?Absorption
E
211

In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:

1. Explicitly specifying a suitable encoding:

  1. Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter.

  2. Pass the UTF8Encoding instance to the stream constructor.

' VB.NET:
Dim utf8WithoutBom As New System.Text.UTF8Encoding(False)
Using sink As New StreamWriter("Foobar.txt", False, utf8WithoutBom)
    sink.WriteLine("...")
End Using
// C#:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
using (var sink = new StreamWriter("Foobar.txt", false, utf8WithoutBom))
{
    sink.WriteLine("...");
}

2. Using the default encoding:

If you do not supply an Encoding to StreamWriter's constructor at all, StreamWriter will by default use a UTF-8 encoding without a BOM, so the following should work just as well:

' VB.NET:
Using sink As New StreamWriter("Foobar.txt")
    sink.WriteLine("...")
End Using
// C#:
using (var sink = new StreamWriter("Foobar.txt"))
{
    sink.WriteLine("...");
}

Finally, note that omitting the BOM is generally only advisable for UTF-8, which has no byte-order ambiguity; UTF-16 readers often rely on the BOM to determine the byte order.
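
To make the difference between the two encodings concrete, here is a minimal C# sketch (not part of the original answer) that compares the preamble of Encoding.UTF8 with that of UTF8Encoding(false), then writes the same text through a StreamWriter with each. The class name and the Encode helper are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

class PreambleDemo
{
    static void Main()
    {
        // Encoding.UTF8 is configured to emit a BOM; UTF8Encoding(false) is not.
        Encoding withBom = Encoding.UTF8;
        Encoding withoutBom = new UTF8Encoding(false);

        Console.WriteLine(withBom.GetPreamble().Length);    // 3 (EF BB BF)
        Console.WriteLine(withoutBom.GetPreamble().Length); // 0

        // Same text, written through StreamWriter with each encoding.
        Console.WriteLine(Encode("hi", withBom).Length);    // 5 (BOM + 2 bytes)
        Console.WriteLine(Encode("hi", withoutBom).Length); // 2
    }

    // Hypothetical helper: writes text to an in-memory stream and returns the bytes.
    static byte[] Encode(string text, Encoding encoding)
    {
        using (var buffer = new MemoryStream())
        {
            using (var writer = new StreamWriter(buffer, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return buffer.ToArray();
        }
    }
}
```

Note that the code never calls GetPreamble() to write anything; StreamWriter emits the preamble itself when the encoding has one.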

Expressway answered 13/3, 2010 at 8:49 Comment(3)
Not always wise: for example My.Computer.FileSystem.WriteAllText writes the BOM if no encoding is specified.Intolerance
My.Computer.FileSystem.WriteAllText is an exception in this regard, guessing for backwards VB compatibility perhaps? File.WriteAllText defaults to UTF8NoBOM.Lumbye
This is especially helpful if you want to write a *.m3u8 playlist file for VLC. VLC is still not capable of reading UTF-8 playlist files WITH a BOM! This seems to be fixed according to trac.videolan.org/vlc/ticket/21860, but the fix will only be included in VLC v4.Monkey
A
29

Try this:

Encoding outputEnc = new UTF8Encoding(false); // create encoding with no BOM
TextWriter file = new StreamWriter(filePath, false, outputEnc); // open file with encoding
// write data here
file.Close(); // save and close it
Alysiaalyson answered 7/5, 2010 at 12:18 Comment(0)
L
6

Simply use the WriteAllText method from System.IO.File.

Please check the sample from File.WriteAllText.

This method uses UTF-8 encoding without a Byte-Order Mark (BOM), so using the GetPreamble method will return an empty byte array. If it is necessary to include a UTF-8 identifier, such as a byte order mark, at the beginning of a file, use the WriteAllText(String, String, Encoding) method overload with UTF8 encoding.
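
As a rough illustration of the two overloads mentioned above, the following C# sketch writes the same two-character string with and without an explicit Encoding.UTF8 and prints the resulting file sizes. The file names and the use of the temp directory are assumptions made for this example only.

```csharp
using System;
using System.IO;
using System.Text;

class WriteAllTextDemo
{
    static void Main()
    {
        // Hypothetical file names in the temp directory, for illustration only.
        string dir = Path.GetTempPath();
        string plain = Path.Combine(dir, "no-bom.txt");
        string marked = Path.Combine(dir, "with-bom.txt");

        File.WriteAllText(plain, "hi");                 // default: UTF-8 without BOM
        File.WriteAllText(marked, "hi", Encoding.UTF8); // Encoding.UTF8 emits a BOM

        Console.WriteLine(new FileInfo(plain).Length);  // 2
        Console.WriteLine(new FileInfo(marked).Length); // 5 (3-byte BOM + 2 bytes)
    }
}
```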

Lempira answered 14/10, 2014 at 6:26 Comment(1)
The one from the My namespace does use BOMIntolerance
C
5

If you do not specify an Encoding when creating a new StreamWriter the default Encoding object used is UTF-8 No BOM which is created via new UTF8Encoding(false, true).

So to create a text file without the BOM, use one of the constructors that do not require you to provide an encoding:

new StreamWriter(Stream)
new StreamWriter(String)
new StreamWriter(String, Boolean)
Continue answered 23/6, 2015 at 20:46 Comment(2)
What if I need to specify leaveOpen?Pomfret
@Pomfret in that case you cannot use the default encoding that StreamWriter uses. You'll need to pass new UTF8Encoding(false, true) as the encoding to be able to specify leaveOpen and still not have the BOM.Continue
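
A minimal C# sketch of the leaveOpen case from the comments above, using the four-argument StreamWriter(Stream, Encoding, Int32, Boolean) constructor. The MemoryStream target and the buffer size of 1024 are arbitrary choices for this example, not something the thread prescribes.

```csharp
using System;
using System.IO;
using System.Text;

class LeaveOpenDemo
{
    static void Main()
    {
        using (var stream = new MemoryStream())
        {
            // Arguments: stream, encoding, buffer size, leaveOpen.
            // 1024 is an arbitrary buffer size chosen for this sketch.
            using (var writer = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
            {
                writer.Write("hi there");
            }

            // The stream is still usable here because leaveOpen was true.
            Console.WriteLine(stream.Length); // 8 bytes, no BOM
        }
    }
}
```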
G
4

Interesting note with respect to this: strangely, the static "CreateText()" method of the System.IO.File class creates UTF-8 files without BOM.

In general this is a source of bugs, but in your case it could have been the simplest workaround :)

Goatskin answered 14/4, 2011 at 8:21 Comment(0)
E
3

I think Roman Nikitin is right. The meaning of the constructor argument is flipped. False means no BOM and true means with BOM.

You get an ANSI encoding because a file without a BOM that contains no non-ANSI characters is exactly the same as an ANSI file. Try adding some special characters to your "hi there" string and you'll see the detected encoding change from ANSI to UTF-8 without BOM.
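
The overlap can be checked at the byte level. This C# sketch (an illustration, not from the original answer) uses Encoding.ASCII as a stand-in for the "ANSI" case, since plain ASCII text is what the two encodings have in common, and then shows that a non-ASCII character such as é becomes a multi-byte sequence in UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

class AsciiOverlapDemo
{
    static void Main()
    {
        var utf8NoBom = new UTF8Encoding(false);

        // ASCII-only text: the UTF-8 bytes match the single-byte ASCII bytes.
        byte[] a = utf8NoBom.GetBytes("hi there");
        byte[] b = Encoding.ASCII.GetBytes("hi there");
        Console.WriteLine(a.SequenceEqual(b)); // True

        // A non-ASCII character takes more than one byte in UTF-8.
        byte[] c = utf8NoBom.GetBytes("\u00E9"); // é
        Console.WriteLine(BitConverter.ToString(c)); // C3-A9
    }
}
```

With no BOM and only ASCII bytes in the file, an editor has nothing to distinguish UTF-8 from ANSI, which is why it falls back to reporting ANSI.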

Ebersole answered 27/11, 2013 at 11:15 Comment(0)
F
1

XML Encoding UTF-8 without BOM
We need to submit XML data to the EPA and their application that takes our input requires UTF-8 without BOM. Oh yes, plain UTF-8 should be acceptable for everyone, but not for the EPA. The answer to doing this is in the above comments. Thank you Roman Nikitin.

Here is a C# snippet of the code for the XML encoding:

    Encoding utf8noBOM = new UTF8Encoding(false);  
    XmlWriterSettings settings = new XmlWriterSettings();  
    settings.Encoding = utf8noBOM;  
        …  
    using (XmlWriter xw = XmlWriter.Create(filePath, settings))  
    {  
        xDoc.WriteTo(xw);  
        xw.Flush();  
    }    

Checking whether this actually removes the three leading bytes from the output file can be misleading. For example, if you use Notepad++ (www.notepad-plus-plus.org), it will report “Encode in ANSI”. I guess most text editors count on the BOM to tell whether a file is UTF-8. The way to see this clearly is with a binary tool like WinHex (www.winhex.com). Since I was looking for a before-and-after difference, I used the Microsoft WinDiff application.
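
If a hex editor is not at hand, the same check can be done in code. Here is a small C# sketch that reads the first three bytes of a file and compares them with the UTF-8 BOM (EF BB BF). HasUtf8Bom is a hypothetical helper name, and the path is assumed to come from the command line.

```csharp
using System;
using System.IO;

class BomCheck
{
    // Returns true if the file begins with the UTF-8 BOM bytes EF BB BF.
    static bool HasUtf8Bom(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var head = new byte[3];
            int total = 0;
            // Read may return fewer bytes than requested, so loop until done.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break; // end of file
                total += n;
            }
            return total == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        }
    }

    static void Main(string[] args)
    {
        // args[0] is a path supplied by the caller (hypothetical usage).
        Console.WriteLine(HasUtf8Bom(args[0]));
    }
}
```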

Fervidor answered 24/3, 2016 at 13:49 Comment(0)
H
0

For VB.NET, this is how to make it work:

My.Computer.FileSystem.WriteAllText("FileName", Data, False, System.Text.Encoding.ASCII)
Huntley answered 10/11, 2021 at 16:31 Comment(0)
E
-1

It might be that your input text contains a byte order mark. In that case, you should remove it before writing.

Erasure answered 13/3, 2010 at 8:52 Comment(2)
Please assist me. How do I remove it before writing?Dionysian
@Erasure doesn’t the default reader already filter that out for you?Pomfret
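
For the case where the BOM has ended up inside the string itself, one possible approach is sketched below in C#: drop a single leading U+FEFF character before writing. StripBom is a hypothetical helper name, not a framework method.

```csharp
using System;

class StripBomDemo
{
    // Removes a single leading U+FEFF character, if present.
    static string StripBom(string text)
    {
        return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
    }

    static void Main()
    {
        string input = "\uFEFFhi there";
        Console.WriteLine(StripBom(input).Length); // 8
    }
}
```

The check compares the first char directly rather than calling StartsWith with a string, because culture-sensitive string comparison may treat U+FEFF as ignorable and give misleading results.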
C
-1
Dim sWriter As IO.StreamWriter = New IO.StreamWriter(shareworklist & "\" & getfilename() & ".txt", False, Encoding.Default)

This gives you the results you want (I think).

Chastitychasuble answered 22/12, 2011 at 5:41 Comment(1)
On my PC it creates ANSI filesYurt
