How do I remove ï»¿ from the beginning of a file?

I have a CSS file that looks fine when I open it using gedit, but when it's read by PHP (to merge all the CSS files into one), this CSS has the following characters prepended to it: ï»¿

PHP removes all whitespace, so a random ï»¿ in the middle of the code messes up the entire thing. As I mentioned, I can't actually see these characters when I open the file in gedit, so I can't remove them very easily.

I googled the problem, and there is clearly something wrong with the file encoding. That makes sense, given that I've been moving the files around between different Linux/Windows servers via FTP and rsync, using a range of text editors. I don't really know much about character encoding, though, so help would be appreciated.

If it helps, the file is being saved in UTF-8 format, and gedit won't let me save it in ISO-8859-15 format (the document contains one or more characters that cannot be encoded using the specified character encoding). I tried saving it with Windows and Linux line endings, but neither helped.

Dodds asked 15/7, 2010 at 13:35

This appears to solve the problem. 95isalive.com/expression/index.html – Seineetmarne 5/9, 2011 at 9:46

Somebody strip us off the BOM – Gaptoothed 5/9, 2011 at 9:46

https://mcmap.net/q/145099/-how-do-i-remove-the-character-quot-239-187-191-quot-from-the-beginning-of-a-text-file-in-c/995714 – Coffeepot 30/9, 2015 at 9:52

Three words for you:

Byte Order Mark (BOM)

That's how the UTF-8 BOM (the bytes EF BB BF) looks when it is read as ISO-8859-1. You have to tell your editor not to use BOMs, or use a different editor to strip them out.
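A quick way to see this mapping, sketched in Python purely for illustration (the question itself is about PHP): decoding the three BOM bytes as ISO-8859-1 gives exactly ï»¿, and saving those three characters back as UTF-8 produces the six bytes C3 AF C2 BB C2 BF that come up in the comments below.

```python
import codecs

# The byte order mark is the single character U+FEFF; in UTF-8 it takes three bytes.
bom = codecs.BOM_UTF8                      # b'\xef\xbb\xbf'
assert bom == "\ufeff".encode("utf-8")

# Reading those bytes as ISO-8859-1 (Latin-1) maps each byte to one character.
garbled = bom.decode("latin-1")            # U+00EF U+00BB U+00BF, i.e. "ï»¿"

# Saving those three characters back as UTF-8 takes two bytes each,
# which gives the C3 AF C2 BB C2 BF sequence discussed in the comments.
resaved = garbled.encode("utf-8")
print(resaved.hex(" "))                    # c3 af c2 bb c2 bf
```

Windows-1252 (CP1252) maps these three bytes to the same characters, which is why editors on Windows show the same ï»¿.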

To automate the BOM's removal, you can use awk as shown in this question.

As another answer says, the best solution would be for PHP to interpret the BOM correctly; for that, you can use mb_internal_encoding(), like this:

 <?php
   //Store the previous encoding in case some other piece of code
   //is sensitive to encoding and counts on the default value.
   $previous_encoding = mb_internal_encoding();

   //Set the internal encoding used by the mb_* functions to UTF-8.
   //Note: this does not strip a BOM from file contents by itself.
   mb_internal_encoding('UTF-8');

   //Process the CSS files...

   //Finally, restore the previous encoding
   mb_internal_encoding($previous_encoding);

   //Rest of the code...
 ?>

Unsuccess answered 15/7, 2010 at 13:37

Yeah I found that when I googled it, but how do I remove them? – Dodds 15/7, 2010 at 13:38

It doesn't remove the BOM, it ignores it. – Jochebed 23/6, 2013 at 22:19

Or the other way (ignoring it) could be to change the encoding. – Nitrobenzene 21/10, 2015 at 3:20

Windows Notepad (ugh) adds them; the suggestion from a duplicate of this question is to use Notepad++, which allows setting "UTF-8 without BOM" as an encoding. Or use a Real Editor... (emacs!) :-) – Haemocyte 12/2, 2016 at 15:26

Good. Now I'm getting font errors after upgrading PHP 5.4 to 5.6. Maybe the BOM is the problem. – Resin 17/1, 2017 at 7:12

My understanding is that the UTF-8 BOM consists of the hex bytes EF BB BF; however, ï»¿ is C3 AF C2 BB C2 BF, so your answer doesn't make sense in that regard. – Chainplate 7/10, 2017 at 17:13

That's exactly the issue, different character encodings use different bytes for the same characters. Read again the third paragraph of the answer. – Unsuccess 7/10, 2017 at 17:18

Thanks for the response. It was because my text editor was in UTF-8 mode and must have changed those characters' encoding when I pasted them into it. It doesn't do that if I first put the editor in ISO 8859-1 (Latin-1) encoding mode. – Chainplate 7/10, 2017 at 17:25

Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 without BOM, save the file, replace the old file with this new file. And it will work, damn sure.

Carabao answered 18/12, 2014 at 10:50

In Notepad++ v7.6.6 (64-bit) you need to click Convert to UTF-8. – Arvo 15/5, 2019 at 7:5

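In PHP, you can do the following to remove all control characters and every byte in the range 0x80-0xFF (that is, every non-ASCII byte), including the BOM bytes in question. Note that this also strips newlines, tabs and any legitimate non-ASCII text such as accented letters.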

$response = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $response);
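The same character class can be tried in Python with `re` on raw bytes (a sketch; Python is used here only for illustration, and the sample CSS is made up). It makes the side effect visible: newlines, tabs and the bytes of "é" disappear along with the BOM, so a pattern anchored to the start that removes only EF BB BF is usually the safer choice.

```python
import re

# Hypothetical input: a CSS file that starts with a BOM and contains an accented letter.
data = "\ufeffbody {\n\tcolor: red; /* caf\u00e9 */\n}\n".encode("utf-8")

# Equivalent of preg_replace('/[\x00-\x1F\x80-\xFF]/', '', ...): drops control
# bytes (including \n and \t) and every byte >= 0x80 (the BOM, but also "é").
broad = re.sub(rb"[\x00-\x1F\x80-\xFF]", b"", data)

# Narrower alternative: remove only a UTF-8 BOM at the very start of the data.
narrow = re.sub(rb"\A\xEF\xBB\xBF", b"", data)

print(broad)
print(narrow)
```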

Countenance answered 19/12, 2013 at 21:51

in case you just want to kill the "ï" use this $response = preg_replace('/[\x80-\xFF]//', '', $response); – Sequin 8/6, 2017 at 20:49

@guido_nhcol.com.br_ You add an extra /, it should be: $response = preg_replace('/[\x80-\xFF]/', '', $response); – Unlike 16/7, 2019 at 7:41

For those with shell access, here is a little command to find all files containing the BOM in the public_html directory. Be sure to change the path to the correct one for your server.

Code:

grep -rl $'\xEF\xBB\xBF' /home/username/public_html
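If grep is not available, roughly the same search can be sketched in Python. This is not a drop-in replacement: `find_bom_files` is a hypothetical helper name, and unlike `grep -rl` it only reports files whose first three bytes are the BOM, not files that contain EF BB BF somewhere in the middle.

```python
import codecs
import os


def find_bom_files(root):
    """Yield paths under root whose first three bytes are the UTF-8 BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    starts_with_bom = f.read(3) == codecs.BOM_UTF8
            except OSError:
                continue  # unreadable file (permissions, broken symlink, ...)
            if starts_with_bom:
                yield path
```

For example, `for p in find_bom_files("/home/username/public_html"): print(p)`, again changing the path for your server.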

and if you are comfortable with the vi editor, open the file in vi:

vi /path-to-file-name/file.php

And enter the command to remove the BOM:

:set nobomb

Save the file:

:wq

Pallmall answered 15/7, 2013 at 13:3

Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files. – Diacritic 11/3, 2015 at 16:56

BOM is just a sequence of characters ($EF $BB $BF for UTF-8), so just remove them using scripts or configure the editor so it's not added.

From Removing BOM from UTF-8:

#!/usr/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

I am sure it translates to PHP easily.
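A rough translation of the same idea, written in Python rather than PHP to keep the examples in one language. The helper names `strip_utf8_bom` and `strip_file` are made up for this sketch; like the Perl script, it only removes EF BB BF at the very start of the data.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a UTF-8 BOM (EF BB BF) from the start of data, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_file(path: str) -> bool:
    """Rewrite path without its leading BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if len(cleaned) == len(data):
        return False  # nothing to remove, leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Calling `strip_file("style.css")` rewrites the file in place and returns False when there was nothing to remove.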

Flitting answered 15/7, 2010 at 13:55

Note that the BOM is not a sequence of characters; it is a single character. If the file is in UTF-8, then the character is represented in three bytes. If the file is in UTF-8, then viewing it in another encoding (i.e., one in which EF BB BF appears where the BOM should be) is an error. To remove the BOM from a UTF-8 file, one should remove the (single) character U+FEFF. Yeah, pedantry! – Oread 15/7, 2010 at 14:5

I couldn't get that working in PHP (that's just my incompetence, not yours :P), so I did a check to see if the BOM is there and remove the first 3 characters. Here's the code, if anyone needs it: if( substr($css, 0,3) == pack("CCC",0xef,0xbb,0xbf) ) { $css = substr($css, 3); } – Dodds 15/7, 2010 at 14:8

it translates to php as $string = preg_replace('/\x{EF}\x{BB}\x{BF}/','',$string); . before you use this, reconsider if you can't fix the problem at the source instead. – Chretien 6/10, 2011 at 15:53

I don't know PHP, so I don't know if this is possible, but the best solution would be to read the file as UTF-8 rather than some other encoding. The BOM is actually a ZERO WIDTH NO-BREAK SPACE (U+FEFF). This is whitespace, so if the file were being read in the correct encoding (UTF-8), then the BOM would be interpreted as whitespace and it would be ignored in the resulting CSS file.

Also, another advantage of reading the file in the correct encoding is that you don't have to worry about characters being misinterpreted. Your editor is telling you that the code page you want to save it in won't do all the characters that you need. If PHP is then reading the file in the incorrect encoding, then it is very likely that other characters besides the BOM are being silently misinterpreted. Use UTF-8 everywhere, and these problems disappear.
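In Python, for example (used here only to illustrate the idea; the question's code is PHP), reading a file with the `utf-8-sig` codec decodes it as UTF-8 and drops a leading BOM, whereas plain `utf-8` keeps it as the character U+FEFF. The temporary file and its CSS content are just for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A small CSS file that starts with a UTF-8 BOM, as some Windows editors save it.
    path = os.path.join(tmp, "style.css")
    with open(path, "wb") as f:
        f.write(b"\xef\xbb\xbfbody { color: red; }\n")

    # Decoding as plain UTF-8 keeps the BOM as the character U+FEFF...
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()

    # ...while the "utf-8-sig" codec skips a leading BOM if there is one.
    with open(path, encoding="utf-8-sig") as f:
        as_utf8_sig = f.read()

print(repr(as_utf8[:1]))      # '\ufeff'
print(repr(as_utf8_sig[:4]))  # 'body'
```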

Oread answered 15/7, 2010 at 13:48

For me, this worked:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

If I remove this meta, the ï»¿ appears again. Hope this helps someone...

Prefix answered 12/11, 2014 at 18:53

You can use

vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Replacing with awk seems to work, but it is not in place.

Macias answered 12/4, 2012 at 7:28

I had the same problem with the BOM appearing in some of my PHP files (ï»¿ï»¿).

If you use PhpStorm, you can set a hotkey to remove it in Settings -> IDE Settings -> Keymap -> Main Menu -> File -> Remove BOM.

Stockish answered 8/3, 2013 at 14:45

grep -rl $'\xEF\xBB\xBF' * | xargs vim -e -c 'argdo set fileencoding=utf-8|set encoding=utf-8| set nobomb| wq'

Indecision answered 29/11, 2013 at 14:13

Use grep -rlI $'\xEF\xBB\xBF' . to ignore binary files. Also, . is better than * here. – Diacritic 11/3, 2015 at 16:59

In Notepad++, choose the "Encoding" menu, then "Encode in UTF-8 without BOM". Then save.

See Stack Overflow question How to make Notepad to save text in UTF-8 without BOM?.

Blaspheme answered 14/7, 2014 at 16:41

Open the PHP file in question in Notepad++.

Click on Encoding at the top and change from "Encoding in UTF-8 without BOM" to just "Encoding in UTF-8". Save and overwrite the file on your server.

Ellersick answered 21/10, 2015 at 6:55

If you need to be able to remove the BOM from UTF-8 encoded files, you first need to get hold of an editor that is aware of them.

I personally use E Text Editor.

In the bottom right, there are options for character encoding, including the BOM tag. Load your file, deselect Byte Order Marker if it is selected, resave, and it should be done.

Screenshot: http://oth4.com/encoding.png

E is not free, but there is a free trial, and it is an excellent editor (limited TextMate compatibility).

Cloistral answered 15/7, 2010 at 13:42

The image link is broken. – Cetology 9/5, 2015 at 16:53

Same problem, different solution.

One line in the PHP file was printing out XML headers (which use the same begin/end tags as PHP). It looks like the code within these tags set the encoding and was executed by PHP, which resulted in the strange characters. Either way, here's the solution:

# Original
$xml_string = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

# fixed
$xml_string = "<" . "?xml version=\"1.0\" encoding=\"UTF-8\"?" . ">";

Psychro answered 8/9, 2011 at 15:28

You can open it in PhpStorm, right-click on the file and click Remove BOM.

Tiflis answered 18/8, 2013 at 19:53

Here is another good solution for the problem with BOM. These are two VBScript (.vbs) scripts.

One finds the BOM in a file and the other KILLS the damned BOM in the file. They work pretty well and are easy to use.

Just create a .vbs file, and paste the following code in it.

You can use the VBScript script simply by dragging and dropping the suspicious file onto the .vbs file. It will tell you if there is a BOM or not.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' find_BOM.vbs
' ====================
' Small utility intended to find the BOM
'
 Const UTF8_BOM = "ï»¿"
 Const UTF16BE_BOM = "þÿ"
 Const UTF16LE_BOM = "ÿþ"
 Const ForReading = 1
 Const ForWriting = 2
 Dim fso
 Set fso = WScript.CreateObject("Scripting.FileSystemObject")
 Dim f
 f = WScript.Arguments.Item(0)
 Dim t
 t = fso.OpenTextFile(f, ForReading).ReadAll
 If Left(t, 3) = UTF8_BOM Then
     MsgBox "UTF-8-BOM detected!"
 ElseIf Left(t, 2) = UTF16BE_BOM Then
     MsgBox "UTF-16-BOM (Big Endian) detected!"
 ElseIf Left(t, 2) = UTF16LE_BOM Then
     MsgBox "UTF-16-BOM (Little Endian) detected!"
 Else
     MsgBox "No BOM detected!"
 End If

If it tells you there is a BOM, create the second .vbs file with the following code and drag the suspicious file onto it.

' Heiko Jendreck - personal helpdesk & webdesign
' http://www.phw-jendreck.de
' 2010.05.10 Vers 1.0
'
' kill_BOM.vbs
' ====================
' Small utility intended to delete the BOM that was found
'
Const UTF8_BOM = "ï»¿"
Const ForReading = 1
Const ForWriting = 2
Dim fso
Set fso = WScript.CreateObject("Scripting.FileSystemObject")
Dim f
f = WScript.Arguments.Item(0)
Dim t
t = fso.OpenTextFile(f, ForReading).ReadAll
If Left(t, 3) = UTF8_BOM Then
    fso.OpenTextFile(f, ForWriting).Write (Mid(t, 4))
    MsgBox "BOM deleted!"
Else
    MsgBox "No UTF-8 BOM present!"
End If

The code is from Heiko Jendreck.
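For readers not on Windows, the detection part can be sketched in Python by comparing the first bytes of the file with the marks exposed in the `codecs` module. Unlike find_BOM.vbs it reads raw bytes, so it does not depend on the system code page; the UTF-32 checks and the helper names `detect_bom` and `detect_bom_in_file` are additions for this sketch, not part of the original script.

```python
import codecs

# Longest marks first: the UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE one (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 (Big Endian)"),
    (codecs.BOM_UTF32_LE, "UTF-32 (Little Endian)"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 (Big Endian)"),
    (codecs.BOM_UTF16_LE, "UTF-16 (Little Endian)"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM at the start of data, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_bom_in_file(path):
    # The first four bytes are enough to recognise any of the marks above.
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```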

Grayback answered 22/8, 2014 at 13:53

In PhpStorm, for multiple files, and when the BOM is not necessarily at the beginning of the file, you can search for \x{FEFF} (Regular Expression) and replace it with nothing.
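The same search-and-replace works on decoded text in Python (a sketch; the merged stylesheet below is a made-up example). This matches the situation in the question: when several files are concatenated, each file's BOM ends up as a U+FEFF in the middle of the result, so removing only a leading BOM is not enough.

```python
import re

# Hypothetical merged stylesheet: each concatenated file brought its own BOM,
# so U+FEFF now also appears in the middle of the text.
merged = "\ufeffa { color: red; }\n" + "\ufeffb { color: blue; }\n"

# Same idea as searching for \x{FEFF} in the IDE: delete every U+FEFF, wherever it is.
cleaned = re.sub(r"\uFEFF", "", merged)

print(cleaned.count("\ufeff"))  # 0
```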

Willms answered 22/2, 2018 at 3:43

Use Total Commander to search for all BOMed files:

Elegant way to search for UTF-8 files with BOM?

Open these files in some proper editor (that recognizes BOM) like Eclipse.
Change the file's encoding to ISO (right click, properties).
Cut ï»¿ from the beginning of the file, save
Change the file's encoding back to UTF-8

...and do not even think about using n...d again!

Worms answered 19/9, 2011 at 23:28

Same problem, but it only affected one file so I just created a blank file, copy/pasted the code from the original file to the new file, and then replaced the original file. Not fancy but it worked.

Inclinable answered 30/4, 2014 at 20:39

I had the same problem. It happened because one of my PHP files was saved in UTF-8 (the most important one: the configuration file that is included in all PHP files).

In my case, I had 2 different solutions that worked:

First, I changed the Apache configuration by using the AddDefaultCharset directive in the configuration files (or in .htaccess). This forces Apache to use the correct encoding.

AddDefaultCharset ISO-8859-1

The second solution was to change the bad encoding of the php file.

Variation answered 11/2, 2016 at 7:59

Copy the text of your filename.css file.
Close your css file.
Rename it filename2.css to avoid a filename clash.
In MS Notepad or Wordpad, create a new file.
Paste the text into it.
Save it as filename.css, selecting UTF-8 from the encoding options.
Upload filename.css.

Dishabille answered 12/12, 2017 at 18:42

This works for me!

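import codecs

def removeBOMs(fileName):
    # Known byte order marks as raw bytes (their CP1252 rendering in the comments).
    # UTF-32 LE (FF FE 00 00) is checked before UTF-16 LE (FF FE).
    BOMs = [codecs.BOM_UTF32_BE,   # ^@^@þÿ  UTF-32 BE
            codecs.BOM_UTF32_LE,   # ÿþ^@^@  UTF-32 LE
            codecs.BOM_UTF8,       # ï»¿     UTF-8
            codecs.BOM_UTF16_BE,   # þÿ      UTF-16 BE
            codecs.BOM_UTF16_LE,   # ÿþ      UTF-16 LE
            b'+/v',                # +/v     UTF-7 (its fourth byte is left in place)
            b'\xf7\x64\x4c',       # ÷dL     UTF-1
            b'\xdd\x73\x66\x73',   # Ýsfs    UTF-EBCDIC
            b'\x0e\xfe\xff',       # ^Nþÿ    SCSU
            b'\xfb\xee\x28',       # ûî(     BOCU-1
            b'\x84\x31\x95\x33']   # „1•3    GB-18030
    # Read raw bytes: in text mode Python would decode the file first,
    # and the CP1252 strings would no longer match.
    with open(fileName, 'rb') as inputFile:
        contents = inputFile.read()
    for BOM in BOMs:
        if contents.startswith(BOM):  # only strip a mark at the very start
            with open(fileName, 'wb') as newFile:
                newFile.write(contents[len(BOM):])
            return True
    return False  # no BOM in the file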

Isonomy answered 19/10, 2020 at 17:14

Check your index.php, find "... charset=iso-8859-1" and replace it with "... charset=utf-8".

Maybe it'll work.

Colossians answered 14/4, 2013 at 19:25