\ufeff is the ZERO WIDTH NO-BREAK SPACE codepoint (U+FEFF); it is not rendered when printed. In UTF-16 and UTF-32 it is used as a byte order mark (BOM), recording the order in which the encoded bytes are to be decoded (big-endian or little-endian).
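As a rough illustration of how a decoder can use the BOM to pick a byte order, here is a minimal sketch that sniffs the leading bytes of a buffer. The function name sniff_bom is hypothetical; the BOM byte constants are the real ones from Python's standard codecs module.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so checking UTF-16 first would misdetect it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (codec name, BOM length) for a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None

# The same codepoint U+FEFF yields different bytes in each byte order.
print("\ufeff".encode("utf-16-be"))  # b'\xfe\xff'
print("\ufeff".encode("utf-16-le"))  # b'\xff\xfe'

raw = "\ufeffhi".encode("utf-32-le")
codec, skip = sniff_bom(raw)
# The explicit -le/-be codecs keep U+FEFF as a character, so skip the BOM first.
print(codec, raw[skip:].decode(codec))  # utf-32-le hi
```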
UTF-8 doesn't need a BOM (it has only one fixed byte ordering, so there is no alternative to track), but Microsoft decided it was a handy signature character that lets their tools tell UTF-8 files apart from 8-bit encodings (such as the ones most Windows codepages use).
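To see what that UTF-8 signature looks like in practice, the illustrative snippet below decodes the same bytes with Python's built-in 'utf-8' and 'utf-8-sig' codecs. The plain codec keeps U+FEFF as an ordinary character, while 'utf-8-sig' strips it when present and is harmless when it is absent.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three-byte signature EF BB BF.
print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)  # True

signed = codecs.BOM_UTF8 + b"hello"
unsigned = b"hello"

print(repr(signed.decode("utf-8")))        # '\ufeffhello': BOM kept as a character
print(repr(signed.decode("utf-8-sig")))    # 'hello': signature stripped
print(repr(unsigned.decode("utf-8-sig")))  # 'hello': nothing to strip
```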
I suspect you are using a Microsoft text editor such as Notepad to save your code. Don't; it'll include the BOM, and Python only accepts a UTF-8 BOM as the very first bytes of a source file. Anywhere else it is an invalid character that Python won't strip. You probably saved the file with Notepad, then continued with a different tool and added more code to the start, so the BOM got caught in the middle.
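A quick way to confirm this diagnosis without touching your real file is to hand Python's built-in compile() the same source with the BOM in two different places. This is a minimal sketch: the sample source lines and the compiles helper are made up for illustration, and the exact SyntaxError message varies between Python versions.

```python
# A UTF-8 BOM as the very first bytes is accepted as an encoding signature.
at_start = b"\xef\xbb\xbfx = 1\ny = 2\n"
# The same three bytes in the middle of the file decode to U+FEFF,
# which is not a valid character in Python source.
in_middle = b"x = 1\n\xef\xbb\xbfy = 2\n"

def compiles(source: bytes) -> bool:
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False
    return True

print(compiles(at_start))   # True
print(compiles(in_middle))  # prints the error message, then False
```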
Either delete the whole line and the next one and re-type them, or select from the closing quote of the string you define until just before the h of headers on the next line, delete that part, and re-insert a newline and enough indentation.
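Before selecting and deleting text by hand, it can help to know exactly where the invisible character sits. The helper below is a hypothetical sketch (find_boms is not a standard function) that reports 1-based line and column numbers for every U+FEFF in a string.

```python
from typing import List, Tuple

def find_boms(text: str) -> List[Tuple[int, int]]:
    """Return (line, column) pairs, both 1-based, for every U+FEFF in text."""
    positions = []
    # Split on "\n" only, so line numbers match what a text editor shows.
    for lineno, line in enumerate(text.split("\n"), start=1):
        col = line.find("\ufeff")
        while col != -1:
            positions.append((lineno, col + 1))
            col = line.find("\ufeff", col + 1)
    return positions

source = "usag = 'Mozilla/5.0'\n\ufeffheaders = {}\n"
print(find_boms(source))  # [(2, 1)]
```

For a real script, read it with encoding="utf-8" first and pass the resulting string in.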
If your editor supports escape sequences when searching and replacing (Sublime Text does in regex mode, for example), you can use that to search for the character and replace it with an empty string. In Sublime Text, switch on regex support, search for \x{feff}, and replace the matches with nothing.
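If you'd rather not rely on an editor feature, the same search-and-replace takes a few lines of Python. This sketch assumes the file is otherwise valid UTF-8; strip_boms is a hypothetical helper name, and because it rewrites the file in place you should keep a backup.

```python
import tempfile
from pathlib import Path

def strip_boms(path: str) -> int:
    """Remove every U+FEFF from a UTF-8 file in place; return how many were removed."""
    p = Path(path)
    # Work on raw bytes so the existing line endings (LF or CRLF) are preserved.
    text = p.read_bytes().decode("utf-8")
    count = text.count("\ufeff")
    if count:
        p.write_bytes(text.replace("\ufeff", "").encode("utf-8"))
    return count

# Demo on a throwaway file with a BOM at the start of line 2.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.py"
    demo.write_bytes(b"usag = 'x'\n\xef\xbb\xbfheaders = {}\n")
    print(strip_boms(str(demo)))  # 1
    print(demo.read_bytes())      # b"usag = 'x'\nheaders = {}\n"
```

Note that this also removes a BOM at the very start of the file, which Python doesn't need anyway.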
The Python 'utf-8-sig' encoding that you are using here also includes that BOM:
headers['User-Agent'] = usag.encode('utf-8-sig')
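The difference is easy to check in an interpreter. In the snippet below the user-agent string is a shortened placeholder; encoding with 'utf-8-sig' prepends the three BOM bytes, while plain 'utf-8' produces only the characters you wrote.

```python
usag = "Mozilla/5.0"  # shortened placeholder value

with_sig = usag.encode("utf-8-sig")
plain = usag.encode("utf-8")

print(with_sig)  # b'\xef\xbb\xbfMozilla/5.0'
print(plain)     # b'Mozilla/5.0'
print(with_sig == b"\xef\xbb\xbf" + plain)  # True
```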
HTTP headers should not include that codepoint either. HTTP header values traditionally stick to Latin-1 (ISO-8859-1); even ASCII would suffice here, but otherwise use 'utf-8' (without -sig).
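A side benefit of encoding header values as Latin-1: U+FEFF lies outside the Latin-1 range, so a stray BOM makes the encode fail loudly instead of being sent. (Python's own http.client also encodes str header values with Latin-1.) This is a minimal sketch; encode_header_value is a hypothetical helper, not part of any HTTP library.

```python
def encode_header_value(value: str) -> bytes:
    """Encode an HTTP header value as Latin-1, raising ValueError on unsupported characters."""
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError as exc:
        bad = value[exc.start:exc.end]
        raise ValueError(f"header value contains non-Latin-1 character(s): {bad!r}") from exc

print(encode_header_value("Mozilla/5.0"))  # b'Mozilla/5.0'

try:
    encode_header_value("\ufeffMozilla/5.0")
except ValueError as err:
    print(err)  # reports the offending U+FEFF
```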
You don't really need to use str.encode() there; you could also just define a bytestring:
headers = {}
usag = b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'
headers['User-Agent'] = usag
Note the b prefix to the string literal.
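If these headers end up in a urllib.request.Request (an assumption, since the HTTP library in use isn't shown here), the bytes value is stored as-is, and http.client sends bytes header values without re-encoding them. The sketch below only builds the request object and never opens a connection; the URL is a placeholder.

```python
import urllib.request

headers = {}
usag = b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0'
headers['User-Agent'] = usag

# Building the Request does not touch the network.
req = urllib.request.Request("http://example.com/", headers=headers)

# urllib.request normalises header names with str.capitalize(),
# so 'User-Agent' is stored under 'User-agent'.
print(req.get_header("User-agent") == usag)                     # True
print(req.get_header("User-agent").startswith(b"\xef\xbb\xbf"))  # False: no BOM
```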