I am working with external data that's encoded in latin1, so I've added a sitecustomize.py
and in it put
import sys
sys.setdefaultencoding('latin_1')
Sure enough, working with latin1 strings now works fine.
But when I encounter a character that cannot be represented in latin1:
s=str(u'abc\u2013')
I get UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 3: ordinal not in range(256)
What I would like is for the unencodable characters to simply be replaced instead of raising an error, i.e. in the above example I would get s=='abc?',
and to do that without explicitly calling decode() or encode() each time, i.e. not s.encode(...,'replace') or s.decode(...,'replace') on each call.
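For reference, this is a minimal sketch of the explicit form I am trying to avoid writing everywhere. The variable names are only for illustration; the 'replace' error handler is the standard one built into the codecs machinery.

```python
# Explicit encoding with the built-in 'replace' error handler.
# Characters that latin-1 cannot represent are substituted with '?'.
text = u'abc\u2013'
encoded = text.encode('latin-1', 'replace')
print(repr(encoded))
```

This gives the result I want, but only because the error handler is passed by hand on this one call.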
I tried doing different things with codecs.register_error,
but to no avail.
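To illustrate the kind of thing codecs.register_error allows, here is a rough sketch. The handler name 'qmark' and the function name are made up for this example. As far as I can tell, a handler registered this way is only used when its name is passed explicitly to encode(), while the implicit str() conversion keeps using 'strict', which is why it does not solve my problem.

```python
import codecs

def replace_with_question_mark(exc):
    # Hypothetical handler: substitute one '?' per unencodable character
    # and resume encoding right after the offending slice.
    if isinstance(exc, UnicodeEncodeError):
        return (u'?' * (exc.end - exc.start), exc.end)
    raise exc

# 'qmark' is an arbitrary name chosen for this sketch.
codecs.register_error('qmark', replace_with_question_mark)

# The handler only takes effect when it is named explicitly.
result = u'abc\u2013'.encode('latin-1', 'qmark')
print(repr(result))
```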
Please help?
s=str(u'abc\u2013')
, you want to work in unicode, which seems weird if you set the default encoding to latin-1
– Mary