I'm using Python 3.7. How do I remove all non-UTF-8 characters from a string? I tried using lambda x: x.decode('utf-8','ignore').encode("utf-8") in the code below:
coop_types = map(
    lambda x: x.decode('utf-8','ignore').encode("utf-8"),
    filter(None, set(d['type'] for d in input_file))
)
but this results in the following error:
Traceback (most recent call last):
  File "scripts/parse_coop_csv.py", line 30, in <module>
    for coop_type in coop_types:
  File "scripts/parse_coop_csv.py", line 25, in <lambda>
    lambda x: x.decode('utf-8','ignore').encode("utf-8"),
AttributeError: 'str' object has no attribute 'decode'
If you have a generic way to remove all non-UTF-8 characters from a string, that's all I'm looking for.
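A minimal Python 3 sketch of the pipeline above, under stated assumptions: each d['type'] is already a str, so there is nothing to decode; encoding with errors='ignore' and then decoding back drops any code points that cannot be represented in UTF-8. The input_file list and the strip_non_utf8 helper are hypothetical stand-ins (input_file is assumed to yield dicts, the way csv.DictReader does).

```python
# Hypothetical stand-in for input_file: rows as dicts, as csv.DictReader would yield.
input_file = [
    {"type": "Worker"},
    {"type": "Housing\udcff"},  # lone surrogate: cannot be encoded as UTF-8
    {"type": ""},               # empty values are dropped by filter(None, ...)
    {"type": "Worker"},         # duplicates are collapsed by set(...)
]


def strip_non_utf8(s: str) -> str:
    """Hypothetical helper: drop code points that UTF-8 cannot encode."""
    # str.encode -> bytes (unencodable code points skipped), bytes.decode -> str
    return s.encode("utf-8", "ignore").decode("utf-8")


coop_types = map(strip_non_utf8, filter(None, set(d["type"] for d in input_file)))

for coop_type in sorted(coop_types):
    print(coop_type)
```

The order of encode and decode is the reverse of the original lambda: in Python 3 a str must be encoded first, because str has no decode method.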
… x, then decode it. str.encode takes a Unicode string and produces a UTF-8 encoding of it. bytes.decode takes a bytes object and attempts to interpret it as an encoding to produce a str object. – Chestr
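The two directions described in the comment can be sketched as follows. This assumes the sample strings shown; it also illustrates that "invalid UTF-8" is a property of bytes, so errors='ignore' on bytes.decode is where undecodable input actually gets dropped.

```python
text = "café"                       # str: a sequence of Unicode code points
data = text.encode("utf-8")         # str.encode -> bytes
print(data)                         # b'caf\xc3\xa9'
print(data.decode("utf-8"))         # bytes.decode -> str: café

# Bytes that are not valid UTF-8 (here, the Latin-1 encoding of "café").
raw = b"caf\xe9"
print(raw.decode("utf-8", "ignore"))         # caf  (invalid byte dropped)
print(repr(raw.decode("utf-8", "replace")))  # 'caf\ufffd'  (U+FFFD inserted)

# In Python 3, str has no decode method: the cause of the AttributeError above.
print(hasattr(text, "decode"))      # False
```

If the data comes from a file, the built-in open() also accepts an errors argument (for example errors="ignore"), which applies the same handling while the file is read.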
…? Do you mean surrogate code points? – Amylase
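To illustrate the surrogate question: a Python 3 str can hold lone surrogates (U+D800 to U+DFFF), for instance after decoding bytes with the surrogateescape error handler, and these are the code points that UTF-8 refuses to encode. A small sketch, assuming a made-up byte string:

```python
raw = b"coop \xff type"
s = raw.decode("utf-8", "surrogateescape")  # invalid byte 0xFF becomes U+DCFF
print(repr(s))                              # 'coop \udcff type'

try:
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    print("not encodable:", exc.reason)     # the codec's explanation

# Option 1: let the codec skip the unencodable code points.
cleaned = s.encode("utf-8", "ignore").decode("utf-8")
print(repr(cleaned))                        # 'coop  type'

# Option 2: filter the surrogate range explicitly.
filtered = "".join(ch for ch in s if not "\ud800" <= ch <= "\udfff")
print(filtered == cleaned)                  # True
```

Both options give the same result here; the explicit filter makes it visible that only the surrogate range is being removed.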