I have a string in Python like this:
u'\u200cHealth & Fitness'
How can I remove the \u200c part from the string?
You can encode it to ASCII and ignore errors:
u'\u200cHealth & Fitness'.encode('ascii', 'ignore')
Output (in Python 2; in Python 3 the result is the bytes object b'Health & Fitness'):
'Health & Fitness'
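In Python 3, str.encode returns bytes, so you need to decode the result to get a str back. A minimal sketch, assuming Python 3; the helper name strip_non_ascii is hypothetical:

```python
def strip_non_ascii(text: str) -> str:
    """Drop every non-ASCII character (not only the zero-width non-joiner)."""
    # errors='ignore' silently skips characters that cannot be encoded;
    # decode('ascii') turns the remaining bytes back into a str.
    return text.encode('ascii', 'ignore').decode('ascii')

raw = '\u200cHealth & Fitness'
print(raw.encode('ascii', 'ignore'))  # b'Health & Fitness' (bytes)
print(strip_non_ascii(raw))           # Health & Fitness (str)
```

The trade-off is that every non-ASCII character disappears, so 'café' becomes 'caf'.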
If you have a string that contains a non-ASCII Unicode character, like
s = "Airports Council International \u2013 North America"
then you can try:
newString = (s.encode('ascii', 'ignore')).decode("utf-8")
and the output will be (the en dash is removed, but the spaces on both sides of it remain):
Airports Council International  North America
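Because only the dash is dropped, the result contains a double space. A small sketch, assuming you want single spaces afterwards, collapses runs of whitespace with str.split and str.join:

```python
s = "Airports Council International \u2013 North America"

# Dropping the en dash leaves the two spaces that surrounded it.
ascii_only = s.encode('ascii', 'ignore').decode('utf-8')
print(repr(ascii_only))  # 'Airports Council International  North America'

# split() with no argument splits on any whitespace run; join() re-inserts single spaces.
collapsed = ' '.join(ascii_only.split())
print(collapsed)  # Airports Council International North America
```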
To apply the same fix to every string in a list:
list_text_fixed = [s.encode('ascii', 'ignore').decode('utf-8') for s in list_text]
– Trager
I just use replace because I don't need the character:
varstring.replace('\u200c', '')
Or in your case:
u'\u200cHealth & Fitness'.replace('\u200c', '')
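If several invisible characters can appear, str.translate removes them all in one pass. A sketch, assuming the set of zero-width characters below is what you want gone (the set and the name remove_zero_width are hypothetical; adjust them to your data):

```python
# Hypothetical set of invisible "format" characters to delete.
ZERO_WIDTH = {
    '\u200b': None,  # ZERO WIDTH SPACE
    '\u200c': None,  # ZERO WIDTH NON-JOINER
    '\u200d': None,  # ZERO WIDTH JOINER
    '\ufeff': None,  # ZERO WIDTH NO-BREAK SPACE (BOM)
}
# str.maketrans accepts a dict mapping single characters to None (meaning: delete).
TABLE = str.maketrans(ZERO_WIDTH)

def remove_zero_width(text: str) -> str:
    return text.translate(TABLE)

print(remove_zero_width('\u200cHealth\u200b & Fitness'))  # Health & Fitness
```

Unlike the ASCII approach, other non-ASCII characters such as accented letters are kept.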
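strip() ignores \u200c (it is not a whitespace character). In most cases with Unicode strings you do not want encode('ascii', 'ignore'). – Marc
For me, the following worked:
mystring.encode('ascii', 'ignore').decode('unicode_escape')
Note that 'unicode_escape' also interprets backslash sequences such as \n in the remaining text; decode('ascii') does not.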
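The difference between decoding with 'unicode_escape' and with 'ascii' only shows up when the text contains backslashes. A short sketch using a hypothetical Windows-style path:

```python
raw = '\u200cC:\\new'  # U+200C followed by the literal text C:\new

via_escape = raw.encode('ascii', 'ignore').decode('unicode_escape')
via_ascii = raw.encode('ascii', 'ignore').decode('ascii')

# 'unicode_escape' turned the two characters "\n" into a newline.
print(len(via_escape), len(via_ascii))  # 5 6
```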
In the specific case in the question, where the string is prefixed with a single u'\u200c' character, the solution is as simple as taking a slice that excludes the first character:
original = u'\u200cHealth & Fitness'
fixed = original[1:]
If the leading character may or may not be present, str.lstrip can be used:
original = u'\u200cHealth & Fitness'
fixed = original.lstrip(u'\u200c')
The same solutions work in Python 3. From Python 3.9, str.removeprefix is also available:
original = u'\u200cHealth & Fitness'
fixed = original.removeprefix(u'\u200c')
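The three approaches differ when the prefix is missing or repeated. A sketch comparing them; remove_leading_zwnj is a hypothetical helper that uses removeprefix on Python 3.9+ and an equivalent startswith check otherwise:

```python
def remove_leading_zwnj(text: str) -> str:
    """Remove one leading U+200C if present; otherwise return text unchanged."""
    zwnj = '\u200c'
    if hasattr(str, 'removeprefix'):  # Python 3.9+
        return text.removeprefix(zwnj)
    return text[len(zwnj):] if text.startswith(zwnj) else text

with_char = '\u200cHealth & Fitness'
without_char = 'Health & Fitness'

# Slicing drops the first character even when it is not U+200C.
print(without_char[1:])                   # ealth & Fitness
# lstrip removes every leading U+200C; removeprefix removes at most one.
print(repr('\u200c\u200cX'.lstrip('\u200c')))  # 'X'
print(remove_leading_zwnj(with_char))     # Health & Fitness
print(remove_leading_zwnj(without_char))  # Health & Fitness
```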
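If the text is only English, this works:
u'\u200cHealth & Fitness'.encode('ascii', 'ignore')
But if it contains other scripts such as Arabic or Persian, encoding to ASCII would delete those letters too, so remove only the specific character:
s = s.replace('\u200c', '')
If the string holds the literal escape text \u200c (a backslash followed by u200c) rather than the character itself, use s.replace('\\u200c', '') instead.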
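A short sketch contrasting the two approaches on a Persian word that contains U+200C (the sample word is an assumption for illustration). In Persian, U+200C controls how letters join, so strip it only if that change is intended:

```python
prefix = 'می'
stem = 'خواهم'
word = prefix + '\u200c' + stem  # Persian word with a zero-width non-joiner

# The ASCII approach deletes every Persian letter along with U+200C.
ascii_version = word.encode('ascii', 'ignore').decode('ascii')
print(repr(ascii_version))  # ''

# replace() removes only U+200C and keeps the letters.
replaced = word.replace('\u200c', '')
print(replaced == prefix + stem)  # True

# If the text contains the six-character escape spelling instead of the character:
escaped = '\\u200cHealth & Fitness'
print(escaped.replace('\\u200c', ''))  # Health & Fitness
```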
If you need to write such text to a file, open it with UTF-8 encoding:
import codecs

# array_string is assumed to be a list of str
with codecs.open('text_file.txt', 'w', encoding='utf-8') as text_file:
    for line in array_string:
        text_file.write('\u200c' + line + '\n')
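In Python 3, the built-in open() accepts an encoding argument, so codecs.open is not required. A minimal round-trip sketch; the sample array_string contents and the temporary-directory path are assumptions for illustration:

```python
import os
import tempfile

array_string = ['Health & Fitness', 'Sports']  # hypothetical sample data

path = os.path.join(tempfile.mkdtemp(), 'text_file.txt')
with open(path, 'w', encoding='utf-8') as text_file:
    for line in array_string:
        text_file.write('\u200c' + line + '\n')

# Reading back with the same encoding returns the U+200C characters intact.
with open(path, encoding='utf-8') as text_file:
    lines = text_file.read().splitlines()

print([line.startswith('\u200c') for line in lines])  # [True, True]
```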
© 2022 - 2024 — McMap. All rights reserved.
s.encode('utf-8') – Bolding
\xe2\x80\x8cHealth & Fitness – Doityourself
ascii, as Arount answered below – Bolding
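The bytes in that comment are the UTF-8 encoding of U+200C. A small sketch using the standard unicodedata module to identify the character, which also shows why strip() leaves it alone:

```python
import unicodedata

s = '\u200cHealth & Fitness'

print(unicodedata.name(s[0]))  # ZERO WIDTH NON-JOINER
print(s.encode('utf-8'))       # b'\xe2\x80\x8cHealth & Fitness'
print(s[0].isspace())          # False, so str.strip() does not remove it
```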