Removing Unicode \u2026-like characters from a string in Python 2.7 [duplicate]

I have a string in Python 2.7 like this:

 This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!

How do I convert it to this:

This is some text that has to be cleaned! its annoying!
Recursion asked 10/3, 2013 at 10:17. Comments (2):
By what criterion do you want to filter the characters? Do you only want to preserve ASCII? – Incept
@root, yes, I just want to preserve the ASCII characters. – Recursion

Python 2.x

>>> s
'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
>>> print(s.decode('unicode_escape').encode('ascii','ignore'))
This is some  text that has to be cleaned! it's annoying!
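In the Python 2 session, s holds literal backslash sequences (hence the doubled backslashes in its repr), which is why it is decoded with 'unicode_escape' first. Python 3 str objects have no decode method, which matches the "'str' object has no attribute 'decode'" error reported in the comments. A rough Python 3 port, assuming the text is pure ASCII containing literal \uXXXX sequences (strip_escaped_non_ascii is a hypothetical helper name):

```python
# Python 3 sketch: the text holds literal backslash escapes such as \u03c0,
# like the str in the Python 2 session (note the doubled backslashes).
s = 'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'

def strip_escaped_non_ascii(text):
    # str has no .decode() in Python 3, so turn the (pure ASCII) text into bytes first
    raw = text.encode('ascii')
    # interpret the \uXXXX sequences, producing real characters such as the Greek letter pi
    decoded = raw.decode('unicode_escape')
    # drop every non-ASCII character, then convert the resulting bytes back to str
    return decoded.encode('ascii', 'ignore').decode('ascii')

print(strip_escaped_non_ascii(s))
```

Caveat: 'unicode_escape' also interprets other backslash sequences (a literal \n becomes a newline), and text.encode('ascii') raises UnicodeEncodeError if the string already contains real non-ASCII characters.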

Python 3.x

>>> s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'
>>> s.encode('ascii', 'ignore')
b"This is some  text that has to be cleaned! it's annoying!"
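In Python 3, encode returns bytes, which is why the result is shown with a b prefix. If a str is wanted, it can be decoded back; filtering on code points gives the same text without going through a codec. A minimal sketch (cleaned and cleaned_filter are illustrative names):

```python
s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'

# encode() returns bytes; decode() turns the ASCII-only bytes back into a str
cleaned = s.encode('ascii', 'ignore').decode('ascii')

# Equivalent without codecs: keep only code points below 128 (the ASCII range)
cleaned_filter = ''.join(ch for ch in s if ord(ch) < 128)

print(cleaned)
print(cleaned == cleaned_filter)  # True
```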
Sacha answered 10/3, 2013 at 10:26. Comments (7):
That's just how it's printed. – Sacha
Also curious why you wrote \\ rather than \; wouldn't \ give the same output with your code? – Altis
This also strips characters such as ü, ä, and ö, which is not desirable in most cases. – Garget
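If accented letters should be kept as their base letters rather than dropped, one option (a different technique from the answer, not part of it) is Unicode compatibility decomposition with unicodedata.normalize('NFKD', ...) before the ASCII encode. A sketch, with to_ascii_translit as a hypothetical helper:

```python
import unicodedata

def to_ascii_translit(text):
    # NFKD splits letters such as ü into a base letter plus a combining mark
    # (u + U+0308); encoding with 'ignore' then drops only the mark.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii_translit('Müller ä ö'))  # Muller a o
print(repr(to_ascii_translit('cleaned\u2026! \u03c0')))  # 'cleaned...! '
```

Note that NFKD also rewrites some characters rather than removing them (the ellipsis becomes three periods), while characters with no ASCII decomposition, such as π, are still dropped.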
@BurhanKhalid, in your code is 's' a string? If so, do I need to import some package to get the decode and encode methods? – Craw
When I tried your code on my string, I got an exception saying 'str' object has no attribute 'decode'. I am using Python 3.6.6. – Craw
Please read this and this. – Sacha
This isn't working for me... The string is exactly the same. – Towardly
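One likely reason for an unchanged string in Python 3 is that it holds literal backslash sequences (the six characters \u03c0 rather than the single character π). Every one of those characters is ASCII, so encode('ascii', 'ignore') has nothing to drop. A quick check, assuming only escapes of the \uXXXX form matter (has_literal_escapes is a hypothetical helper):

```python
import re

# A literal backslash, then 'u', then four hex digits (e.g. the six characters \u2026)
LITERAL_ESCAPE = re.compile(r'\\u[0-9a-fA-F]{4}')

def has_literal_escapes(text):
    """Return True if text contains literal \\uXXXX sequences instead of real characters."""
    return LITERAL_ESCAPE.search(text) is not None

raw = 'cleaned\\u2026!'     # literal backslash sequence, all ASCII
decoded = 'cleaned\u2026!'  # the real ellipsis character

print(has_literal_escapes(raw))      # True
print(has_literal_escapes(decoded))  # False
# Encoding the literal form drops nothing, so the string comes back unchanged
print(raw.encode('ascii', 'ignore').decode('ascii') == raw)  # True
```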

© 2022 - 2024 — McMap. All rights reserved.