How to check if a string is 100% ASCII in Python 3
L

3

3

I have two strings:

eng = "Clash of Clans – Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"

Now I want to check whether a string is in English or not, using Python 3.

I have read this Stack Overflow answer, but it does not help me because it is a Python 2.x solution. In the comments, someone mentioned using

string.encode('ascii')

to make it work in Python 3.x, but my problem is that in both cases it raises the same UnicodeEncodeError exception!


So now I am stuck and can't figure out how to make it work. Kindly guide me, or do I have to use another method to determine whether a string is in English? Thanks.

Lepper answered 7/10, 2015 at 23:21 Comment(0)
L
5

As in Salvador Dali's answer that you linked to, you can use a try/except block to catch the encoding error.

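# -*- coding: utf-8 -*-
def isEnglish(s):
    """Return True if s contains only ASCII characters."""
    try:
        # Encoding to ASCII fails on any character above U+007F.
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    else:
        return True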

Just to note, though: when I copied and pasted your eng and rus strings to try them, both came up as False. Retyping the English one returned True, so I'm not sure what's up with that.
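
One way to see why a pasted string fails is to list the characters that fall outside the ASCII range. This is only a sketch: non_ascii_chars is a hypothetical helper name, and the eng string is retyped here with an explicit \u2013 escape on the assumption that this is the character in the pasted original.

```python
import unicodedata


def non_ascii_chars(s):
    """Return (index, character, Unicode name) for each non-ASCII character in s."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ord(ch) > 127  # ASCII covers code points 0 through 127
    ]


# The pasted English string, with the suspicious dash written as an escape.
eng = "Clash of Clans \u2013 Android Apps on Google Play"
print(non_ascii_chars(eng))
```

An empty list means the string is pure ASCII; anything else shows exactly which characters trip up encode('ascii').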

Landsturm answered 8/10, 2015 at 0:20 Comment(2)
What do you mean by retyping? - Lepper
@Lepper It means typing the string in rather than using copy/paste. An English keyboard only has ASCII symbols on it, so you won't accidentally get the EN DASH that your string contains. - Accumulator
A
3

Your English string isn't really pure ASCII: it contains the character U+2013 EN DASH. This looks very similar to the ASCII hyphen-minus, U+002D, but it is a different character.

If this is the only character you need to worry about, you can do a simple replacement to make it work:

>>> eng.replace('\u2013', '-').encode('ascii')
b'Clash of Clans - Android Apps on Google Play'
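
If more than one look-alike character might turn up, the replacement step and the ASCII check can be combined. The sketch below is an assumption on my part rather than a complete solution: the REPLACEMENTS table and the name is_ascii_after_cleanup are hypothetical, and the table only covers a few common typographic characters.

```python
# Hypothetical table of typographic characters and their ASCII look-alikes.
REPLACEMENTS = {
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}
_TABLE = str.maketrans(REPLACEMENTS)


def is_ascii_after_cleanup(s):
    """Replace known look-alikes, then report whether the result is pure ASCII."""
    cleaned = s.translate(_TABLE)
    try:
        cleaned.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


eng = "Clash of Clans \u2013 Android Apps on Google Play"
rus = "Castle Clash: Новая Эра - Android Apps on Google Play"
print(is_ascii_after_cleanup(eng), is_ascii_after_cleanup(rus))
```

Keep in mind that being ASCII-only is just a rough proxy for "English": plenty of non-English text is pure ASCII, and English text can legitimately contain accented letters.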
Accumulator answered 8/10, 2015 at 2:9 Comment(2)
Ohh, @Mark Ransom, but my main goal is to determine whether a string is in English or not. How can I achieve this? - Lepper
@Lepper Use a combination of this answer and Hayley's answer. - Accumulator
M
3

You can use the str.isascii() method:

>>> rus.isascii()
False
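
Since str.isascii() only exists from Python 3.7 onward, code that may run on older 3.x versions could fall back to checking code points by hand. A minimal sketch, with is_ascii as a hypothetical wrapper name:

```python
def is_ascii(s):
    """Return True if every character in s is in the ASCII range."""
    if hasattr(s, "isascii"):
        # str.isascii() is available on Python 3.7 and later.
        return s.isascii()
    # Fallback for older versions: ASCII code points are 0 through 127.
    return all(ord(ch) < 128 for ch in s)


print(is_ascii("Android Apps on Google Play"))
print(is_ascii("Новая Эра"))
```

Like str.isascii() itself, this treats the empty string as ASCII.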
Module answered 15/5, 2019 at 8:58 Comment(1)
New in Python 3.7; not available in earlier versions. - Explant
