Check if a string is hexadecimal
M

14

72

I know the easiest way is using a regular expression, but I wonder if there are other ways to do this check.

Why do I need this? I am writing a Python script that reads text messages (SMS) from a SIM card. In some situations, hex messages arrive and I need to process them, so I need to check whether a received message is hexadecimal.

When I send the following SMS:

Hello world!

my script receives:

00480065006C006C006F00200077006F0072006C00640021

But in some situations, I receive normal text messages (not hex). So I need a way to check whether a message is hex.

I am using Python 2.6.5.

UPDATE:

The reason for the problem is that (somehow) messages I send are received as hex, while messages sent by the operator (info messages and ads) are received as normal strings. So I decided to add a check to ensure that I have the message in the correct string format.

Some extra details: I am using a Huawei 3G modem and PyHumod to read data from the SIM card.

Possible best solution to my situation:

The best way to handle such strings is using a2b_hex (a.k.a. unhexlify) and utf-16 big endian encoding (as @JonasWielicki mentioned):

from binascii import unhexlify  # unhexlify is another name of a2b_hex

mystr = "00480065006C006C006F00200077006F0072006C00640021"
unhexlify(mystr).decode("utf-16-be")  # decode the raw bytes, not encode
>> u'Hello world!'
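
For readers on Python 3, the same idea can be sketched as a small helper that falls back to the original text when the message is not hex. This is only a sketch: decode_sms is a hypothetical name, and it assumes that any message made of an even number of hex digits that also decodes as UTF-16-BE really was hex-encoded. That assumption can misfire on a short plain-text message such as "cafe".

```python
from binascii import unhexlify


def decode_sms(message):
    """Return the decoded text if `message` looks like hex-encoded
    UTF-16-BE, otherwise return `message` unchanged (assumed policy)."""
    try:
        # Raises binascii.Error (a ValueError subclass) on non-hex
        # characters or an odd number of hex digits.
        raw = unhexlify(message)
    except ValueError:
        return message
    try:
        return raw.decode("utf-16-be")
    except UnicodeDecodeError:
        # Valid hex, but not valid UTF-16-BE: treat it as plain text.
        return message
```

Calling decode_sms on the hex string from the question should give back 'Hello world!', while an ordinary operator message is passed through untouched.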
Meli answered 21/7, 2012 at 12:39 Comment(4)
I don't think the problem is as easy as it looks. What if you read something like "333 445"? It could be, for example, a phone number (string) or a hexadecimal value; how can you be sure which? I think the real question is why you are reading both.Precancel
@Precancel That is a problem in itself, but in my situation I am not concerned with it.Meli
By the way, the expanded hex code looks pretty much like UCS-2 big endian encoding.Radon
I would think that maybe a regular expression wiz (which I'm not) could do the check with an RE.Tootsie
S
136

(1) Using int() works nicely for this, and Python does all the checking for you :)

int('00480065006C006C006F00200077006F0072006C00640021', 16)
6896377547970387516320582441726837832153446723333914657L

will work. In case of failure you will receive a ValueError exception.

Short example:

int('af', 16)
175

int('ah', 16)
 ...
ValueError: invalid literal for int() with base 16: 'ah'

(2) An alternative would be to traverse the data and make sure all characters fall within the range of 0..9 and a-f/A-F. string.hexdigits ('0123456789abcdefABCDEF') is useful for this as it contains both upper and lower case digits.

import string
all(c in string.hexdigits for c in s)

will return either True or False based on the validity of your data in string s.

Short example:

s = 'af'
all(c in string.hexdigits for c in s)
True

s = 'ah'
all(c in string.hexdigits for c in s)
False

Notes:

As @ScottGriffiths notes correctly in a comment below, the int() approach will work if your string contains 0x at the start, while the character-by-character check will fail with this. Also, checking against a set of characters is faster than checking against a string of characters, but it is doubtful this will matter with short SMS strings, unless you process many (many!) of them in sequence, in which case you could convert string.hexdigits to a set with set(string.hexdigits).
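
The differences mentioned in these notes can be sketched as follows. The helper name is_strict_hex and the module-level constant HEX_DIGITS are illustrative names, not part of the answer above, and the sketch assumes you want to reject prefixes, signs, surrounding whitespace and the empty string.

```python
import string

# Build the lookup set once at import time instead of on every call.
HEX_DIGITS = frozenset(string.hexdigits)


def is_strict_hex(s):
    """True only for non-empty strings made up entirely of hex digits."""
    # bool(s) guards against the empty string, for which all() returns True.
    return bool(s) and all(c in HEX_DIGITS for c in s)


# Inputs that int(..., 16) accepts but the strict check rejects:
for candidate in ("0x1f", "-a", " ff "):
    int(candidate, 16)  # no ValueError is raised for any of these
    assert not is_strict_hex(candidate)
```

Which behaviour is "right" depends on the data; for the SMS payload in the question, the strict variant is probably closer to what is wanted.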

Sixfold answered 21/7, 2012 at 12:41 Comment(4)
A minor quibble is that the two methods aren't quite equivalent (and the same goes for eumiro's answer). For strings starting with 0x or 0X casting to an int will succeed but the other method won't.Science
@ScottGriffiths Good point, I'll add a note to my answer just in case, though for the data shown by OP as sample input, the solutions work. ThanksSixfold
The first approach may be error-prone in some places because of negative numbers: int('-a', 16) -> -10 does not raise a ValueError, although the - sign is an operator and should not be considered part of a hexadecimal string.Zoo
The all test will return True on an empty string.Keeleykeelhaul
W
29

You can:

  1. test whether the string contains only hexadecimal digits (0…9,A…F)
  2. try to convert the string to integer and see whether it fails.

Here is the code:

import string

# Option 1: check every character against the set of hex digits
def is_hex(s):
    hex_digits = set(string.hexdigits)
    # if s is long, then it is faster to check against a set
    return all(c in hex_digits for c in s)

# Option 2: let int() do the parsing
def is_hex(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False
Whalen answered 21/7, 2012 at 12:43 Comment(12)
Wondering whether it would be faster to add ABCDEF to the test string instead of copying the whole string for the .lower() operation.Radon
@JonasWielicki inputs may have both upper and lower caseJecho
thats why I'd suggest adding ABCDEF to the test string, in addition to abcdef.Radon
return all(c.lower() in '0123456789abcdef' for c in s) is more faster than return all(c in '0123456789abcdef' for c in s.lower())Jecho
and the problem is that your method returns true for decimal numbersJecho
@Jecho - '7890' is both decimal and hexadecimal number, just like '1010' can be binary, octal, decimal, hexadecimal and whatever number…Whalen
@Whalen yes, but you can check the range of numbers in your input; the probability of a decimal number containing only 0 or 1 is very low.Jecho
write the 0123456789abcdefABCDEF unordered, and shuffle it, it has less time complexity to searchJecho
In other words randomize your algorithmJecho
@Pooya: You got it backwards when you said "return all(c.lower() in '0123456789abcdef' for c in s) is more faster than return all(c in '0123456789abcdef' for c in s.lower())". If you need c.lower(), you will call lower() many times, whereas if you do s.lower(), you only call lower() once. Of course, I think it is even better to avoid lower() altogether, and follow Jonas's suggestion (which has been edited into the answer).Pentha
Wouldn't it be faster to test if ord(c) is between ord ('0') and ord ('9') or between a and f or between A and F for all chars in s?Phototopography
The first function converts hexdigits to a set on each and every call. I wonder how much of the speedup you expect from using a set is lost in that.Unfasten
D
28

I know the op mentioned regular expressions, but I wanted to contribute such a solution for completeness' sake:

import re

def is_hex(s):
    # "s or ''" turns None into an empty string, which does not match
    return re.fullmatch(r"^[0-9a-fA-F]+$", s or "") is not None

Performance

In order to evaluate the performance of the different solutions proposed here, I used Python's timeit module. The input strings are generated randomly for three different lengths, 10, 100, 1000:

s=''.join(random.choice('0123456789abcdef') for _ in range(10))

Levon's solutions:

# int(s, 16)
  10: 0.257451018987922
 100: 0.40081690801889636
1000: 1.8926858339982573

# all(_ in string.hexdigits for _ in s)
  10:  1.2884491360164247
 100: 10.047717947978526
1000: 94.35805322701344

Other answers are variations of these two. Using a regular expression:

# re.fullmatch(r'^[0-9a-fA-F]+$', s or '')
  10: 0.725040541990893
 100: 0.7184272820013575
1000: 0.7190397029917222

Picking the right solution thus depends on the length of the input string and whether exceptions can be handled safely. The regular expression certainly handles large strings much faster (and won't throw a ValueError on overflow), but int() is the winner for shorter strings.
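
To reproduce this kind of comparison on your own machine, a harness along the following lines could be used. It is a sketch only: the function names, the number=1000 repetition count and the way the random string is built are assumptions, since the exact timeit setup behind the numbers above is not shown. Absolute timings will differ between machines and Python versions.

```python
import random
import re
import string
import timeit


def by_int(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False


def by_all(s):
    return all(c in string.hexdigits for c in s)


def by_regex(s):
    return re.fullmatch(r"[0-9a-fA-F]+", s or "") is not None


def time_checks(length, number=1000):
    """Time each check for `number` runs on one random hex string."""
    s = "".join(random.choice("0123456789abcdef") for _ in range(length))
    return {
        f.__name__: timeit.timeit(lambda: f(s), number=number)
        for f in (by_int, by_all, by_regex)
    }


if __name__ == "__main__":
    for length in (10, 100, 1000):
        print(length, time_checks(length))
```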

Edit: + added to regex

Dolorisdolorita answered 14/12, 2015 at 6:35 Comment(3)
Because fullmatch is not available in Python 2.7 you can use return re.search(r'^[0-9A-Fa-f]+$', s) is not NoneUnkennel
The conclusions in this answer are somewhat wrong. It should be using re.fullmatch(r'[0-9a-fA-F]+', s or ''), with the + quantifier. If you use this, then int(s, 16) is fastest for all string lengths in my testing (on Python 3.6). However, a regex is still probably the better option as int(s, 16) accepts strings such as "0x0".Myotonia
If the regular expression is changed to '^(0[xX])?[0-9a-fA-F]+$', the issue of handling a possible 0x prefix is addressed.Parrotfish
K
14

One more simple and short solution based on transformation of string to set and checking for subset (doesn't check for '0x' prefix):

import string
def is_hex_str(s):
    return set(s).issubset(string.hexdigits)


Kenspeckle answered 27/12, 2018 at 10:20 Comment(1)
seems great to me, and is supported all the way to 3.10Tommyetommyrot
R
4

Another option:

def is_hex(s):
    # include upper case so strings like '006C' are accepted too
    hex_digits = set("0123456789abcdefABCDEF")
    for char in s:
        if char not in hex_digits:
            return False
    return True
Rev answered 24/1, 2013 at 18:48 Comment(0)
Z
2

Most of the solutions proposed above do not take into account that any decimal integer may be also decoded as hex because decimal digits set is a subset of hex digits set. So Python will happily take 123 and assume it's 0123 hex:

>>> int('123',16)
291

This may sound obvious but in most cases you'll be looking for something that was actually hex-encoded, e.g. a hash and not anything that can be hex-decoded. So probably a more robust solution should also check for an even length of the hex string:

In [1]: def is_hex(s):
   ...:     try:
   ...:         int(s, 16)
   ...:     except ValueError:
   ...:         return False
   ...:     return len(s) % 2 == 0
   ...: 

In [2]: is_hex('123')
Out[2]: False

In [3]: is_hex('f123')
Out[3]: True
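
Note that the int()-based version above still accepts inputs such as '-a' (two characters, parsed as -10). If the goal is specifically "an even number of hex digits that decodes to whole bytes", bytes.fromhex in Python 3 is another way to sketch it, since it rejects odd lengths, 0x prefixes and minus signs. The name is_hex_bytes is hypothetical, and the explicit whitespace check is there because bytes.fromhex skips whitespace between bytes, which is assumed to be unwanted here.

```python
def is_hex_bytes(s):
    """True if `s` is a non-empty, whitespace-free string of hex digit
    pairs, i.e. something that could be a hex-encoded byte string."""
    if not s or any(c.isspace() for c in s):
        return False
    try:
        bytes.fromhex(s)  # ValueError on odd length or non-hex characters
    except ValueError:
        return False
    return True
```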
Zug answered 29/4, 2017 at 19:42 Comment(0)
C
0

If you are looking to determine True or False in Python, I would use eumiro's is_hex method over Levon's method (1). The following code contains a gotcha...

if int(input_string, 16):
    print 'it is hex'
else:
    print 'it is not hex'

It incorrectly reports the string '00' as not hex because zero evaluates to False.

Centavo answered 13/5, 2014 at 20:9 Comment(0)
O
0

In Python3, I tried:

def is_hex(s):
    try:
        tmp=bytes.fromhex(hex_data).decode('utf-8')
        return ''.join([i for i in tmp if i.isprintable()])
    except ValueError:
        return ''

It should be better than using int(x, 16).

Odilia answered 4/12, 2014 at 15:27 Comment(1)
Why would this be better than int(s, 16)? Your function takes an s parameter and doesn't use it (I assume it's supposed to be hex_data). It also calls decode() which fails for every incorrect UTF8 encoded Unicode character—and there are many of them considering random hex input. What's the purpose of isprintable()?Dolorisdolorita
M
0

This will cover the case if the string starts with '0x' or '0X': [0x|0X][0-9a-fA-F]

d='0X12a'
all(c in 'xX' + string.hexdigits for c in d)
True
Motel answered 16/4, 2018 at 21:10 Comment(1)
Actually, "xxxXXX" or "123xXxABC" are not a hex string, but your expression above would return True.Dolorisdolorita
F
0

Since all the regular expression timings above were about the same, I would guess that most of the time went into compiling the string into a regular expression. Below is the data I got when pre-compiling the regular expression.

int_hex  
0.000800 ms 10  
0.001300 ms 100  
0.008200 ms 1000  

all_hex  
0.003500 ms 10  
0.015200 ms 100  
0.112000 ms 1000  

fullmatch_hex  
0.001800 ms 10  
0.001200 ms 100  
0.005500 ms 1000
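
For reference, pre-compiling could look like the sketch below. The names HEX_RE and fullmatch_hex_compiled are assumptions, since the code behind the numbers above is not shown, and the function assumes it is given a str. Note that the re module keeps a cache of recently compiled patterns, so in practice the saving is mostly the per-call cache lookup.

```python
import re

# Compile once at import time. fullmatch() anchors both ends of the
# string, so ^ and $ are not needed in the pattern.
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def fullmatch_hex_compiled(s):
    """True if the whole string consists of one or more hex digits."""
    return HEX_RE.fullmatch(s) is not None
```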
Fula answered 10/7, 2019 at 22:43 Comment(0)
T
0

Simple solution in case you need a pattern to validate prefixed hex or binary along with decimal

\b(0x[\da-fA-F]+|[\d]+|0b[01]+)\b

Sample: https://regex101.com/r/cN4yW7/14

Then doing int('0x00480065006C006C006F00200077006F0072006C00640021', 0) in Python gives 6896377547970387516320582441726837832153446723333914657

The base 0 invokes prefix guessing behaviour. This has saved me a lot of hassle. Hope it helps!
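
A minimal sketch of what base 0 does and does not accept in Python 3 is shown below. parse_prefixed_int is a hypothetical helper name. One consequence worth noting is that unprefixed hex such as 'ff', or a string with leading zeros like the sample in the question, is rejected, because without a prefix the text is read as decimal.

```python
def parse_prefixed_int(s):
    """Parse '0x..', '0o..', '0b..' or plain decimal text.
    Returns None when the text is not valid under base 0."""
    try:
        return int(s, 0)
    except ValueError:
        return None


print(parse_prefixed_int("0x10"))   # 16
print(parse_prefixed_int("0b101"))  # 5
print(parse_prefixed_int("12"))     # 12
print(parse_prefixed_int("ff"))     # None: no prefix, so read as decimal
print(parse_prefixed_int("010"))    # None in Python 3: leading zeros rejected
```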

Triclinium answered 22/5, 2020 at 22:6 Comment(0)
E
0

Here's my solution:

import string

def to_decimal(s):
    '''input should be int10 or hex'''
    isString = isinstance(s, str)
    if isString:
        isHex = all(c in string.hexdigits + 'xX' for c in s)
        return int(s, 16) if isHex else int(s)
    else:
        return int(hex(s), 16)

a = to_decimal(12)
b = to_decimal(0x10)
c = to_decimal('12')
d = to_decimal('0x10')
print(a, b, c, d)
Eckblad answered 28/1, 2022 at 5:6 Comment(1)
This is a buggy solution. All of the following will evaluate isHex as true according to your code: x, 1231x12, xxxMeli
I
0

I'm quite surprised nobody mentioned str.removeprefix.

In this manner, any valid hexadecimal string will pass the test, whereas any invalid one will not.

Edit: An empty string check has been added. (Thanks to @daviid)

test_cases = [
    '0x123abc',
    '13a53d',
    '0xA32c0F',
    '3B3F9d',
]

for to_test in test_cases:
    assert set() < set(to_test.lower().removeprefix('0x')) <= set('0123456789abcdef')
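
On Python versions before 3.9, where str.removeprefix is not available, the same check can be sketched with slicing. The function name is_hex_with_optional_prefix is an illustrative assumption, not part of the answer above. It strips at most one leading 0x or 0X and then requires at least one hex digit.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def is_hex_with_optional_prefix(s):
    """True for non-empty hex strings, with or without one 0x/0X prefix."""
    # Drop a single leading '0x' or '0X' if present.
    body = s[2:] if s[:2].lower() == "0x" else s
    # bool(body) rejects '' and a bare '0x'; <= is the subset test.
    return bool(body) and set(body) <= HEX_DIGITS
```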
Irrefrangible answered 6/11, 2023 at 8:24 Comment(4)
Well, it was added in Python 3.9 (released on the 5th of October 2020), and there's this answer which is similar though. To shorten your answer using the linked answer: return set(to_test.removeprefix('0x')).issubset(string.hexdigits); there's no need for lower() or for converting string.hexdigits to a set. A whopping 6 bytes saved (4 if ignoring white spaces).Doctrine
I added .lower() to remove both 0x and 0X. to_test.removeprefix('0x').removeprefix('0X') is not sufficient because a string 0x0Xabc can pass the test.Irrefrangible
to_test = '0x' should fail right?Doctrine
Oh, I overlooked it. The code should be assert set() < set(to_test.lower().removeprefix('0x')) <= set(string.hexdigits). Thanks for pointing it out.Irrefrangible
C
-1

Most of the solutions do not properly check strings with the prefix 0x

>>> is_hex_string("0xaaa")  
False  
>>> is_hex_string("0x123")  
False  
>>> is_hex_string("0xfff")  
False  
>>> is_hex_string("fff")  
True  
Countryandwestern answered 25/8, 2021 at 3:19 Comment(1)
is_hex_string is not a function and this answer does not answer or solve the OP question. I would recommend you add this as a comment on the OP question or write a solution to the question.Bernt
