Convert string to ASCII value python

10

100

How would you convert a string to ASCII values?

For example, "hi" would return [104 105].

I can individually do ord('h') and ord('i'), but it's going to be troublesome when there are a lot of letters.

Periodic answered 9/12, 2011 at 23:23 Comment(0)
K
150

You can use a list comprehension:

>>> s = 'hi'
>>> [ord(c) for c in s]
[104, 105]
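
The comprehension also has a straightforward inverse, since `chr` maps each integer back to its character. A minimal sketch of the round trip (the names `to_codes` and `from_codes` are made up for illustration):

```python
def to_codes(s):
    """Return the code point of each character in s."""
    return [ord(c) for c in s]


def from_codes(codes):
    """Rebuild the string from a list of code points."""
    return "".join(chr(n) for n in codes)


codes = to_codes("hi")
print(codes)              # [104, 105]
print(from_codes(codes))  # hi
```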
Kleist answered 9/12, 2011 at 23:28 Comment(0)
C
33

Here is a pretty concise way to perform the concatenation:

>>> s = "hello world"
>>> ''.join(str(ord(c)) for c in s)
'10410110810811132119111114108100'

And a sort of fun alternative:

>>> '%d'*len(s) % tuple(map(ord, s))
'10410110810811132119111114108100'
Cultivar answered 9/12, 2011 at 23:37 Comment(2)
What was I thinking? This is much more Pythonic than mine. That's what I get for trying to answer a Python question right after reading a bunch of Haskell questions... +1 - Tidbit
Good example. Thank you for sharing. - Ridenhour
C
16

In 2021 we can assume only Python 3 is relevant, so...

If your input is bytes:

>>> list(b"Hello")
[72, 101, 108, 108, 111]

If your input is str:

>>> list("Hello".encode('ascii'))
[72, 101, 108, 108, 111]

If you want a single solution that works with both:

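# bytes(text, 'ascii') would raise TypeError when text is already bytes
list(text if isinstance(text, bytes) else text.encode('ascii'))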

(The str variants above will intentionally raise UnicodeEncodeError if the string contains non-ASCII characters. That is a fair choice, as it makes no sense to ask for the "ASCII value" of a non-ASCII character.)
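
One possible way to accept both input types is to branch on the type explicitly. This is only a sketch: `ascii_values` is a hypothetical helper name, and rejecting byte values above 127 is an added assumption about what "ASCII" should mean for bytes input.

```python
def ascii_values(text):
    """Return ASCII values for a str or bytes-like input.

    Hypothetical helper, not part of the standard library.
    """
    if isinstance(text, str):
        # Raises UnicodeEncodeError for non-ASCII characters.
        return list(text.encode("ascii"))
    data = bytes(text)
    # bytes objects may hold values above 127, which are not ASCII.
    if any(b > 127 for b in data):
        raise ValueError("input contains non-ASCII bytes")
    return list(data)


print(ascii_values("Hello"))   # [72, 101, 108, 108, 111]
print(ascii_values(b"Hello"))  # [72, 101, 108, 108, 111]
```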

Cinthiacintron answered 13/4, 2021 at 3:51 Comment(0)
D
9

If you are using Python 3:

>>> list(bytes(b'test'))
[116, 101, 115, 116]
Devland answered 5/5, 2018 at 7:50 Comment(1)
A great approach, but bytes() is redundant for a bytes input, and for a string input you need to specify an encoding. - Cinthiacintron
T
7

If you want your result concatenated, as you show in your question, you could try something like:

>>> from functools import reduce  # required in Python 3
>>> reduce(lambda x, y: str(x)+str(y), map(ord,"hello world"))
'10410110810811132119111114108100'
Tidbit answered 9/12, 2011 at 23:30 Comment(0)
N
3

It is not at all obvious why one would want to concatenate the (decimal) "ascii values". What is certain is that concatenating them without leading zeroes (or some other padding or a delimiter) is useless -- nothing can be reliably recovered from such an output.

>>> tests = ["hi", "Hi", "HI", '\x0A\x29\x00\x05']
>>> ["".join("%d" % ord(c) for c in s) for s in tests]
['104105', '72105', '7273', '104105']

Note that the first 3 outputs are of different length. Note that the fourth result is the same as the first.

>>> ["".join("%03d" % ord(c) for c in s) for s in tests]
['104105', '072105', '072073', '010041000005']
>>> [" ".join("%d" % ord(c) for c in s) for s in tests]
['104 105', '72 105', '72 73', '10 41 0 5']
>>> ["".join("%02x" % ord(c) for c in s) for s in tests]
['6869', '4869', '4849', '0a290005']
>>>

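With fixed-width padding, a delimiter, or two-digit hex, none of these problems occur: the four outputs are distinct and each can be decoded unambiguously.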
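
To illustrate why those formats are recoverable, here is a rough sketch of decoders for the padded and the delimited variants. The function names are invented for this example, and the 3-digit format assumes every character value is below 1000.

```python
def encode_padded(s):
    """Encode each character as a zero-padded 3-digit decimal."""
    return "".join("%03d" % ord(c) for c in s)


def decode_padded(digits):
    """Split the digit string into 3-character chunks and decode each."""
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))


def decode_delimited(text):
    """Decode whitespace-separated decimal values."""
    return "".join(chr(int(tok)) for tok in text.split())


tests = ["hi", "Hi", "HI", "\x0A\x29\x00\x05"]
# Every test string survives the padded round trip.
print(all(decode_padded(encode_padded(s)) == s for s in tests))  # True
print(decode_delimited("72 105"))                                # Hi
```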

Nunes answered 10/12, 2011 at 0:37 Comment(0)
G
3

Your description is rather confusing; directly concatenating the decimal values doesn't seem useful in most contexts. The following code treats each character as an 8-bit value and THEN concatenates those bytes into a single integer, which is how the ASCII-encoded bytes of the string read as one big-endian number:

def ASCII(s):
    x = 0
    for i in range(len(s)):  # range, not xrange, in Python 3
        # shift each character's value into its own 8-bit slot
        x += ord(s[i])*2**(8 * (len(s) - i - 1))
    return x
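
In Python 3 the same packing can probably be expressed more directly with the built-in `int.from_bytes`. A sketch, with `ascii_to_int` and `int_to_ascii` as illustrative names; note that the inverse cannot recover leading NUL characters, because they contribute nothing to the integer.

```python
def ascii_to_int(s):
    """Pack the ASCII bytes of s into a single big-endian integer."""
    return int.from_bytes(s.encode("ascii"), byteorder="big")


def int_to_ascii(n):
    """Unpack a non-negative integer back into an ASCII string."""
    length = (n.bit_length() + 7) // 8  # number of bytes needed
    return n.to_bytes(length, byteorder="big").decode("ascii")


print(ascii_to_int("hi"))   # 26729, i.e. 104 * 256 + 105
print(int_to_ascii(26729))  # hi
```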
Gyrus answered 28/3, 2017 at 5:5 Comment(0)
S
2
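def stringToNumbers(message):
    # Collect the code point of each character in the message.
    numbers = []
    for ch in message:
        numbers.append(ord(ch))
    return numbers

stringToNumbers("morocco")  # [109, 111, 114, 111, 99, 99, 111]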
Sulfonate answered 21/10, 2014 at 8:24 Comment(0)
B
1

You can actually do it with NumPy:

import numpy as np
a = np.fromstring('hi', dtype=np.uint8)
print(a)
Bassorilievo answered 24/1, 2020 at 13:22 Comment(1)
Note that fromstring is now deprecated, so something like np.frombuffer(b'hi', dtype=np.uint8) would be preferred. - Redwing
P
0

If you don't mind the NumPy dependency, you can also wrap the string in a 1D NumPy ndarray and view it as the int32 dtype.

import numpy as np

text = "hi"
np.array([text]).view('int32').tolist()   # [104, 105]

Note that, like the built-in ord() function, the above operation returns the Unicode code points of the characters (only much faster if the string is very long). By contrast, .encode() encodes the string into bytes, using UTF-8 by default. For ASCII-only text, which is the scope of this question, the two results are identical. If the string contains non-ASCII characters (Japanese, Russian, etc.), they differ, and you may not get what you expected.

For example:

s = "Меси"
list(map(ord, s))                     # [1052, 1077, 1089, 1080]
np.array([s]).view('int32').tolist()  # [1052, 1077, 1089, 1080]
list(s.encode())                      # [208, 156, 208, 181, 209, 129, 208, 184]
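
If NumPy is not available, the same comparison can be sketched in plain Python. The helper name `describe` is hypothetical; `str.isascii()` requires Python 3.7 or later.

```python
def describe(s):
    """Return (code points, UTF-8 byte values, is-ASCII flag) for s."""
    code_points = [ord(c) for c in s]
    utf8_bytes = list(s.encode("utf-8"))
    return code_points, utf8_bytes, s.isascii()


print(describe("hi"))       # ([104, 105], [104, 105], True)
print(describe("Меси")[2])  # False: code points and bytes differ
```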
Proser answered 2/11, 2023 at 20:33 Comment(2)
For that to work on my system, it needs to be np.array([text]).view(dtype=np.int32).tolist() and np.array([s]).view(dtype=np.int32).tolist() respectively. - Retrospective
@Retrospective thanks for pointing me to that issue. I edited the post accordingly. Thanks - Proser
