How do I convert a string of hexadecimal values to a list of integers?

I have a long string of hexadecimal values that looks similar to this:

'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'

The actual string is 1024 frames of a waveform. I want to convert these hexadecimal values to a list of integer values, such as:

[0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]

How do I convert these hex values to ints?

Creon asked 19/2, 2013 at 15:50
You have a byte string, which Python converts to a string literal representation for you when echoing it in the interactive interpreter. The \x00 escapes are used for any byte that is not a printable ASCII character. – Institute
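To illustrate that comment, here is a small Python 3 sketch. The answers below are Python 2, where `str` is already a byte string; in Python 3 the equivalent type is `bytes`. The sketch assumes the waveform data arrives as a `bytes` object, and the name `data` is only illustrative. The point is that each `\xNN` escape exists only in the displayed representation and stands for a single byte.

```python
# Python 3: raw binary data is a bytes object, not a str.
data = b'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'

# Each \xNN escape is one byte, so the 11 escapes are 11 bytes.
print(len(data))   # 11

# Indexing a bytes object gives that byte's integer value directly.
print(data[7])     # 255

# repr() only uses \xNN for non-printable bytes; printable ASCII bytes
# are shown as characters, e.g. 0x41 is displayed as 'A'.
print(repr(b'\x41\x00'))  # b'A\x00'
```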

You can use ord() in combination with map():

>>> s = '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
>>> map(ord, s)
[0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]
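The session above is Python 2, where `map()` returns a list. A minimal Python 3 sketch of the same idea, assuming the data is held either as a `str` of code points 0-255 or, more typically for binary data, as `bytes` (both variable names are illustrative):

```python
# In Python 3, map() returns a lazy iterator, so list() is needed to get a list.
text = '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'  # a str (code points 0-255)
from_str = list(map(ord, text))

# If the data is already bytes (the usual case for binary waveform data),
# iterating yields ints, so ord() is unnecessary (and would raise TypeError).
data = b'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
from_bytes = list(data)

print(from_str)                # [0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]
print(from_bytes == from_str)  # True
```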
Iceskate answered 19/2, 2013 at 15:53
Not the best way of doing it, not with struct.unpack() available, which can interpret bytes as other types too. – Institute
@MartijnPieters But it's a clever way to do it for this very limited problem... It made me smile. – Delft
This solution is 6 times slower than struct.unpack, by the way: struct takes 0.3 seconds for a million iterations, while map(ord, s) needs 1.8 seconds. – Institute
@MartijnPieters If we were looking for high performance in waveform processing, we would probably choose something other than Python anyway. – Iceskate
@cdhowie: Perhaps. Still, there is no need to use something that is so much slower at the same operation. :-) – Institute
@MartijnPieters Perhaps. If the input is only going to be 1,024 bytes each time, the difference is likely to be negligible. My only points against struct.unpack() are (1) the extra import and (2) having to use string formatting with len() to get the length of the string into the format specifier, which strikes me as a bit unwieldy. Of course you can hide that behind a function, but I prefer to take the clean solution and optimize later, after profiling. – Iceskate
@cdhowie: With 1024 bytes each time, map() is actually 10 times slower. In that case array.array() becomes the fastest option, beating struct.unpack() by about 20%. – Institute
@MartijnPieters Relatively, sure, but how much CPU time would it take relative to everything else happening in the script? Again: write the code first, then optimize. – Iceskate
@cdhowie: But knowing beforehand what will be faster wins you half the battle. Stack Overflow makes you aware of the options you have for a given operation; with timing information in the answers you can make a more informed choice without having to optimize this yourself, should the need arise. – Institute
It's great to know faster ways to solve problems. In this case, however, my code doesn't need to be quick. I would argue that more people know about map and ord than about struct.unpack, which makes my code more readable to the average programmer. – Creon
I have a question: how do you do the opposite? That is, from a list [0, 1, 2, ...], get a string "\x00\x01\x02..."? Is there a function for this? – Fission
@Fission str.join('', (chr(i) for i in your_list)) – Iceskate
Thanks, but this gives garbage; I mean, the string contains control characters, etc. By the way, this can be written more simply as "".join(chr(i) for i in lst). What I asked was whether there is a function for converting a list to a string like "\x01\x02\x03...". I can of course do this "manually" with "".join("\\x%02x" % i for i in lst), but that is a whole expression (even in compact form), not a function. – Fission
@Fission That's not the same string. What do you think \x01 means in a string literal? – Iceskate
Why don't you just try it? print "\x01" prints a happy face! (Not mine at this moment :)) To get "\x01" literally, you have to use print r"\x01". – Fission
@Fission Correct, but that's not what the OP means. By \x01 the OP means the single control character, not the sequence of four characters \x01. If you want the actual escape notation (for some strange reason), repr() is a thing. – Iceskate
... Lost in translation. OK, let's not waste any more space in this thread. – Fission
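For the reverse question raised in these comments, here is a Python 3 sketch, assuming a list of integers in the range 0-255 (`values` is an illustrative name). `bytes()` builds the actual byte string with one byte per integer, which is the Python 3 counterpart of the `"".join(chr(i) ...)` suggestion. The four-character escape notation, by contrast, is ordinary text that has to be formatted explicitly.

```python
values = [0, 1, 2, 255]

# The real byte string: one byte per integer (raises ValueError outside 0-255).
raw = bytes(values)
print(raw)           # b'\x00\x01\x02\xff'
print(len(raw))      # 4

# The escape notation as plain text: four characters per value.
escaped = ''.join('\\x{:02x}'.format(i) for i in values)
print(escaped)       # \x00\x01\x02\xff
print(len(escaped))  # 16
```

`repr(raw)` also shows escapes, but it adds the `b'...'` wrapper and displays printable bytes as characters, so explicit formatting is the more predictable way to get uniform `\xNN` text.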

Use struct.unpack:

>>> import struct
>>> s = '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
>>> struct.unpack('11B',s)
(0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0)

This gives you a tuple instead of a list, but I trust you can convert it if you need to.
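One objection in the comments is that the byte count has to be built into the format string with len(). A Python 3 sketch that hides this behind a small helper and converts the tuple to a list; `unpack_unsigned_bytes` is a hypothetical name, not part of the `struct` module. The last line only illustrates that struct can interpret the same kind of data as other types. Whether the waveform frames are really big-endian 16-bit values is an assumption, not something the question states.

```python
import struct

def unpack_unsigned_bytes(data):
    """Return a list of ints, one per byte (hypothetical helper name)."""
    # 'B' is an unsigned char; prefixing the count avoids writing '11B' by hand.
    return list(struct.unpack('{}B'.format(len(data)), data))

data = b'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'
print(unpack_unsigned_bytes(data))  # [0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]

# Different type, same module: '>2H' reads two big-endian unsigned 16-bit ints.
print(struct.unpack('>2H', b'\x00\x01\xff\xff'))  # (1, 65535)
```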

Delft answered 19/2, 2013 at 15:51

In [11]: a
Out[11]: '\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'

In [12]: import array

In [13]: array.array('B', a)
Out[13]: array('B', [0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0])
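In Python 3 the same approach works on a `bytes` object, and `tolist()` turns the array into a plain list. The sketch below (variable names are illustrative) also shows a point made in the comments: an array can be modified in place and converted straight back to bytes with `tobytes()`, which was called `tostring()` in Python 2.

```python
import array

data = b'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'

# 'B' = unsigned char; a bytes initializer is loaded one byte per element.
samples = array.array('B', data)
print(samples.tolist())  # [0, 0, 0, 1, 0, 0, 0, 255, 255, 0, 0]

# The array can be modified in place and turned back into bytes directly.
samples[0] = 7
print(samples.tobytes()[:2])  # b'\x07\x00'
```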

Some timings:

$ python -m timeit -s 'text = "\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00";' ' map(ord, text)'
1000000 loops, best of 3: 0.775 usec per loop

$ python -m timeit -s 'import array;text = "\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00"' 'array.array("B", text)'
1000000 loops, best of 3: 0.29 usec per loop

$ python -m timeit -s 'import struct; text = "\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00"'  'struct.unpack("11B",text)'
10000000 loops, best of 3: 0.165 usec per loop
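The timings above were taken with Python 2 on the answerer's machine and will differ elsewhere. A Python 3 sketch for re-measuring on your own interpreter, assuming the data is a `bytes` object (so `list(data)` stands in for `map(ord, ...)`). The `compare` helper and `candidates` dict are hypothetical names, and no particular ranking is implied. Unlike the raw `struct.unpack` tuple timed above, every candidate here also builds a list, so the numbers measure the full conversion.

```python
import array
import struct
import timeit

data = b'\x00\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00'

# Each candidate turns the byte string into a list of ints.
candidates = {
    'list(bytes)': lambda: list(data),
    'struct.unpack': lambda: list(struct.unpack('{}B'.format(len(data)), data)),
    'array.array': lambda: array.array('B', data).tolist(),
}

def compare(number=100000):
    """Return total seconds per candidate for `number` calls."""
    # Sanity check: every method must produce the same result.
    expected = list(data)
    for name, fn in candidates.items():
        assert fn() == expected, name
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}

if __name__ == '__main__':
    for name, seconds in compare().items():
        print('{:15} {:.3f} s'.format(name, seconds))
```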
Humfrid answered 19/2, 2013 at 15:54
Not bad; 0.665 seconds for a million iterations. struct is still faster, but you can manipulate an array and get a byte representation back with fewer steps. – Institute
@MartijnPieters Praise from the master! And Guido can't be wrong either: see his "optimization anecdote". – Humfrid

© 2022 - 2024 — McMap. All rights reserved.