Pythonic way to hex dump files

B

3

8

Is there any way to code in a pythonic way this Bash command?

hexdump -e '2/1 "%02x"' file.dat

Obviously, without using os.popen, or any such shortcut ;)

It would be great if the code was functional in Python3.x

Blackfish answered 28/7, 2014 at 22:42 Comment(5)

What does 2/1 "%02x" mean? – Dissyllable 4/12, 2014 at 14:15

"%02x" prints each byte as a 2-character, zero-padded lowercase hex number. – Blackfish 27/1, 2015 at 15:19

And what about 2/1? The question would be much clearer for those who know Python but are not familiar with the hexdump CLI. – Dissyllable 27/1, 2015 at 16:43

Take a look at this: 256.com/gray/docs/misc/hexdump_manual_how_to.html. It says: "An iteration count which defaults to 1 if not supplied but has to be supplied if you want a byte count. This tells how many times to do the conversion before we print the end string. So if you were decoding 4 things, each of 1 byte, you'd say 4/1." – Blackfish 27/1, 2015 at 21:15

I wrote a somewhat prettier hex dump utility (more like xxd than the very raw output in these answers) in https://mcmap.net/q/1321521/-converting-german-characters-like-228-223-etc-from-mac-roman-to-utf-or-similar – Duodenum 13/6 at 8:25

F

14

If you only care about Python 2.x, line.encode('hex') will encode a chunk of binary data into hex. So:

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        print chunk.encode('hex')

(hexdump's default output shows 16 bytes per line, so change that 32 to 16 if you want to match it.)

(If the two-argument iter looks baffling, see the documentation for iter; it's not too complicated once you get the idea.)

If you care about Python 3.x, the encode method only works for codecs that convert Unicode strings to bytes; for codecs that convert the other way around (or any other combination), you have to call codecs.encode explicitly:

import codecs

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        print(codecs.encode(chunk, 'hex').decode('ascii'))

Or it may be better to use hexlify:

import binascii

with open('file.dat', 'rb') as f:
    for chunk in iter(lambda: f.read(32), b''):
        print(binascii.hexlify(chunk).decode('ascii'))

If you want to do something besides print them out, rather than read the whole file into memory, you probably want to make an iterator. You could just put this in a function and change that print to a yield, and that function returns exactly the iterator you want. Or use a genexpr or map call:

import binascii

with open('file.dat', 'rb') as f:
    chunks = iter(lambda: f.read(32), b'')
    # map is lazy in Python 3: consume hexlines before the file closes
    hexlines = map(binascii.hexlify, chunks)
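
The "put this in a function and change that print to a yield" idea might look roughly like the sketch below. It assumes Python 3; the name hex_chunks and the default chunk size of 32 are made up for illustration and are not part of any library.

```python
import binascii


def hex_chunks(path, size=32):
    """Yield the file's contents as hex strings, `size` bytes at a time."""
    with open(path, 'rb') as f:
        # Two-argument iter: keep calling f.read(size) until it returns b'' (EOF).
        for chunk in iter(lambda: f.read(size), b''):
            # hexlify returns bytes in Python 3; decode to get a str.
            yield binascii.hexlify(chunk).decode('ascii')
```

You would then loop over it with something like `for line in hex_chunks('file.dat'): print(line)`. Because the with block lives inside the generator, the file stays open only while the generator is being consumed.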

Filipino answered 28/7, 2014 at 23:25 Comment(1)

None of the answers touch on how to implement the second part, i.e. the -e '2/1 "%02x – Redmond 22/9, 2020 at 12:56

T

16

The standard library is your friend. Try binascii.hexlify().
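
For instance, a rough Python 3 sketch (dump_hex is a hypothetical helper name, not something from the standard library). The format in the question prints every byte as two lowercase hex digits with no separators, which is also what hexlify produces. I have not checked how hexdump treats a trailing partial group on odd-length files, so treat this as an approximation of the command rather than an exact clone.

```python
import binascii


def dump_hex(path):
    """Return the whole file as one lowercase hex string."""
    with open(path, 'rb') as f:
        # hexlify returns bytes in Python 3, so decode to get a str.
        return binascii.hexlify(f.read()).decode('ascii')
```

Note that this reads the entire file into memory, which is fine for small files but not for very large ones.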

Ticonderoga answered 28/7, 2014 at 22:58 Comment(0)

G

5

Simply read() the whole file and encode('hex'). What could be more pythonic?

with open('file.dat', 'rb') as f:
    hex_content = f.read().encode('hex')
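
On Python 3, bytes objects cannot be passed through encode('hex'), but bytes.hex() (available since Python 3.5) gives the same kind of string. A minimal sketch, with read_hex as a made-up helper name:

```python
def read_hex(path):
    """Return the file's bytes as a lowercase hex string (Python 3.5+)."""
    with open(path, 'rb') as f:
        # bytes.hex() returns a str directly, so no decode step is needed.
        return f.read().hex()
```

Calling `read_hex('file.dat')` would then play the role of hex_content above.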

Gegenschein answered 28/7, 2014 at 22:57 Comment(5)

Except you almost certainly want to open it in rb mode so it doesn't translate newlines. Also, this is Python 2-specific; in Python 3, you can't encode bytes. Still +1. – Filipino 28/7, 2014 at 23:18

Great approach, it works, but only in python2. This is the output in Py3.4: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 0: invalid start byte – Blackfish 28/7, 2014 at 23:20

@peluzza: Do you need Python 3? – Filipino 28/7, 2014 at 23:20

Well, I'm doing my best to code only for 3.X, but the gaps are so deep, not only working with hex dumps ;) – Blackfish 28/7, 2014 at 23:23

@peluzza: See Raymond Hettinger's answer, or mine if you need more details. – Filipino 28/7, 2014 at 23:25
