hexdump output order

I am playing with the Unix hexdump utility. My input file is UTF-8 encoded and contains a single character, ñ, whose UTF-8 encoding is the two bytes C3 B1.

hexdump test.txt
0000000 b1c3
0000002

Huh? This shows B1 C3, the reverse of what I expected! Can someone explain?

To get the expected output, I do:

hexdump -C test.txt
00000000  c3 b1                                             |..|
00000002

I thought I understood encoding systems.
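For reference, a file like this can be recreated with either of the following (the first assumes a UTF-8 locale; the second writes the two bytes explicitly as octal escapes, which plain printf understands):

printf 'ñ' > test.txt          # in a UTF-8 locale this writes the bytes c3 b1
printf '\303\261' > test.txt   # same two bytes (0xC3 0xB1) given as octal escapes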

Helminthiasis answered 17/5, 2010 at 7:47 Comment(2)
en.wikipedia.org/wiki/Endianness – Wachter
This seems to explain why xxd and hexdump show different results! – Yeoman

This is because hexdump defaults to using 16-bit words and you are running on a little-endian architecture. The byte sequence c3 b1 is therefore read as the 16-bit word b1c3, which is what gets printed. The -C option forces hexdump to display individual bytes, in file order, instead of words.
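To see the word-versus-byte difference directly, something like the following should work (a sketch, assuming a little-endian host and a test.txt that contains only the two bytes c3 b1):

hexdump test.txt        # default: 16-bit words in host byte order -> 0000000 b1c3
hexdump -C test.txt     # canonical byte-by-byte view              -> c3 b1  |..|
od -An -tx2 test.txt    # od with two-byte units, host byte order  -> b1c3
od -An -tx1 test.txt    # od with one-byte units, file order       -> c3 b1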

Continuous answered 17/5, 2010 at 8:7 Comment(6)
I was thinking it must have something to do with endianness. – Helminthiasis
But why does hexdump default to this confusing output format? Is there any historic reason? – Inter
What's confusing is the propensity for humans to encode numbers in big-endian order. Little-endian is more logical, which is why it's used on many CPU architectures, including x86, in spite of the awkwardness. – Continuous
Actually, big-endian and little-endian each have their strengths and weaknesses. Neither is "more logical" in an absolute sense. – Flann
@MarceloCantos, what's confusing is that it assumes 16-bit little-endian words. What is the logic in choosing 16-bit words, or any other word length? IMO it makes more sense to default to a big-endian representation, which would look the same regardless of word length and thus be much less confusing in this use case. – Sore
Purely conjecture, but the historic reason is almost certainly that hexdump was initially implemented on a little-endian machine that used 16-bit words, where it was a perfectly reasonable default. – Act

I found two ways to avoid that:

hexdump -C file

or

od -tx1 < file
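A third option, if you specifically want hexdump itself to print single bytes in file order, is its -e format string; as a sketch, the format below means "16 iterations of 1 byte each, printed as two hex digits" (and xxd keeps file order too):

hexdump -e '16/1 "%02x " "\n"' file   # one byte per unit, so no word reordering; a short final block may be zero-padded
xxd -g1 file                          # xxd never reorders bytes; -g1 shows one byte per group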

I think it is a strange default for hexdump to treat files as sequences of 16-bit little-endian words. Very confusing IMO.

Sore answered 16/11, 2016 at 22:7 Comment(2)
While hexdump defaults to using 16-bit words, I think the endianness depends on the architecture it's running on. – Presentment
@erwaman, true. I tried podman run --rm -ti --arch s390x --entrypoint /bin/sh quay.io/centos/centos:stream9, installed util-linux to get hexdump, and without flags it showed big-endian output. – Sore
