hexdump output order

I am playing with the Unix hexdump utility. My input file is UTF-8 encoded and contains a single character, ñ, whose UTF-8 encoding is the two bytes C3 B1.

hexdump test.txt
0000000 b1c3
0000002

Huh? This shows B1 C3, the reverse of what I expected! Can someone explain?

To get the expected output, I do:

hexdump -C test.txt
00000000  c3 b1                                             |..|
00000002

I thought I understood encoding systems.
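For reference, a file like this can be recreated with either of the following (the first assumes a UTF-8 locale; the second writes the two bytes explicitly as octal escapes, which plain printf understands):

printf 'ñ' > test.txt          # in a UTF-8 locale this writes the bytes c3 b1
printf '\303\261' > test.txt   # same two bytes (0xC3 0xB1) given as octal escapes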

Helminthiasis answered 17/5, 2010 at 7:47 Comment(2)
en.wikipedia.org/wiki/Endianness – Wachter
This seems to explain why xxd and hexdump show different results! – Yeoman

This is because hexdump defaults to using 16-bit words and you are running on a little-endian architecture. The byte sequence c3 b1 is therefore read as the 16-bit word b1c3, which is what gets printed. The -C option forces hexdump to display individual bytes, in file order, instead of words.
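To see the word-versus-byte difference directly, something like the following should work (a sketch, assuming a little-endian host and a test.txt that contains only the two bytes c3 b1):

hexdump test.txt        # default: 16-bit words in host byte order -> 0000000 b1c3
hexdump -C test.txt     # canonical byte-by-byte view              -> c3 b1  |..|
od -An -tx2 test.txt    # od with two-byte units, host byte order  -> b1c3
od -An -tx1 test.txt    # od with one-byte units, file order       -> c3 b1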

Continuous answered 17/5, 2010 at 8:7 Comment(6)
I was thinking it must have something to do with endianness. – Helminthiasis
But why does hexdump default to this confusing output format? Is there any historic reason? – Inter
What's confusing is the propensity for humans to encode numbers in big-endian order. Little-endian is more logical, which is why it's used on many CPU architectures, including x86, in spite of the awkwardness. – Continuous
Actually, big-endian and little-endian each have their strengths and weaknesses. Neither is "more logical" in an absolute sense. – Flann
@MarceloCantos, what's confusing is that it assumes 16-bit little-endian words. What is the logic in choosing 16-bit words, or any other word length? IMO it makes more sense to default to a big-endian representation, which would look the same regardless of word length and thus be much less confusing in this use case. – Sore
Purely conjecture, but the historic reason is almost certainly that hexdump was initially implemented on a little-endian machine that used 16-bit words, where it was a perfectly reasonable default. – Act

I found two ways to avoid that:

hexdump -C file

or

od -tx1 < file
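A third option, if you specifically want hexdump itself to print single bytes in file order, is its -e format string; as a sketch, the format below means "16 iterations of 1 byte each, printed as two hex digits" (and xxd keeps file order too):

hexdump -e '16/1 "%02x " "\n"' file   # one byte per unit, so no word reordering; a short final block may be zero-padded
xxd -g1 file                          # xxd never reorders bytes; -g1 shows one byte per group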

I think it is a strange default for hexdump to treat files as sequences of 16-bit little-endian words. Very confusing IMO.

Sore answered 16/11, 2016 at 22:7 Comment(2)
While hexdump defaults to using 16-bit words, I think the endianness depends on the architecture it's running on. – Presentment
@erwaman, true. I tried podman run --rm -ti --arch s390x --entrypoint /bin/sh quay.io/centos/centos:stream9, installed util-linux to get hexdump, and without flags it showed big-endian output. – Sore
