Switching endianness in the middle of a struct.unpack format string

I have a bunch of binary data (the contents of a video game save-file, as it happens) where a part of the data contains both little-endian and big-endian integer values. Naively, without reading much of the docs, I tried to unpack it this way...

struct.unpack(
    '3sB<H<H<H<H4s<I<I32s>IbBbBbBbB12s20sBB4s',
    string_data
)

...and of course I got this cryptic error message:

struct.error: bad char in struct format

The problem is that struct.unpack format strings do not allow individual fields to be marked with endianness: a byte-order character is only accepted as the first character of the string, where it applies to every field. A valid format string here would be something like

struct.unpack(
    '<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s',
    string_data
)

except that this will flip the endianness of the third I field (parsing it as little-endian, when I really want to parse it as big-endian).
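Both points can be checked quickly on made-up input. The snippet below is only an illustration: the eight bytes in `data` are invented for the demonstration and have nothing to do with the actual save file.

```python
import struct

# Eight made-up bytes: the value 1 as little-endian, then the value 1 as big-endian.
data = bytes.fromhex('01000000' '00000001')

# A byte-order character anywhere but the first position is rejected.
try:
    struct.unpack('<I>I', data)
    rejected = False
except struct.error:
    rejected = True

# A leading byte-order character applies to every field in the string.
as_little = struct.unpack('<II', data)  # (1, 16777216)
as_big = struct.unpack('>II', data)     # (16777216, 1)

print(rejected, as_little, as_big)
```

With a single prefix, one of the two fields always comes out byte-swapped, which is exactly the situation described above.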

Is there an easy and/or "Pythonic" solution to my problem? I have already thought of three possible solutions, but none of them is particularly elegant. In the absence of better ideas I'll probably go with number 3:

  1. I could extract a substring and parse it separately:

    (my.f1, my.f2, ...) = struct.unpack('<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s', string_data)
    (my.f11,) = struct.unpack('>I', string_data[56:60])  # unpack always returns a tuple
    
  2. I could swap the bytes of the field after the fact:

    (my.f1, my.f2, ...) = struct.unpack('<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s', string_data)
    my.f11 = swap32(my.f11)
    
  3. I could just change my downstream code to expect this field to be represented differently — it's actually a bitmask, not an arithmetic integer, so it wouldn't be too hard to flip around all the bitmasks I'm using with it; but the big-endian versions of these bitmasks are more mnemonically relevant than the little-endian versions.
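For reference, options 1 and 2 could be sketched roughly as follows. This is a sketch under assumptions: `swap32` is not defined in the question, so the version here is one possible implementation, and `string_data` is a synthetic zero-filled record rather than a real save file.

```python
import struct

FMT = '<3sBHHHH4sII32sIbBbBbBbB12s20sBB4s'

# Byte offset of the third I field: everything that precedes it in the format.
F11_OFFSET = struct.calcsize('<3sBHHHH4sII32s')  # 56

def swap32(value):
    """Reverse the byte order of a 32-bit unsigned integer (assumed helper)."""
    return int.from_bytes(value.to_bytes(4, 'little'), 'big')

# Synthetic record: all zeros except the big-endian bitmask 0x80000001.
record = bytearray(struct.calcsize(FMT))
record[F11_OFFSET:F11_OFFSET + 4] = bytes.fromhex('80000001')
string_data = bytes(record)

fields = struct.unpack(FMT, string_data)

# Option 1: re-read the same four bytes as big-endian.
(f11_reparsed,) = struct.unpack_from('>I', string_data, F11_OFFSET)

# Option 2: byte-swap the value that was parsed as little-endian.
f11_swapped = swap32(fields[10])

print(hex(f11_reparsed), hex(f11_swapped))
```

Both routes yield the same integer; `struct.unpack_from` avoids slicing the buffer, and `struct.calcsize` avoids hard-coding the offset.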

Pd answered 17/2, 2018 at 19:51 Comment(3)
I think that there's something conceptually wrong here. There should be no endianness mix. The fix would impact the source of the string that you need to unpack. Regarding the downstream code option: that deals with an int (already converted), which automatically uses the endianness of the machine that it runs on.Quadricycle
@CristiFati: The string I'm unpacking comes from a save-game file format. I don't control the details of how it's encoded; I can't change them. All I can do is try to deal with the encoding I'm given, and the encoding I'm given does mix endiannesses in this exact way.Pd
As a more wide-spread example, the ISO 9660 file system encodes integers as both little endian and big endian in some places. Often it's so that you can pick the easier format to work with on your architecture, but if checking the integrity of the data, it might be useful to decode both and check that they are equal.Marchal
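A minimal sketch of that kind of integrity check, assuming a 32-bit "both-endian" field laid out as four little-endian bytes followed by four big-endian bytes. The function name `unpack_both_endian_u32` and the sample value are made up for illustration.

```python
import struct

def unpack_both_endian_u32(data, offset=0):
    """Decode a 32-bit value stored twice: little-endian, then big-endian."""
    (little,) = struct.unpack_from('<I', data, offset)
    (big,) = struct.unpack_from('>I', data, offset + 4)
    if little != big:
        raise ValueError(f'copies disagree: {little} != {big}')
    return little

# Made-up sample: the value 2048 encoded in both byte orders.
sample = struct.pack('<I', 2048) + struct.pack('>I', 2048)
value = unpack_both_endian_u32(sample)
print(value)
```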

A little late to the party, but I just had the same problem. I solved it with a custom numpy dtype, which allows mixing elements with different endianness (see https://numpy.org/doc/stable/reference/generated/numpy.dtype.html):

import numpy as np

t = np.dtype('>u4,<u4')  # Compound type: a big-endian uint32 followed by a little-endian uint32
a = np.zeros(shape=1, dtype=t)  # Create an array of length one with the above type
a[0][0] = 1  # Assign the first (big-endian) uint
a[0][1] = 1  # Assign the second (little-endian) uint
buf = a.tobytes()  # buf should be b'\x00\x00\x00\x01\x01\x00\x00\x00'
b = np.frombuffer(buf, dtype=t)  # should yield array([(1, 1)]) with dtype t
c = np.frombuffer(buf, dtype=np.uint32)  # on a little-endian machine: array([16777216, 1])
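A similar per-field byte order can also be had with the standard library alone, by splitting the format string at each byte-order character and unpacking the pieces one after another. The sketch below assumes a hypothetical helper named `unpack_mixed` (it is not part of struct), supports only the unpadded modes '<', '>', '!' and '=', and uses a synthetic record in place of real save data.

```python
import re
import struct

def unpack_mixed(fmt, data):
    """Unpack data with a format that may switch byte order mid-string.

    Hypothetical helper: fmt is split at every '<', '>', '!' or '=' and each
    segment is unpacked at the offset where the previous one ended.
    """
    segments = re.findall(r'[<>!=][^<>!=]*', fmt)
    if ''.join(segments) != fmt:
        raise ValueError('format must start with a byte-order character')
    values = []
    offset = 0
    for segment in segments:
        values.extend(struct.unpack_from(segment, data, offset))
        offset += struct.calcsize(segment)
    if offset != len(data):
        raise struct.error('data length does not match format')
    return tuple(values)

# Layout from the question, with only the third I field big-endian.
FMT = '<3sBHHHH4sII32s>I<bBbBbBbB12s20sBB4s'

# Synthetic 106-byte record: zeros except for two fields set to 1.
record = bytearray(106)
record[4:6] = b'\x01\x00'            # first H, little-endian 1
record[56:60] = b'\x00\x00\x00\x01'  # third I, big-endian 1

fields = unpack_mixed(FMT, bytes(record))
print(fields[2], fields[10])  # 1 1
```

Native mode ('@') is deliberately left out, because its alignment padding depends on where a segment starts within the buffer.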
Monomerous answered 3/6, 2020 at 8:45 Comment(0)
