Why does mixing types in Python struct.pack use more space than needed?
A

4

8

I have just tried using struct.pack in Python for the first time, and I don't understand its behaviour when I mix types.

When I am trying to pack a single char and nothing else, it works as expected, i.e.

struct.pack("b",1)

gives '\x01'. But as soon as I try to mix in data of a different type, the char is padded to be as long as this type, e.g.

struct.pack("bi",1,1)

gives '\x01\x00\x00\x00\x01\x00\x00\x00'.

Is this standard behaviour, and why? Is there a way around it?

Edit

More simply put:

>>> struct.calcsize("b")
1
>>> struct.calcsize("i")
4
>>> struct.calcsize("bi")
8
Aerospace answered 24/1, 2014 at 12:39 Comment(3)
And surprisingly: struct.calcsize('d') → 8, struct.calcsize('bd') → 16. Alignment and padding seem to depend on the other types. That's not intuitive, even if you expect alignment. EDIT: Ah, I see, the size of the 'd' is 8, and that determines where it can start (at multiples of 8), hence the padding. Dora
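@Dora Exactly, that’s why struct.calcsize('db') is 9 (note that in C, the equivalent struct would usually be larger, typically 16 or 12 bytes depending on the platform, because C also pads the end of a struct to a multiple of its alignment). Petulah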
Yes, @poke, nice idea to swap the two elements to show that the padding belongs not to the byte but to the double. Dora
M
8

struct.pack is usually used to access memory structures, not files. In memory, accessing data which occupies several bytes at an odd/unaligned address can cause exceptions or performance loss.

That's why compilers align the data (usually on a 4- or 8-byte boundary), and the struct module in Python does the same.

To disable this, you can use the first character of the format string to set the byte order and alignment. In your case, try struct.pack("=bi",1,1), which produces 5 bytes with no padding.

If you don't specify anything, an implicit @ is used, which means "native byte order, size and alignment". See the documentation for other options.
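As a rough illustration of the difference between the implicit @ and =, here is a minimal Python 3 sketch (the variable names are made up for this example, and the native size in the comment assumes a typical platform with 4-byte, 4-byte-aligned ints):

```python
import struct

# "@" (the implicit default) uses native sizes and alignment;
# "=" uses native byte order but standard sizes and no padding.
native_size = struct.calcsize("@bi")    # typically 8: 1 byte + 3 pad bytes + 4-byte int
unpadded_size = struct.calcsize("=bi")  # 5: 1 byte + 4-byte int, no padding

packed = struct.pack("=bi", 1, 1)
print(native_size, unpadded_size, len(packed))

# Unpack with the same format string that was used for packing.
values = struct.unpack("=bi", packed)
print(values)
```

Note that in Python 3 struct.pack returns a bytes object, so the results print as b'...' rather than the plain strings shown in the question.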

Mohawk answered 24/1, 2014 at 12:48 Comment(0)
V
4

Yes, it is.

By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).

If you don't want alignment, just specify a byte order by starting your format string with '=', '<', or '>' (same as '!').
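To show what the explicit byte-order prefixes do to the question's example, here is a small Python 3 sketch (variable names are illustrative only). All three prefixes switch off padding, so each result is 5 bytes long; they differ only in the byte order of the int:

```python
import struct

little = struct.pack("<bi", 1, 1)   # little-endian: b'\x01\x01\x00\x00\x00'
big = struct.pack(">bi", 1, 1)      # big-endian:    b'\x01\x00\x00\x00\x01'
network = struct.pack("!bi", 1, 1)  # network order, identical to ">"

for prefix, data in (("<", little), (">", big), ("!", network)):
    print(prefix, len(data), data.hex())
```

Using "<" or ">" rather than "=" is probably the safer choice when the bytes are written to a file or sent over a network, since the result then does not depend on the machine that packed it.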

Vassily answered 24/1, 2014 at 12:46 Comment(0)
P
2

From the manual:

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct.

i is a 4-byte integer, which in native mode must start at an offset that is a multiple of 4. If whatever comes before it does not end on such a boundary, pad bytes are inserted until it does. You can override this behavior by specifying a byte order without native alignment.

That’s why, with more complex structs, the ordering of the things inside matters a lot.
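A quick way to see the effect of ordering is to compare two layouts of the same four fields. This is only a sketch in Python 3; the format strings are invented for illustration, and the native sizes in the comments assume a typical platform with 4-byte, 4-byte-aligned ints:

```python
import struct

interleaved = struct.calcsize("bibi")  # typically 16: pad bytes before each int
grouped = struct.calcsize("iibb")      # typically 10: ints first, no padding needed
unaligned = struct.calcsize("=bibi")   # 10: alignment disabled entirely

print(interleaved, grouped, unaligned)
```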

See also the Wikipedia article on the topic.

Petulah answered 24/1, 2014 at 12:47 Comment(0)
C
2

See the documentation for struct; in particular it says

By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct.

And see e.g. this Stack Overflow question for C struct memory layout: C struct memory layout?

In short, the integer is 4 bytes and therefore it must start at a multiple of 4. If you swap the order of b and i, the problem shouldn't arise.
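The swap can be checked with calcsize. The Python 3 sketch below also uses a zero repeat count ("0i"), which the struct documentation describes as the way to pad the end of a structure to a given type's alignment, in case the layout has to match a C struct including its trailing padding. Variable names are illustrative, and the sizes in the comments assume a typical platform with 4-byte, 4-byte-aligned ints:

```python
import struct

byte_first = struct.calcsize("bi")    # typically 8: padding inserted before the int
int_first = struct.calcsize("ib")     # typically 5: no padding needed, none added at the end
c_like = struct.calcsize("ib0i")      # typically 8: "0i" pads the end to int alignment

print(byte_first, int_first, c_like)
```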

Chinaware answered 24/1, 2014 at 12:47 Comment(0)
