Why do Python booleans take up more than a byte?
Asked Answered
A

1

8

Apparently integers cost 24 bytes in Python. I can understand that it does so because of the extra bells and whistles of representing unbounded number. However it looks like boolean data types also cost whopping 24 bytes even though it can only ever represent one of two possible values. Why? What extra data might need to be stored aside from 1 bit representing True or False?

Note: I'm not asking for best way to store bools. I'm already aware of NumPy, BitArray, etc. from other answers. My question is why, not how.

Ardeliaardelis answered 29/12, 2015 at 0:26 Comment(7)
1. Because booleans are integer subclasses. 2. No, if you care about that Python may not be low-level enough for you (or you need to use something like numpy with it).Subjoinder
use ctypes I guess ... but really if you are worried about this python might not be the right choice of language for this particular project ...Peripheral
"I can understand that it does so because of extra bells and whistles of representing unbounded number" - nope. It's stuff like the type pointer and the reference count. On Python 2, long, the actual bignum type, has even bigger instances.Gryphon
I don't think "deal with it" is right answer. I assume Python designers are very smart people and they must have some reason to let boolean occupy massive 24 bytes in memory. This design not only wastes memory but also slows down operations with this type. There has to be good reason.Ardeliaardelis
True and False are singletons, so it's not wasting very much memory!Subjoinder
@Subjoinder - I was suspecting it might be subclass of integer but still it doesn't make sense. Interpreter can always dynamically cast boolean to int when needed. There is no reason to waste so much memory right of the bat.Ardeliaardelis
Also relevant: https://mcmap.net/q/216326/-why-is-bool-a-subclass-of-int/3001761, which answers what you've changed your question to from a historical perspective.Subjoinder
G
6

A bool may be pretty huge for what it represents, but there are only two of them. A list full of Trues only contains 4- or 8-byte references to the one canonical True object.

If 8 bytes is still too big, and you really want to use Python for whatever it is you're doing, you could consider using an array type like that provided by the built-in array module or NumPy. These offer 1-byte-per-bool representations. If this is still too much, you could use a bitset, either manually with Python's built-in bignums or with something like BitVector from PyPI. These options are likely to slow your program way down. Some of them can offer speed improvements, but only if you take advantage of features that let you push work out of interpreted code and into C.

Gryphon answered 29/12, 2015 at 0:36 Comment(5)
I know the alternatives of using NumPy, BitVector etc. I'm trying to figure out the internal reasoning that Python designers used to let boolean occupy 24 bytes. Most of the things in Python is very well thought out but this...Ardeliaardelis
@ShitalShah: If your program is so pressed for space that you need to worry about the 48 bytes consumed by True and False, you're probably not working on a machine with an operating system, let alone a Python interpreter.Gryphon
@ShitalShah again, if you're worried about this sort of thing, stop using Python! There are many lower-level languages if you want to do this kind of micro-optimisation; that's not a design goal for Python.Subjoinder
@johnsharpe - I'm not "worried" about this. The question is simply trying to understand pro and cons of language design decision. I've searched enough and I can't find anyone giving insights in to this design decision. This is not a duplicate question.Ardeliaardelis
@ShitalShah: As everyone has noted, True and False are singleton objects, so the incremental cost to store an additional boolean value is just the pointer to reference them (4 or 8 bytes). The advantage to the overhead is consistent internal representation of Python objects. They all share a common header, and that common header includes some fixed overhead; a pointer to the type object (4-8 bytes), a reference count (4-8 bytes) and in the case of bool, the actual value, which with structure padding adds another 4-8 bytes. The common structure means they can be handled uniformly in C.Gardy

© 2022 - 2025 — McMap. All rights reserved.