Why is sys.getsizeof reporting bigger values for smaller lists?

Short story:

It's about overallocation and avoiding useless overallocation. Your cases have 6 and 7 elements. In both cases, Python first calculates 12 as the number of spots to allocate. The purpose of overallocation is to allow fast future extensions with more elements, so Python tries to guess what will happen in the future and acts accordingly.

For the 6 elements case it thinks "Hmm, in case we shall add another 6 elements, then it would indeed be good to already have 12 spots, so let's do that now."

For the 7 elements case it thinks "Hmm, in case we shall add another 7 elements, then 12 spots wouldn't be enough anyway (for then 14 elements), so we'd have to re-overallocate anyway, so let's just not overallocate now. Maybe there won't even be another extension."

So for 6 elements it allocates 12 spots and for 7 it allocates 8 spots (minimal overallocation to a multiple of 4). That's 4 spots difference. A spot holds a pointer to an object, which with 64-bit Python takes 8 bytes. So 7 elements need 4*8 = 32 fewer bytes than 6 elements, which is what you observed (120 bytes vs 152 bytes).

Long story:

I can reproduce it in CPython 3.10.0. Here's what happens:

>>> import dis
>>> dis.dis('[0,1,2,3,4,5,6]')
  1           0 BUILD_LIST               0
              2 LOAD_CONST               0 ((0, 1, 2, 3, 4, 5, 6))
              4 LIST_EXTEND              1
              6 RETURN_VALUE

An empty list is built and then extended by that tuple. It first resizes to make room for the elements. Which does this to compute how many spots to allocate:

    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
    /* Do not overallocate if the new size is closer to overallocated size
     * than to the old size.
     */
    if (newsize - Py_SIZE(self) > (Py_ssize_t)(new_allocated - newsize))
        new_allocated = ((size_t)newsize + 3) & ~(size_t)3;

Let's test that in Python:

>>> for newsize in range(10):
...     new_allocated = (newsize + (newsize >> 3) + 6) & ~3
...     if newsize - oldsize > new_allocated - newsize:
...         new_allocated = (newsize + 3) & ~3
...     s = f'[{",".join(map(str, range(newsize)))}]'
...     calculated_size = sys.getsizeof([]) + 8 * new_allocated
...     actual_size = sys.getsizeof(eval(s))
...     print(f'{s:20}  {calculated_size=}  {actual_size=}')
... 
[]                    calculated_size=88  actual_size=56
[0]                   calculated_size=88  actual_size=64
[0,1]                 calculated_size=120  actual_size=72
[0,1,2]               calculated_size=120  actual_size=120
[0,1,2,3]             calculated_size=120  actual_size=120
[0,1,2,3,4]           calculated_size=120  actual_size=120
[0,1,2,3,4,5]         calculated_size=152  actual_size=152
[0,1,2,3,4,5,6]       calculated_size=120  actual_size=120
[0,1,2,3,4,5,6,7]     calculated_size=120  actual_size=120
[0,1,2,3,4,5,6,7,8]   calculated_size=152  actual_size=152

Our calculated size matches the actual size. Except for fewer than three elements, but that's because they're not created by extending like that (I'll show that at the end), so it's not surprising our formula doesn't apply there.

Let's look at the code again:

    new_allocated = (newsize + (newsize >> 3) + 6) & ~3
    if newsize - oldsize > new_allocated - newsize:
        new_allocated = (newsize + 3) & ~3

And the values for your cases:

              [0,1,2,3,4,5]    [0,1,2,3,4,5,6]

oldsize             0                 0
newsize             6                 7
new_allocated      12                12
    corrected      12                 8

The reasoning from the code again:

    /* Do not overallocate if the new size is closer to overallocated size
     * than to the old size.

The newsize 7 is closer to 12 than to 0, so it decides not to overallocate (well, it does overallocate to the closest multiple of 4, for memory alignment and because that appears to work well).

The reasoning behind that, as stated by Serhiy Storchaka in the proposal:

It is common enough case if the list is created from a sequence of a known size and do not adds items anymore. Or if it is created by concatenating of few sequences. In such cases the list can overallocate a space which will never be used. [...] My idea, that if we adds several items at a time and need to reallocate an array, we check if the overallocated size is enough for adding the same amount of items next time. If it is not enough, we do not overallocate. [...] It will save a space if extend a list by many items few times.

So the idea is to consider future growths of the same size, and if the next growth would require a new overallocation anyway, the current overallocation wouldn't help, so let's not do it.

About the sizes up to two elements: That doesn't use a tuple and LIST_EXTEND but instead puts the individual values onto the stack and builds the list directly in BUILD_LIST (note its argument 0, 1 or 2):

dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE
dis.dis('[1969]')
                    
  1           0 LOAD_CONST               0 (1969)
              2 BUILD_LIST               1
              4 RETURN_VALUE
dis.dis('[1969, 1956]')
                    
  1           0 LOAD_CONST               0 (1969)
              2 LOAD_CONST               1 (1956)
              4 BUILD_LIST               2
              6 RETURN_VALUE

The code to execute BUILD_LIST builds a new list object with the exact number of spots needed (the number oparg: 0, 1 or 2), no overallocation. And then right there it just uses a quick little loop to pop the values off the stack and put them into the list:

        case TARGET(BUILD_LIST): {
            PyObject *list =  PyList_New(oparg);
            if (list == NULL)
                goto error;
            while (--oparg >= 0) {
                PyObject *item = POP();
                PyList_SET_ITEM(list, oparg, item);
            }
            PUSH(list);
            DISPATCH();
        }

Short story:

Long story:

Recommended topics

Hot tags