Strange Python memory allocation

While trying to figure out how Python's garbage collection system works, I stumbled across this oddity. Running this simple code:

import numpy as np
from memory_profiler import profile

@profile
def my_func():
    a = np.random.rand(1000000)
    a = np.append(a, [1])
    a = np.append(a, [2])
    a = np.append(a, [3])
    a = np.append(a, [4])
    a = np.append(a, [5])
    b = np.append(a, [6])
    c = np.append(a, [7])
    d = np.append(a, a)

    return a

if __name__ == '__main__':
    my_func()

using memory_profiler version 0.52 and Python 3.7.6 on my MacBook, I got the following output:

Line #    Mem usage    Increment   Line Contents
================================================
     4     54.2 MiB     54.2 MiB   @profile
     5                             def my_func():
     6     61.8 MiB      7.7 MiB       a = np.random.rand(1000000)
     7     69.4 MiB      7.6 MiB       a = np.append(a, [1])
     8     69.4 MiB      0.0 MiB       a = np.append(a, [2])
     9     69.4 MiB      0.0 MiB       a = np.append(a, [3])
    10     69.4 MiB      0.0 MiB       a = np.append(a, [4])
    11     69.4 MiB      0.0 MiB       a = np.append(a, [5])
    12     69.4 MiB      0.0 MiB       b = np.append(a, [6])
    13     77.1 MiB      7.6 MiB       c = np.append(a, [7])
    14     92.3 MiB     15.3 MiB       d = np.append(a, a)
    15                             
    16     92.3 MiB      0.0 MiB       return a

Two things are odd. First, why does line 7 produce a noticeable increase in memory when lines 8-11 produce none? Second, why doesn't line 12 produce the same increase in memory as line 13?

Note that if I delete lines 12-14, I still get the increase in memory in line 7. So it's not a bug where the memory is actually being increased in line 12 but memory_profiler is incorrectly showing that increase in line 7.

Landed asked 23/7, 2020 at 23:48

Creating a makes an array of 8e6 bytes (check `a.nbytes`).

 6     61.8 MiB      7.7 MiB       a = np.random.rand(1000000)
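A quick check of that arithmetic, as a minimal sketch: one million float64 values occupy 8,000,000 bytes, which is about 7.63 MiB, in line with the roughly 7.6 MiB steps in the profile.

```python
import numpy as np

a = np.random.rand(1_000_000)        # one million float64 values in [0, 1)
print(a.dtype, a.size, a.nbytes)     # float64 1000000 8000000
print(round(a.nbytes / 2**20, 2))    # 7.63 (MiB), matching the ~7.6 MiB increments
```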

np.append makes a new array (it is concatenate, not list append), so we get another 8MB increase.

 7     69.4 MiB      7.6 MiB       a = np.append(a, [1])
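A small sketch showing that np.append never grows an array in place (a 5-element array stands in for the question's million elements, purely for readability): it returns a fresh copy, and with the default axis=None it behaves like concatenating the flattened inputs.

```python
import numpy as np

a = np.arange(5.0)          # small stand-in for the question's 1,000,000-element array
b = np.append(a, [1])

print(b is a)                   # False: np.append returned a brand-new array
print(np.shares_memory(a, b))   # False: the data was copied into a fresh buffer
print(a.size, b.size)           # 5 6: the original array is untouched
# With the default axis=None, np.append flattens both inputs and concatenates them
print(np.array_equal(b, np.concatenate([a.ravel(), np.ravel([1])])))  # True
```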

My guess is that in the following steps it cycles back and forth between those two 8MB blocks; numpy doesn't return every freed block to the OS.
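One piece of this guess can be checked directly. Assuming CPython, rebinding a to the result of np.append drops the last reference to the old array, so reference counting frees it right away rather than waiting for the garbage collector. The two helper functions below are hypothetical names made up for this sketch.

```python
import weakref
import numpy as np

def rebinding_frees_old_array():
    """Hypothetical helper: does `a = np.append(a, ...)` release the old array at once?"""
    a = np.random.rand(1_000_000)
    old = weakref.ref(a)       # observe the first array without keeping it alive
    a = np.append(a, [1])      # `a` now names the copy; nothing refers to the original
    return old() is None       # True on CPython: reference counting freed it immediately

def new_name_keeps_old_array():
    """Hypothetical helper: binding the result to a new name leaves `a` alive."""
    a = np.random.rand(1_000_000)
    old = weakref.ref(a)
    b = np.append(a, [6])      # like line 12: `a` is still bound, so both buffers stay alive
    return old() is a and b.size == a.size + 1

print(rebinding_frees_old_array())   # True
print(new_name_keeps_old_array())    # True
```

This only shows that at most two ~8 MB buffers are alive at once during each reassignment. Whether the allocator reuses the freed one, and whether it hands memory back to the OS, is up to the C allocator, and that part remains a guess.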

Then you assign the new array to c. a still exists, along with b. (I missed b the first time I looked at this.)

13     77.1 MiB      7.6 MiB       c = np.append(a, [7])

d is twice the size of a, so that accounts for the 15MB jump. a, b, and c still exist.

14     92.3 MiB     15.3 MiB       d = np.append(a, a)

b and c are each just one element bigger than a, so each takes up about 8MB. That seems to account for everything!
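The accounting can be checked with a short sketch that replays the question's steps and sums the data buffers still alive at the end. For comparison, the profile shows 92.3 - 54.2 = 38.1 MiB between the decorator line and the return line.

```python
import numpy as np

a = np.random.rand(1_000_000)
for v in (1, 2, 3, 4, 5):        # profile lines 7-11: `a` is rebound each time
    a = np.append(a, [v])
b = np.append(a, [6])            # line 12
c = np.append(a, [7])            # line 13
d = np.append(a, a)              # line 14

total = sum(x.nbytes for x in (a, b, c, d))   # bytes held by the four live arrays
print(a.size, b.size, c.size, d.size)   # 1000005 1000006 1000006 2000010
print(total, round(total / 2**20, 1))   # 40000216 38.1
```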

When tracking memory use, keep in mind that numpy, Python, and the OS all play a role. Most of us don't know all the details, so we can only make rough guesses as to what's happening.
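To separate what Python and NumPy actually hold from what the OS reports, the same steps can be replayed under the standard-library tracemalloc module, with which NumPy registers its data buffers. This is a sketch, not a replacement for memory_profiler: tracemalloc counts live allocations, while memory_profiler samples the process's resident memory, so a freed block that the allocator keeps around shows up in the latter but not the former. The helper name replay_with_tracemalloc is hypothetical.

```python
import tracemalloc
import numpy as np

MiB = 2**20

def replay_with_tracemalloc():
    """Hypothetical helper: re-run the question's steps, recording live traced MiB after each."""
    tracemalloc.start()
    log = {}
    a = np.random.rand(1_000_000)
    log["rand"] = tracemalloc.get_traced_memory()[0] / MiB
    for v in (1, 2, 3, 4, 5):
        a = np.append(a, [v])      # the previous array is released on rebinding
        log[f"append {v}"] = tracemalloc.get_traced_memory()[0] / MiB
    b = np.append(a, [6])
    log["b"] = tracemalloc.get_traced_memory()[0] / MiB
    c = np.append(a, [7])
    log["c"] = tracemalloc.get_traced_memory()[0] / MiB
    d = np.append(a, a)
    log["d"] = tracemalloc.get_traced_memory()[0] / MiB
    tracemalloc.stop()
    # a, b, c and d are all still alive at this point, just as at the end of my_func
    return log

for step, mib in replay_with_tracemalloc().items():
    print(f"{step:>9}: {mib:5.1f} MiB")
```

If the two-block explanation holds, the live total should stay near 7.6 MiB through every append (no jump at line 7), then rise by roughly 7.6, 7.6, and 15.3 MiB for b, c, and d.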

Longcloth answered 24/7, 2020 at 0:45

Could the jump at line 7 be some form of overhead from the profiler? From what I can see using sys.getsizeof(), the array grows by 8 bytes at each append, with no sudden jumps.

At first I thought that this could be a similar situation as with Python lists, where memory gets allocated once every 4 appends in 32-byte chunks, but this doesn't seem to be the case.
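The contrast can be sketched as follows: a Python list over-allocates, so sys.getsizeof only changes on some appends, while np.append builds a new array exactly one float64 (8 bytes) larger each time. The exact list growth pattern is a CPython implementation detail that has changed between versions, so the sketch only counts how many resizes happen; size_changes is a hypothetical helper name.

```python
import sys
import numpy as np

def size_changes(n=32):
    """Append n ints to a list and return each distinct sys.getsizeof() value seen."""
    lst = []
    sizes = [sys.getsizeof(lst)]
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != sizes[-1]:      # only record the appends that triggered a resize
            sizes.append(size)
    return sizes

print(size_changes())  # a handful of jumps, not one per append

# For comparison, each np.append returns a new array exactly one float64 larger.
arr = np.zeros(10)
steps = []
for i in range(5):
    before = sys.getsizeof(arr)
    arr = np.append(arr, [i])
    steps.append(sys.getsizeof(arr) - before)
print(steps)  # [8, 8, 8, 8, 8]
```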

Without a function or profiler I can see no behaviour similar to what you showed in your post. The only oddity I can see is that d is not exactly double the size of a.

import numpy as np
import sys

a = np.random.rand(1000000)

sys.getsizeof(a)
Out[55]: 8000096

a = np.append(a, [1])

sys.getsizeof(a)
Out[57]: 8000104

a = np.append(a, [2])

sys.getsizeof(a)
Out[59]: 8000112

a = np.append(a, [3])

sys.getsizeof(a)
Out[61]: 8000120

a = np.append(a, [4])

sys.getsizeof(a)
Out[63]: 8000128

a = np.append(a, [5])

sys.getsizeof(a)
Out[66]: 8000136

a = np.append(a, [6])

sys.getsizeof(a)
Out[68]: 8000144

a = np.append(a, [7])

sys.getsizeof(a)
Out[71]: 8000152

d = np.append(a, a)

sys.getsizeof(d)
Out[73]: 16000208
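The "not exactly double" puzzle comes down to a fixed per-object header: for a 1-D array that owns its data, sys.getsizeof reports the data size plus a constant overhead (96 bytes in the session above; the exact figure may differ between NumPy versions). A minimal sketch of that arithmetic:

```python
import sys
import numpy as np

a = np.random.rand(1_000_000)
for v in range(1, 8):                      # seven appends, as in the session above
    a = np.append(a, [v])
d = np.append(a, a)

overhead_a = sys.getsizeof(a) - a.nbytes   # fixed per-array header
overhead_d = sys.getsizeof(d) - d.nbytes
print(a.nbytes, d.nbytes)                  # 8000056 16000112: the data does exactly double
print(overhead_a == overhead_d)            # True: each array pays the header once
```

With the 96-byte header from the session, that gives 2 × 8000056 + 96 = 16000208 rather than 2 × 8000152.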

Externalization answered 24/7, 2020 at 0:27
