Ways to free memory back to OS from Python?

I have code that looks similar to this:

def memoryIntensiveFunction(x):
    largeTempVariable = Intermediate(x)
    processFunction(largeTempVariable,x)

The problem is that largeTempVariable is something like 500 MB in a test case of mine, but that space is not returned to the OS when memoryIntensiveFunction finishes. I know this because memory profiling with the guppy tool says largeTempVariable is freed (i.e., within Python), but psutil shows the process's memory is not released. I presume I'm seeing the effects described here. This process is long running (i.e., hours), and memoryIntensiveFunction is run at the beginning and never again, so it's inconvenient to have to carry the 500 MB around for hours.
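A minimal sketch of that kind of comparison, assuming the standard-library tracemalloc in place of guppy for the Python-level numbers, and psutil (third-party, treated as optional here) for the OS-level view. The names measure, rss_bytes and build_nested are made up for illustration, and build_nested is only a stand-in for the real data structure:

```python
import tracemalloc

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def rss_bytes():
    """Resident set size as the OS sees it, or None if psutil is missing."""
    if psutil is None:
        return None
    return psutil.Process().memory_info().rss


def measure(build):
    """Build an object, delete it, and report memory before and after.

    Returns (python_live, python_after_del, rss_live, rss_after_del).
    The first two come from tracemalloc (Python-level allocations),
    the last two from the OS (or None without psutil).
    """
    tracemalloc.start()
    obj = build()
    python_live, _peak = tracemalloc.get_traced_memory()
    rss_live = rss_bytes()
    del obj
    python_after, _peak = tracemalloc.get_traced_memory()
    rss_after = rss_bytes()
    tracemalloc.stop()
    return python_live, python_after, rss_live, rss_after


def build_nested():
    # Hypothetical stand-in: a dict of small sets of strings.
    return {i: {str(i), str(i + 1)} for i in range(100_000)}
```

If the Python-level number drops after the del while the RSS stays high, the memory has been released by the interpreter but is still held by the allocator, which matches the symptom described above.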

One solution I found here and here suggests using a separate process. Multiprocessing incurs its own costs, but it would be worth it in my case. However, this would require refactoring the callers of memoryIntensiveFunction to receive x as a return value instead of seeing it modified in place. The real killer is that my object x is not picklable (it makes heavy use of Boost.Python extensions), and it would be a lot of work to make it picklable.
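For reference, a rough sketch of the separate-process idea, under some loud assumptions: it uses the "fork" start method (POSIX only), so the child inherits its arguments without pickling them, but anything the child changes in place is not visible in the parent, and the value sent back must itself be picklable. run_in_child and summarize are hypothetical names, not part of any existing code:

```python
import multiprocessing as mp


def _child(conn, func, args):
    # Runs in the forked child: compute, send the result back, exit.
    try:
        conn.send(func(*args))
    finally:
        conn.close()


def run_in_child(func, *args):
    """Run func(*args) in a forked child and return its (picklable) result.

    All memory the child allocates is returned to the OS when it exits.
    In-place modifications of args happen in the child's copy only.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX platform
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_conn, func, args))
    proc.start()
    send_conn.close()          # parent keeps only the receiving end
    result = recv_conn.recv()  # raises EOFError if the child failed
    proc.join()
    return result


def summarize(n):
    # Hypothetical memory-heavy step that yields a small result.
    big = list(range(n))
    return sum(big)
```

Calling run_in_child(summarize, 10**7) builds the large list only in the child. This sidesteps pickling the input, but it does not solve the in-place modification requirement described above.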

Are there any options I'm not considering?

Lamee answered 3/7, 2014 at 23:40 Comment(0)

This seemed curious enough that I tried to reproduce your issue, and it seems that a simple "del" was plenty. To demonstrate, you can run the following code:

import itertools
import pdb

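def test():
    a = "a"
    # Double the string 30 times: 2**30 characters, roughly 1 GiB.
    for _ in itertools.repeat(None, 30):
        a += a
    pdb.set_trace()  # first breakpoint: the string is still alive
    del a
    pdb.set_trace()  # second breakpoint: the string has been deleted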

test()

At the first breakpoint you will see that it uses roughly 1 GiB of RAM (look at the python3.3 entry):

 Private  +   Shared  =  RAM used       Program

  4.0 KiB +   9.0 KiB =  13.0 KiB       VisualGDB-DisownTTY-r1
  4.0 KiB +  15.0 KiB =  19.0 KiB       sharing-tests
  4.0 KiB +  19.5 KiB =  23.5 KiB       dhcpcd
  4.0 KiB +  31.5 KiB =  35.5 KiB       gdb
  4.0 KiB +  36.0 KiB =  40.0 KiB       vim [deleted]
  4.0 KiB +  38.0 KiB =  42.0 KiB       systemd-udevd
 40.0 KiB +  10.0 KiB =  50.0 KiB       init
 24.0 KiB + 135.0 KiB = 159.0 KiB       agetty (6)
 12.0 KiB + 150.0 KiB = 162.0 KiB       su (3)
 88.0 KiB + 103.0 KiB = 191.0 KiB       syslog-ng (2)
152.0 KiB +  55.0 KiB = 207.0 KiB       crond
172.0 KiB +  81.0 KiB = 253.0 KiB       python3.4
580.0 KiB + 220.5 KiB = 800.5 KiB       sshd (3)
768.0 KiB + 932.0 KiB =   1.7 MiB       bash (13)
  2.8 MiB + 118.0 KiB =   2.9 MiB       mongod
  7.4 MiB + 109.0 KiB =   7.5 MiB       tmux [deleted] (2)
  1.0 GiB +   1.2 MiB =   1.0 GiB       python3.3
---------------------------------
                          1.0 GiB
=================================

Then at the second breakpoint, after we del the variable, the memory is freed:

 Private  +   Shared  =  RAM used       Program

  4.0 KiB +   9.0 KiB =  13.0 KiB       VisualGDB-DisownTTY-r1
  4.0 KiB +  15.0 KiB =  19.0 KiB       sharing-tests
  4.0 KiB +  19.5 KiB =  23.5 KiB       dhcpcd
  4.0 KiB +  31.5 KiB =  35.5 KiB       gdb
  4.0 KiB +  36.0 KiB =  40.0 KiB       vim [deleted]
  4.0 KiB +  38.0 KiB =  42.0 KiB       systemd-udevd
 40.0 KiB +  10.0 KiB =  50.0 KiB       init
 24.0 KiB + 135.0 KiB = 159.0 KiB       agetty (6)
 12.0 KiB + 150.0 KiB = 162.0 KiB       su (3)
 88.0 KiB + 103.0 KiB = 191.0 KiB       syslog-ng (2)
152.0 KiB +  55.0 KiB = 207.0 KiB       crond
172.0 KiB +  81.0 KiB = 253.0 KiB       python3.4
584.0 KiB + 220.5 KiB = 804.5 KiB       sshd (3)
768.0 KiB + 928.0 KiB =   1.7 MiB       bash (13)
  2.8 MiB + 118.0 KiB =   2.9 MiB       mongod
  5.1 MiB +   1.2 MiB =   6.3 MiB       python3.3
  7.4 MiB + 109.0 KiB =   7.5 MiB       tmux [deleted] (2)
---------------------------------
                         20.3 MiB
=================================

Now if we drop the "del" from the function and set a breakpoint right after test():

import itertools
import pdb

def test():
    a = "a"
    for _ in itertools.repeat(None, 30):
        a += a
    pdb.set_trace()  # breakpoint inside test(): the string is alive

test()
pdb.set_trace()  # breakpoint after test() has returned

The memory indeed isn't freed until the process terminates:

 Private  +   Shared  =  RAM used       Program

  4.0 KiB +   9.0 KiB =  13.0 KiB       VisualGDB-DisownTTY-r1
  4.0 KiB +  15.0 KiB =  19.0 KiB       sharing-tests
  4.0 KiB +  19.5 KiB =  23.5 KiB       dhcpcd
  4.0 KiB +  31.5 KiB =  35.5 KiB       gdb
  4.0 KiB +  36.0 KiB =  40.0 KiB       vim [deleted]
  4.0 KiB +  38.0 KiB =  42.0 KiB       systemd-udevd
 40.0 KiB +  10.0 KiB =  50.0 KiB       init
 24.0 KiB + 135.0 KiB = 159.0 KiB       agetty (6)
 12.0 KiB + 150.0 KiB = 162.0 KiB       su (3)
160.0 KiB +  53.0 KiB = 213.0 KiB       crond
172.0 KiB +  81.0 KiB = 253.0 KiB       python3.4
628.0 KiB + 219.5 KiB = 847.5 KiB       sshd (3)
836.0 KiB + 152.0 KiB = 988.0 KiB       syslog-ng (2)
752.0 KiB + 957.0 KiB =   1.7 MiB       bash (13)
  2.8 MiB + 113.0 KiB =   2.9 MiB       mongod
  7.4 MiB + 108.0 KiB =   7.6 MiB       tmux [deleted] (2)
  1.0 GiB +   1.1 MiB =   1.0 GiB       python3.3
---------------------------------
                          1.0 GiB
=================================

So my suggestion? Just delete the sucker once you've used it and no longer need it ;)
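One caveat worth sketching: del only drops a reference. If the object is part of a reference cycle, CPython does not free it until the cycle collector runs. The snippet below is an illustration only, with a hypothetical Node class standing in for a large object; it uses weakref to check whether the object is really gone after del and after gc.collect():

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a large object."""

    def __init__(self):
        self.payload = {i: str(i) for i in range(1000)}
        self.other = None


def freed_after_del(make_cycle):
    """Return (freed_by_del, freed_by_gc_collect) for one Node."""
    a = Node()
    if make_cycle:
        b = Node()
        a.other, b.other = b, a  # a and b now keep each other alive
        del b
    probe = weakref.ref(a)  # lets us check liveness without owning a

    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the measurement
    try:
        del a
        freed_by_del = probe() is None
        gc.collect()
        freed_by_gc = probe() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_del, freed_by_gc
```

Without a cycle, del alone frees the object; with a cycle, it only goes away after gc.collect(). Either way this is about memory being released inside Python. Whether the allocator then hands it back to the OS is a separate matter.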

Stopwatch answered 4/7, 2014 at 0:6 Comment(5)
Note, I'm using Python 2.7, so perhaps there are memory optimization improvements in Python 3. Also, I was able to get a simple test case working that creates [1.0]*10**7, and del was able to free it as observed using psutil, but my object largeTempVariable is non-trivial enough in its data structure that going out of scope isn't freeing it back to the OS. - Lamee
@Lamee so what exactly is it? In the question you say that it's a large variable, which is what I used for the test, and it works the same way with 2.7. But now you seem to indicate that it isn't a variable but a list? So what is it exactly? - Stopwatch
It's an object containing dicts, sets, and nested dicts and sets, etc. At the bottom it's ints and strings... - Lamee
@Lamee that is still quite vague, way too vague to pinpoint what is causing the memory not to be freed. With that in mind I cannot help you in any other way than saying: either redesign the class to make it simpler, or rework your application to be more atomic, so you can break it down into separate threads. - Stopwatch
This may be OS-specific. I believe Linux is especially prone to not reclaiming unused memory, under the assumption that unused memory is only "used" on paper; that is, since it's not "hot" memory it will be swapped out, which is an adequate solution in itself. - Equity
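On the OS-specific point, one Linux-only thing that may be worth trying is asking glibc to give free heap pages back with malloc_trim. This is a sketch under assumptions: it only applies when the process uses glibc's malloc, and small objects go through CPython's own allocator, so it may not help with a structure made of many small dicts and sets. trim_heap is a hypothetical helper name:

```python
import ctypes


def trim_heap():
    """Ask glibc to return free heap memory to the OS.

    Returns malloc_trim's result (1 if memory was released, 0 if not),
    or None when the C library does not provide malloc_trim.
    """
    try:
        # CDLL(None) exposes symbols already loaded into this process.
        libc = ctypes.CDLL(None)
        malloc_trim = libc.malloc_trim
    except (OSError, TypeError, AttributeError):
        return None  # not glibc, or not a POSIX platform
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(0)
```

It would be called once after memoryIntensiveFunction returns and the temporary has been deleted; checking the process's memory with psutil before and after would show whether it made any difference.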
