How to implement a memory intensive python script for test
I've applied a cgroups rule to a specific user, and I'd like to test whether the memory of programs run by that user is limited as expected. I tried the following script:

import string
import random

if __name__ == '__main__':
    d = {}
    for i in range(0, 100000000):
        val = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(200))  # generate a random string of length 200
        d[i] = val
        if i % 10000 == 0:
            print i

When I monitored the process via the ps command, it turned out that %MEM increased to 4.8 and then never changed, whether the cgroups service was on or off:

$ ps aux | grep mem_intensive.py
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jason    11531 88.5  4.8 3312972 3191236 pts/0 R+   22:16   0:07 python mem_intensive.py

In this scenario, total memory is 62 GB, so 4.8% of it is about 3 GB. I set the limit to 4 GB, with no other processes running under this user.

So could anyone give me some idea of what is wrong with this Python script? Thanks in advance.

Marinmarina answered 28/5, 2015 at 14:21
Is your range(0, 100000000) building the whole list in memory? Try xrange instead in Python 2.7; in Python 3, range is already lazy. Just a thought: constructing this large range in memory might be part of the issue.Cristie
Does the script end? Perhaps just try an infinite loop?Alizaalizarin
When I change range to xrange, %MEM doesn't grow (or grows at a very slow pace). Could you explain why, please? @PaulJoiremanMarinmarina
Take a look at this question: #6318318. I changed val = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(200)) to val = ''.join(random.choice(string.ascii_uppercase + string.digits)) * 1024000, and the memory crossed 4 GB successfully.Hersch
Which cgroups parameters did you change, and which aspects do you want to test in your script?Refusal
For now, cgroups is not the key point of the problem. I'm just curious why memory stops increasing while the Python script is still running. From my perspective, memory should grow linearly as the dict gets bigger and bigger. @RefusalMarinmarina
Changing 200 to 1024000 will certainly push memory over 4 GB, but it covers up the phenomenon exhibited by this Python script. Why wouldn't memory grow linearly as the dict gets bigger? @YuriG.Marinmarina
Yes, it grows linearly; just don't use random. See my answer.Alizaalizarin
I've played a bit with your script, and memory does keep growing, albeit slowly. The bottleneck is random.choice: if you want to fill memory fast, generating randomness works against you, so using fixed strings exhausts memory rather quickly. If you use the following and want to watch memory grow, you'd probably add a time.sleep() after the print:

if __name__ == '__main__':
    d = {}
    for i in range(0, 100000000):
        d[i] = 'A'*1024
        if i % 10000 == 0:
            print(i)

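To make the random.choice bottleneck concrete, here is a quick timing sketch (my addition, not part of the original answer) comparing the cost of building one 200-character value the original way versus using a fixed string of the same length:

```python
import random
import string
import timeit

# Build one 200-character value the way the question's script does.
random_way = lambda: ''.join(
    random.choice(string.ascii_uppercase + string.digits) for _ in range(200)
)

# Build a fixed 200-character string instead.
fixed_way = lambda: 'A' * 200

# Time 1000 constructions of each; the fixed string is far cheaper,
# because it skips 200 random.choice calls and the join per value.
t_random = timeit.timeit(random_way, number=1000)
t_fixed = timeit.timeit(fixed_way, number=1000)
print('random: %.4fs  fixed: %.4fs' % (t_random, t_fixed))
```

The exact timings vary by machine, but the fixed-string version is typically orders of magnitude faster, which is why it fills memory so much more quickly per unit of time.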
Filling memory faster, with just a one-liner:

['A'*1024 for _ in xrange(0, 1024*1024*1024)]
Alizaalizarin answered 29/5, 2015 at 9:40
If you want to see whether the cgroup works, just set the limit to 100 MB and try to start the script. The point isn't to see whether a large limit works better or worse than a small one; you just want to make sure that a limit is enforced. For that, a small limit is enough.
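As an illustration of that idea, a cgroups v1 setup could look roughly like this (a hedged sketch of my own, not from the answer: the cgroup name "memtest" is made up, the commands need root and the libcgroup tools, and mount paths vary by distribution):

```shell
# Sketch (cgroups v1, libcgroup tools): create a memory cgroup with a
# 100 MB limit and run the script inside it. Requires root; the cgroup
# name "memtest" and the /sys/fs/cgroup path are illustrative.
sudo cgcreate -g memory:/memtest
echo $((100 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/memtest/memory.limit_in_bytes
sudo cgexec -g memory:memtest python mem_intensive.py
# If the limit is enforced, the script should hit the cap and be
# OOM-killed almost immediately instead of running for minutes.
```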

To make sure that the dict grows as expected, you can print its size using the answers to this question: Memory-usage of dictionary in Python?
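As a minimal sketch of that approach (my addition; note that sys.getsizeof on a dict measures only the dict structure itself, not the strings it holds, so the values' sizes have to be summed separately):

```python
import sys

# Watch the dict's container size grow as entries are added.
d = {}
for i in range(100000):
    d[i] = 'A' * 200
    if i % 20000 == 0:
        # sys.getsizeof(d) covers the hash table only, not the values.
        print(i, sys.getsizeof(d))

# Rough total including the string values themselves.
total = sys.getsizeof(d) + sum(sys.getsizeof(v) for v in d.values())
print('approx bytes including values:', total)
```

If the reported size keeps climbing while ps shows a flat %MEM, the discrepancy is in how memory is being measured, not in the dict.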

Damales answered 29/5, 2015 at 7:38
range constructs a list in memory which your loop then iterates through; xrange creates an object that produces the numbers lazily, feeding the loop like a sequence without ever building that sequence in memory. There is little difference between range and xrange for short ranges, but a significant difference for large ones; see the Python docs: https://docs.python.org/2/library/functions.html#xrange

In Python 3, the functionality provided by xrange became the default for the range built-in. Because of this, and the inherent memory advantage of xrange in Python 2, I've seen Python 2 to 3 compatibility layers map the Python 2 range function to call xrange under the hood.
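The difference is easy to demonstrate in Python 3, where range behaves like Python 2's xrange (my own sketch, not part of the original answer):

```python
import sys

# In Python 3, range() is lazy: its memory footprint is a small constant,
# no matter how long the sequence it describes is.
lazy = range(100000000)
print(sys.getsizeof(lazy))  # small, fixed-size object

# Materializing it into a list allocates storage for every element,
# so the size grows with the element count (1M elements here, not 100M,
# to keep the demo cheap).
materialized = list(range(1000000))
print(sys.getsizeof(materialized))
```

This is exactly why switching the question's loop from range to xrange in Python 2 removes the up-front memory spike: the 100-million-element list is never built.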

Cristie answered 29/5, 2015 at 2:40
Doesn't matter, since he materializes the result in a string of 200 characters and then puts those into a dict.Damales
