How do I profile memory usage in Python?

I've recently become interested in algorithms and have begun exploring them by writing a naive implementation and then optimizing it in various ways.

I'm already familiar with the standard Python module for profiling runtime (for most things I've found the timeit magic function in IPython to be sufficient), but I'm also interested in memory usage so I can explore those tradeoffs as well (e.g. the cost of caching a table of previously computed values versus recomputing them as needed). Is there a module that will profile the memory usage of a given function for me?

Attic answered 16/2, 2009 at 9:34 Comment(1)
Duplicate of Which Python memory profiler is recommended?. IMHO best answer in 2019 is memory_profilerReligiose

This one has been answered already here: Python memory profiler

Basically you do something like that (cited from Guppy-PE):

>>> from guppy import hpy; h=hpy()
>>> h.heap()
Partition of a set of 48477 objects. Total size = 3265516 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  25773  53  1612820  49   1612820  49 str
     1  11699  24   483960  15   2096780  64 tuple
     2    174   0   241584   7   2338364  72 dict of module
     3   3478   7   222592   7   2560956  78 types.CodeType
     4   3296   7   184576   6   2745532  84 function
     5    401   1   175112   5   2920644  89 dict of class
     6    108   0    81888   3   3002532  92 dict (no owner)
     7    114   0    79632   2   3082164  94 dict of type
     8    117   0    51336   2   3133500  96 type
     9    667   1    24012   1   3157512  97 __builtin__.wrapper_descriptor
<76 more rows. Type e.g. '_.more' to view.>
>>> h.iso(1,[],{})
Partition of a set of 3 objects. Total size = 176 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  33      136  77       136  77 dict (no owner)
     1      1  33       28  16       164  93 list
     2      1  33       12   7       176 100 int
>>> x=[]
>>> h.iso(x).sp
 0: h.Root.i0_modules['__main__'].__dict__['x']
Cupellation answered 16/2, 2009 at 10:0 Comment(6)
Official guppy documentation is a bit minimal; for other resources see this example and the heapy essay.Trichromat
@Nit By downgraded you mean down-voted? That doesn't seem fair because it was valuable at one point in time. I think an edit at the top stating it is no longer valid for X reason and to see answer Y or Z instead. I think this course of action is more appropriate.Prevocalic
Sure, that works, too, but somehow it would be nice if the accepted and highest voted answer involved a solution that still works and is maintained.Nit
h.heap() very slow after I import some other packages.Olva
Only available for Python 2Leatherman
You can try a version for python3:

Python 3.4 includes a new module: tracemalloc. It provides detailed statistics about which code is allocating the most memory. Here's an example that displays the top three lines allocating memory.

from collections import Counter
import linecache
import os
import tracemalloc

def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/" with "module/"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))


# Tracing must be started before the allocations you want to see.
tracemalloc.start()

counts = Counter()
fname = '/usr/share/dict/american-english'
with open(fname) as words:
    words = list(words)
    for word in words:
        prefix = word[:3]
        counts[prefix] += 1
print('Top prefixes:', counts.most_common(3))

snapshot = tracemalloc.take_snapshot()
display_top(snapshot)

And here are the results:

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: scratches/ 6527.1 KiB
    words = list(words)
#2: scratches/ 247.7 KiB
    prefix = word[:3]
#3: scratches/ 193.0 KiB
    counts[prefix] += 1
4 other: 4.3 KiB
Total allocated size: 6972.1 KiB
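
To see which lines grew between two points in time, rather than the totals at a single point, two snapshots can be diffed. This is only a minimal sketch using Snapshot.compare_to from the standard library; the build_table() and top_growth() helpers and the table size are made up for illustration.

```python
import tracemalloc


def build_table(n):
    # Hypothetical workload: cache n squares in a dict.
    return {i: i * i for i in range(n)}


def top_growth(before, after, limit=3):
    """Return the `limit` largest StatisticDiff entries between two snapshots."""
    stats = after.compare_to(before, 'lineno')
    return stats[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()
table = build_table(100_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

for stat in top_growth(before, after):
    # size_diff is the change in bytes attributed to that source line.
    print("%s: %+.1f KiB" % (stat.traceback[0], stat.size_diff / 1024))
```

The entries come back sorted by the size of the change, so the line that builds the table should appear first.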

When is a memory leak not a leak?

That example is great when the memory is still being held at the end of the calculation, but sometimes you have code that allocates a lot of memory and then releases it all. It's not technically a memory leak, but it's using more memory than you think it should. How can you track memory usage when it all gets released? If it's your code, you can probably add some debugging code to take snapshots while it's running. If not, you can start a background thread to monitor memory usage while the main thread runs.

Here's the previous example where the code has all been moved into the count_prefixes() function. When that function returns, all the memory is released. I also added some sleep() calls to simulate a long-running calculation.

from collections import Counter
import linecache
import os
import tracemalloc
from time import sleep

def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common

def main():
    # Tracing must be started before the allocations you want to see.
    tracemalloc.start()

    most_common = count_prefixes()
    print('Top prefixes:', most_common)

    snapshot = tracemalloc.take_snapshot()
    display_top(snapshot)

def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/" with "module/"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))

main()


When I run that version, the memory usage has gone from 6MB down to 4KB, because the function released all its memory when it finished.

Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
Top 3 lines
#1: collections/ 0.7 KiB
    self.update(*args, **kwds)
#2: collections/ 0.6 KiB
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
#3: python3.6/ 0.5 KiB
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
10 other: 2.2 KiB
Total allocated size: 4.0 KiB
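
If only the peak matters, and not which line caused it, tracemalloc also keeps a running maximum that survives after the memory has been released. A minimal sketch, assuming nothing else in the process has already started tracing; allocate_and_release() and measure_peak() are hypothetical names used only here.

```python
import tracemalloc


def allocate_and_release(n):
    # Hypothetical workload: build a large temporary list, keep only its length.
    temp = list(range(n))
    return len(temp)


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, current_bytes, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # current: bytes still allocated now; peak: highest value since start().
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


result, current, peak = measure_peak(allocate_and_release, 1_000_000)
print("still allocated: %.1f KiB, peak: %.1f KiB" % (current / 1024, peak / 1024))
```

The temporary list is gone by the time the function returns, so the "still allocated" figure is small while the peak still reflects it.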

Now here's a version inspired by another answer that starts a second thread to monitor memory usage.

from collections import Counter
import linecache
import os
import tracemalloc
from datetime import datetime
from queue import Queue, Empty
from resource import getrusage, RUSAGE_SELF
from threading import Thread
from time import sleep

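def memory_monitor(command_queue: Queue, poll_interval=1):
    tracemalloc.start()
    old_max = 0
    snapshot = None
    while True:
        try:
            # Any message on the queue means: print the report and stop.
            command_queue.get(timeout=poll_interval)
            if snapshot is not None:
                print(datetime.now())
                display_top(snapshot)

            return
        except Empty:
            # No message yet: record a snapshot whenever peak RSS grows.
            max_rss = getrusage(RUSAGE_SELF).ru_maxrss
            if max_rss > old_max:
                old_max = max_rss
                snapshot = tracemalloc.take_snapshot()
                print(datetime.now(), 'max RSS', max_rss)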

def count_prefixes():
    sleep(2)  # Start up time.
    counts = Counter()
    fname = '/usr/share/dict/american-english'
    with open(fname) as words:
        words = list(words)
        for word in words:
            prefix = word[:3]
            counts[prefix] += 1
    most_common = counts.most_common(3)
    sleep(3)  # Shut down time.
    return most_common

def main():
    queue = Queue()
    poll_interval = 0.1
    monitor_thread = Thread(target=memory_monitor, args=(queue, poll_interval))
    monitor_thread.start()
    try:
        most_common = count_prefixes()
        print('Top prefixes:', most_common)
    finally:
        # Tell the monitor to print its report and shut down.
        queue.put('stop')
        monitor_thread.join()

def display_top(snapshot, key_type='lineno', limit=3):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        # replace "/path/to/module/" with "module/"
        filename = os.sep.join(frame.filename.split(os.sep)[-2:])
        print("#%s: %s:%s: %.1f KiB"
              % (index, filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))

main()


The resource module lets you check the current memory usage, and save the snapshot from the peak memory usage. The queue lets the main thread tell the memory monitor thread when to print its report and shut down. When it runs, it shows the memory being used by the list() call:

2018-05-29 10:34:34.441334 max RSS 10188
2018-05-29 10:34:36.475707 max RSS 23588
2018-05-29 10:34:36.616524 max RSS 38104
2018-05-29 10:34:36.772978 max RSS 45924
2018-05-29 10:34:36.929688 max RSS 46824
2018-05-29 10:34:37.087554 max RSS 46852
Top prefixes: [('con', 1220), ('dis', 1002), ('pro', 809)]
2018-05-29 10:34:56.281262
Top 3 lines
#1: scratches/ 6527.0 KiB
    words = list(words)
#2: scratches/ 16.4 KiB
    prefix = word[:3]
#3: scratches/ 10.1 KiB
    counts[prefix] += 1
19 other: 10.8 KiB
Total allocated size: 6564.3 KiB

If you're on Linux, you may find /proc/self/statm more useful than the resource module.
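
As a rough sketch of that alternative: the first two fields of /proc/self/statm are the total program size and the resident set size, both counted in pages. read_statm() is a hypothetical helper name, and it returns None where the file does not exist (for example on non-Linux systems).

```python
import os


def read_statm(path="/proc/self/statm"):
    """Return (vm_size_bytes, rss_bytes) from statm, or None if it is unavailable."""
    try:
        with open(path) as f:
            fields = f.read().split()
    except OSError:
        return None  # not Linux, or /proc is not mounted
    page_size = os.sysconf("SC_PAGE_SIZE")
    # First two fields: total program size and resident set size, in pages.
    return int(fields[0]) * page_size, int(fields[1]) * page_size


usage = read_statm()
if usage is not None:
    print("VM size: %d KiB, RSS: %d KiB" % (usage[0] // 1024, usage[1] // 1024))
```

Unlike ru_maxrss, this reports the current resident size rather than the peak, so it can go down as well as up between polls.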

Gower answered 14/8, 2017 at 16:31 Comment(8)
This is great, but it seems to only print the snapshots during intervals when functions inside "count_prefixes()" return. In other words, if you have some long running call, e.g. long_running() inside the count_prefixes() function, the max RSS values will not be printed until long_running() returns. Or am I mistaken?Nit
I think you're mistaken, @robguinness. memory_monitor() is running on a separate thread from count_prefixes(), so the only ways that one can affect the other are the GIL and the message queue that I pass to memory_monitor(). I suspect that when count_prefixes() calls sleep(), it encourages the thread context to switch. If your long_running() isn't actually taking very long, then the thread context might not switch until you hit the sleep() call back in count_prefixes(). If that doesn't make sense, post a new question and link to it from here.Gower
Thanks. I will post a new question and add a link here. (I need to work up an example of the issue I am having, since I can't share the proprietary parts of the code.)Nit
tracemalloc is really awesome, but unfortunately it only accounts for memory allocated by python, so if you have some c/c++ extension that does it own allocations, tracemalloc won't report it.Grasso
@Grasso That's not entirely true anymore, for example numpy arrays will appear in tracemalloc output since version 1.13 and Python 3.6.
@tgbrooks, do you know if they do something special for tracemalloc to register these? Perhaps they malloc using python API? I wish it worked for everything. e.g. pytorch cpp extensions that do their own malloc are not registered by tracemalloc. It makes sense since in those situations tracemalloc has no idea some memory allocation happened.Grasso
@Grasso I assume they have to, but I don't know the details. From the link I gave, it sounds like they have to do something specific when allocating memory in C for it to be counted.Sankaran
I tried. It looks like stack on line snapshot = snapshot.filter_traces((Richma

  • Applicable on Linux only
  • Reports memory used by the current process as a whole, not individual functions within

But nice because of its simplicity:

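import resource

def using(point=""):
    # getrusage() returns (ru_utime, ru_stime, ru_maxrss, ...)
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return '''%s: usertime=%s systime=%s mem=%s mb
           ''' % (point, usage[0], usage[1],
                  usage[2] / 1024.0)

Just insert using("Label") where you want to see what's going on. For example

print(using("before"))
wrk = ["wasting mem"] * 1000000
print(using("after"))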

>>> before: usertime=2.117053 systime=1.703466 mem=53.97265625 mb
>>> after: usertime=2.12023 systime=1.70708 mem=60.8828125 mb
Flood answered 16/3, 2013 at 11:19 Comment(5)
"memory usage of a given function" so your approach is not helping.Tribromoethanol
By looking at usage[2] you are looking at ru_maxrss, which is only the portion of the process which is resident. This won't help much if the process has been swapped to disk, even partially.Verda
resource is a Unix specific module that does not work under Windows.Rummy
The units of ru_maxrss (that is, usage[2]) are kB, not pages so there is no need to multiply that number by resource.getpagesize().Oops
This printed out nothing for me.Rideout

If you only want to look at the memory usage of an object, (answer to other question)

There is a module called Pympler which contains the asizeof module.

Use as follows:

from pympler import asizeof

Unlike sys.getsizeof, which only reports the size of the object itself, it follows references, so it gives meaningful results for your self-created objects.

>>> asizeof.asizeof(tuple('bcd'))
>>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'})
>>> asizeof.asizeof({})
>>> asizeof.asizeof({'foo':'bar'})
>>> asizeof.asizeof('foo')
>>> asizeof.asizeof(Bar())
>>> asizeof.asizeof(Bar().__dict__)
>>> help(asizeof.asizeof)
Help on function asizeof in module pympler.asizeof:

asizeof(*objs, **opts)
    Return the combined size in bytes of all objects passed as positional arguments.
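
To illustrate why a shallow sys.getsizeof falls short for your own classes, here is a rough standard-library sketch of a recursive size. It is not Pympler's algorithm, handles only a few container types, and the Bar class and deep_size() helper are hypothetical names defined just for this example.

```python
import sys


class Bar:
    # Hypothetical class, defined only for this illustration.
    def __init__(self):
        self.values = list(range(1000))


def deep_size(obj, seen=None):
    """Roughly sum sys.getsizeof over obj and everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted; also guards against reference cycles
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    if hasattr(obj, '__dict__'):
        size += deep_size(vars(obj), seen)
    return size


bar = Bar()
print("shallow:", sys.getsizeof(bar), "deep:", deep_size(bar))
```

The shallow figure covers only the instance header, while the deep figure also counts the attribute dict, the list, and the integers inside it.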
Graciagracie answered 10/11, 2015 at 14:13 Comment(11)
Is this asizeof related to RSS?Oleum
Is this size in bytes?Oleum
@mousecoder: Which RSS at Web feeds? How?Graciagracie
Sorry, I can't resist: FFS, who came up with the name "Pympler"?Sasnett
@PProteus: it's supposed to remove pymples. Sure, it's "peculiar"Graciagracie
@Graciagracie Resident set size, though I can only find one mention of it in Pympler's source and that mention doesn't seem directly tied to asizeofArouse
@mousecoder the memory reported by asizeof can contribute to RSS, yes. I'm not sure what else you mean by "related to".Harken
@Oleum yes in bytes.Suzerainty
Although elegant at first sight, it's not fast enough: try measuring a 10M-item dictionary. It will take forever.Suzerainty
@ulkas: so the other applicable methods are faster? if so, they might like a bug report...Graciagracie
@Graciagracie its possible it may be very case specific. but for my usecase measuring one large multidimensional dictionary, i found tracemalloc solution below a magnitude fasterSuzerainty

Below is a simple function decorator that tracks how much memory the process was using before the function call and after it, and the difference between the two:

import time
import os
import psutil
def elapsed_since(start):
    return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))
def get_process_memory():
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()
    return mem_info.rss
def profile(func):
    def wrapper(*args, **kwargs):
        mem_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        mem_after = get_process_memory()
        print("{}: memory before: {:,}, after: {:,}, consumed: {:,}; exec time: {}".format(
            func.__name__,
            mem_before, mem_after, mem_after - mem_before,
            elapsed_time))
        return result
    return wrapper

Here is my blog which describes all the details. (archived link)

Dermoid answered 14/3, 2018 at 7:5 Comment(4)
it should be process.memory_info().rss not process.get_memory_info().rss, at least in ubuntu and python 3.6. related
You're right as to 3.x. My customer is using Python 2.7, not the newest version.Dermoid
is this in bytes, KB , MB , what?Christalchristalle
I hope the return value will be in bytes, source:….Armour

Since the accepted answer and also the next highest voted answer have, in my opinion, some problems, I'd like to offer one more answer that is based closely on Ihor B.'s answer with some small but important modifications.

This solution lets you run profiling either by wrapping a function call with the profile function and calling it, or by decorating your function/method with the @profile decorator.

The first technique is useful when you want to profile some third-party code without messing with its source, whereas the second technique is a bit "cleaner" and works better when you don't mind modifying the source of the function/method you want to profile.

I've also modified the output, so that you get RSS, VMS, and shared memory. I don't care much about the "before" and "after" values, but only the delta, so I removed those (if you're comparing to Ihor B.'s answer).

Profiling code

import time
import os
import psutil
import inspect

def elapsed_since(start):
    #return time.strftime("%H:%M:%S", time.gmtime(time.time() - start))
    elapsed = time.time() - start
    if elapsed < 1:
        return str(round(elapsed*1000,2)) + "ms"
    if elapsed < 60:
        return str(round(elapsed, 2)) + "s"
    if elapsed < 3600:
        return str(round(elapsed/60, 2)) + "min"
    else:
        return str(round(elapsed / 3600, 2)) + "hrs"

def get_process_memory():
    process = psutil.Process(os.getpid())
    mi = process.memory_info()
    # "shared" is not reported on every platform, so fall back to 0.
    return mi.rss, mi.vms, getattr(mi, "shared", 0)

def format_bytes(bytes):
    if abs(bytes) < 1000:
        return str(bytes)+"B"
    elif abs(bytes) < 1e6:
        return str(round(bytes/1e3,2)) + "kB"
    elif abs(bytes) < 1e9:
        return str(round(bytes / 1e6, 2)) + "MB"
    else:
        return str(round(bytes / 1e9, 2)) + "GB"

def profile(func, *args, **kwargs):
    def wrapper(*args, **kwargs):
        rss_before, vms_before, shared_before = get_process_memory()
        start = time.time()
        result = func(*args, **kwargs)
        elapsed_time = elapsed_since(start)
        rss_after, vms_after, shared_after = get_process_memory()
        print("Profiling: {:>20}  RSS: {:>8} | VMS: {:>8} | SHR {"
              ":>8} | time: {:>8}"
            .format("<" + func.__name__ + ">",
                    format_bytes(rss_after - rss_before),
                    format_bytes(vms_after - vms_before),
                    format_bytes(shared_after - shared_before),
                    elapsed_time))
        return result
    if inspect.isfunction(func):
        return wrapper
    elif inspect.ismethod(func):
        return wrapper(*args,**kwargs)

Example usage, assuming the above code is saved as

from profile import profile
from time import sleep
from sklearn import datasets # Just an example of 3rd party function call

# Method 1
run_profiling = profile(datasets.load_digits)
data = run_profiling()

# Method 2
@profile
def my_function():
    # do some stuff
    a_list = []
    for i in range(1,100000):
        a_list.append(i)
    return a_list

res = my_function()

This should result in output similar to the below:

Profiling:        <load_digits>  RSS:   5.07MB | VMS:   4.91MB | SHR  73.73kB | time:  89.99ms
Profiling:        <my_function>  RSS:   1.06MB | VMS:   1.35MB | SHR       0B | time:   8.43ms

A couple of important final notes:

  1. Keep in mind, this method of profiling is only going to be approximate, since lots of other stuff might be happening on the machine. Due to garbage collection and other factors, the deltas might even be zero.
  2. For some unknown reason, very short function calls (e.g. 1 or 2 ms) show up with zero memory usage. I suspect this is some limitation of the hardware/OS (tested on basic laptop with Linux) on how often memory statistics are updated.
  3. To keep the examples simple, I didn't use any function arguments, but they should work as one would expect, i.e. profile(my_function, arg) to profile my_function(arg)
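
Because RSS deltas can read zero for short calls, a variant that counts Python-level allocations with tracemalloc may be a useful cross-check. This is only a sketch, not part of the answer's code: profile_allocations() and make_small_list() are hypothetical names, and memory allocated outside Python's allocators is not counted.

```python
import functools
import tracemalloc


def profile_allocations(func):
    """Decorator that prints Python-level allocations made while func runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        already_tracing = tracemalloc.is_tracing()
        if not already_tracing:
            tracemalloc.start()
        before, _ = tracemalloc.get_traced_memory()
        try:
            return func(*args, **kwargs)
        finally:
            after, peak = tracemalloc.get_traced_memory()
            if not already_tracing:
                tracemalloc.stop()
            # The peak is counted since tracing began, so it is specific to
            # this call only when tracing was started by this wrapper.
            print("%s: net %+d B, peak %d B"
                  % (func.__name__, after - before, peak - before))
    return wrapper


@profile_allocations
def make_small_list():
    return [0] * 1000


small = make_small_list()
```

The net figure includes the returned object, since it is still alive when the measurement is taken.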
Nit answered 14/11, 2018 at 13:42 Comment(1)
AttributeError: 'pmem' object has no attribute 'shared', python3.9Chemo

A simple example that calculates the memory usage of a block of code or a function using memory_profiler, while also returning the function's result:

import memory_profiler as mp

def fun(n):
    tmp = []
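    for i in range(n):
        # The loop body is missing here; this line is an assumed stand-in
        # that simply allocates into tmp.
        tmp.extend(list(range(i*i)))
    return "XXXXX"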

Calculate the memory usage before running the code, then the maximum usage while it runs:

start_mem = mp.memory_usage(max_usage=True)
res = mp.memory_usage(proc=(fun, [100]), max_usage=True, retval=True) 
print('start mem', start_mem)
print('max mem', res[0][0])
print('used mem', res[0][0]-start_mem)
print('fun output', res[1])

Calculate the usage at sampling points while the function runs:

res = mp.memory_usage((fun, [100]), interval=.001, retval=True)
print('min mem', min(res[0]))
print('max mem', max(res[0]))
print('used mem', max(res[0])-min(res[0]))
print('fun output', res[1])

Credits: @skeept

Handicapped answered 28/4, 2020 at 4:18 Comment(0)

Maybe this helps:
<see additional>

pip install gprof2dot
sudo apt-get install graphviz

gprof2dot -f pstats profile_for_func1_001 | dot -Tpng -o profile.png

import cProfile

def profileit(name):
    def inner(func):
        def wrapper(*args, **kwargs):
            prof = cProfile.Profile()
            retval = prof.runcall(func, *args, **kwargs)
            # Note use of name from outer scope
            prof.dump_stats(name)
            return retval
        return wrapper
    return inner

@profileit("profile_for_func1_001")
def func1(...)
Mervinmerwin answered 22/8, 2017 at 8:57 Comment(1)
This is not memory profiling.Hydrofoil

Different use cases require different tools.

Web applications suffer from memory leaks, so you want tools that are good at catching that sort of thing. memory-profiler is a fine tool here: you can see that a particular line of code is responsible for increased memory usage.

For data processing, you want peak memory because the issue isn't leaks, the issue is just allocating lots of memory. Imagine if you have a single line of code that allocates a temporary array of 10GB and then immediately drops it; I've made mistakes like this. memory-profiler will never catch this, because the memory usage at start and end of line is the same. So you need a very different kind of profiler.

For the latter use case, relevant tools include Memray and Fil, both open source, and Sciagraph (commercial, but has free plan and also does CPU profiling).

Hydrofoil answered 22/6, 2023 at 16:49 Comment(0)

Check out Scalene. It is a CPU, GPU, and memory profiler. The source code is actively maintained, and the repo has more than 10K stars on GitHub.
The comparison between different profilers taken from their repo: Profilers comparison table

Check more details and options in the repository description.

Ankledeep answered 13/2, 2024 at 9:53 Comment(0)
