Which Python memory profiler is recommended? [closed]
I want to know the memory usage of my Python application, and specifically I want to know which code blocks/portions or objects are consuming the most memory. A Google search shows that one commercial option is Python Memory Validator (Windows only).

Open-source options include PySizer and Heapy.

I haven't tried any of them, so I want to know which one is best, considering:

  1. It gives the most details.

  2. It requires the fewest (or no) changes to my code.

Bridgehead answered 21/9, 2008 at 4:43 Comment(6)
For finding the sources of leaks I recommend objgraph.Chaqueta
@MikeiLL There is a place for questions like these: Software RecommendationsCarrier
This is happening often enough that we should be able to migrate one question to another forum instead.Masterson
One tip: if you use GAE and want to check memory usage, it's a big headache, because those tools either output nothing or fail to start. If you want to test something small, move the function you want to test into a separate file and run that file alone.Hutcheson
I recommend pymplerKite
Check out memrayGlennglenna
313

guppy3 is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you some output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but the docs on that are a bit sparse.
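As a sketch of that referrer view, the same heap object exposes a `byrcs` attribute that regroups the partition by referencing pattern (this assumes guppy3, installed via pip install guppy3; the try/except guard is only so the snippet runs even where it is absent):

```python
# Sketch of the referrer statistics mentioned above.
try:
    from guppy import hpy
except ImportError:
    hpy = None  # guppy3 not installed; the helper below becomes a no-op

def heap_by_referrers():
    """Return heap stats grouped by referrer pattern, or None without guppy."""
    if hpy is None:
        return None
    heap = hpy().heap()
    return heap.byrcs  # same totals as heap, partitioned by who references each object

stats = heap_by_referrers()
if stats is not None:
    print(stats)
```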

There is a graphical browser as well, written in Tk.

For Python 2.x, use Heapy.

Rhodic answered 21/9, 2008 at 11:45 Comment(14)
sadly doesn't seem to build or install in osx.. 10.4 at least.Nihilism
It builds on OS X 10.7.1 with homebrew, but sadly doesn't run :-(Merrimerriam
If you're on Python 2.7 you may need the trunk version of it: sourceforge.net/tracker/…, pip install https://guppy-pe.svn.sourceforge.net/svnroot/guppy-pe/trunk/guppyDandelion
Latest version (0.1.9) builds on Windows for Python 2.6 x64 but h.heap() call causes APPCRASH.Renatarenate
The heapy docs are... not good. But I found this blog post very helpful for getting started: smira.ru/wp-content/uploads/2011/08/heapy.htmlUnhappy
Heapy is by far the easiest heap profiler to run when attaching to a leaking Python process with rfoo; it works fine in a multithreaded app and installs nicely with "pip install guppy". Usually the default view works, but hpy offers several views of the profile data, including showing you use count by reference. The blog post linked by @JoeShaw is very helpful.Zaid
Note, heapy doesn't include memory allocated in python extensions. If anybody has worked out a mechanism to get heapy to include boost::python objects, it would be nice to see some examples!Colombi
As of 2014-07-06, guppy does not support Python 3.Relucent
@JamesSnyder Looks like the normal pip version (1.10) is now ok with python 2.7Waltner
Just installed fine with pip (python 2.7). I found that the problem I wanted to use it for (memory use continually increasing) disappears when I call h.heap(). Any ideas why this might be?Shadow
How is knowing that "str" is consuming the most memory in any way useful? That could be one of a million points in the code. Without knowing where those calls are made, the info provided here is useless.Ideogram
There is a fork of guppy that supports Python 3 called guppy3.Ulrich
Where should we insert this profiler code in our existing Python code? Should it be at the end/beginning? How do we integrate this profiler code into our existing code for resource-usage stats?Naze
My favorite docs for Heapy/guppy3 is the research paper that created it, especially §6.2 "Debugging approach": liu.diva-portal.org/smash/get/diva2:22287/FULLTEXT01Ulrich
501

My module memory_profiler is capable of printing a line-by-line report of memory usage and works on Unix and Windows (it needs psutil on the latter). The output is not very detailed, but the goal is to give you an overview of where your code consumes the most memory, not an exhaustive analysis of allocated objects.

After decorating your function with @profile and running your script with python -m memory_profiler, it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a
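For reference, the function that produces the report above is just the following (the try/except fallback is only so the snippet also runs where memory_profiler is not installed):

```python
# Decorate the function to profile; run with: python -m memory_profiler script.py
try:
    from memory_profiler import profile
except ImportError:
    def profile(func):  # no-op fallback when memory_profiler is absent
        return func

@profile
def my_func():
    a = [1] * (10 ** 6)        # ~7.6 MB list
    b = [2] * (2 * 10 ** 7)    # ~152.6 MB list
    del b                      # released immediately
    return a

if __name__ == "__main__":
    my_func()
```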
Postpone answered 14/5, 2012 at 22:51 Comment(18)
For my usecase - a simple image manipulation script, not a complex system, which happened to leave some cursors open - this was the best solution. Very simple to drop in and figure out what's going on, with minimal gunk added to your code. Perfect for quick fixes and probably great for other applications too.Vernalize
This is great. Is there any way to use it to collect memory usage per object? (as opposed to per line). Ideally from an IPython session with objects already in memory. If not, do you have any pointers on something along these lines?Precondition
It doesn't get memory usage of individual objects. For that task, guppy/heapy might be what you want.Postpone
I find memory_profiler to be really simple and easy to use. I want to do profiling per line and not per object. Thanks for writing.Restitution
@FabianPedregosa How does memory_profiler handle loops? Can it identify the loop iteration number?Walcott
It identifies loops only implicitly when it tries to report the line-by-line amount and it finds duplicated lines. In that case it will just take the max of all iterations.Postpone
I tried to profile the memory usage of a Python application that used TensorFlow in CPU mode, as a function of input image size; python -m memory_profiler example.py did not give me correct results, while mprof gave me results similar to htop.Slavey
Does not seem to perform very well in CPU-intensive programsBurnham
@FabianPedregosa: How do you specify the installation path? I want to install it in another Python folder. Thanks.Basketwork
Same way as any other python package, pip install --target=/custom/path memory_profilerPostpone
I have tried memory_profiler but think it is not a good choice. It makes program execution incredibly slow (in my case, approximately 30 times slower).Godly
There is a constant per-line overhead in tracking memory consumption, so if your program is extremely long or has many fast for/while loops, I would expect this to slow your program down significantly. In that case, the time-based (as opposed to line-based) profiler might be better. It is run as mprof run &lt;script&gt;; see the docs for more information.Postpone
memory_profiler and heapy solve two different problems, I guess: one is concerned with memory consumption per line, while the other works at the level of objects.Tohubohu
@FabianPedregosa Does memory_profiler buffer its output? I may be doing something wrong, but it seems that rather than dump the profile for a function when it completes, it waits for the script to end.Colorfast
It does indeed wait until the script finishes. It would not be easy to do otherwise as the function could be called again, in which case memory_profiler will aggregate the results.Postpone
@FabianPedregosa Thanks for such a useful and simple library! I am confused by the output, though: when I run mprof run test.py and then mprof plot, I get different memory usage from the line-by-line output vs. over time. Line-by-line I get a maximum of 550MiB, while from the plot I get a maximum of 5000MiB. What could the problem be? Thanks!Frannie
For me memory profiler slowed down execution by roughly a factor 10! Note that I had large objects in the orders of a few GB. Otherwise cool tool.Gifford
This tool is no longer maintained.Raffo
84

I recommend Dowser. It is very easy to set up, and you need zero changes to your code. You can view counts of objects of each type over time, view the list of live objects, and view references to live objects, all from a simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug and call memdebug.start(port). That's all.

I haven't tried PySizer or Heapy. I would appreciate others' reviews.

UPDATE

The above code is for CherryPy 2.x. In CherryPy 3.x, the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.x:

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()
Direful answered 21/9, 2008 at 4:50 Comment(10)
But is it only for CherryPy? How do you use it with a simple script?Bridgehead
It is not for CherryPy. Think of CherryPy as a GUI toolkit.Direful
fwiw, the pysizer page pysizer.8325.org seems to recommend heapy, which it says is similarWoolsey
It looks as though your above code is for use with CherryPy 2.x. For CherryPy 3.x, remove the blocking=False from the cherrypy.engine.start() call.Neelon
There is a generic WSGI port of Dowser called Dozer, which you can use with other web servers as well: pypi.python.org/pypi/DozerUnhappy
cherrypy 3.1 removed cherrypy.server.quickstart(), so just use cherrypy.engine.start()Faubourg
I like and use dowser, but the problem for me is that the application I'm using it in gives you like 1000 graphs and it becomes a pain to find what is important, and after you do, the pain point may have so many graphs that the trace page doesn't even load properly. So it doesn't scale very well.Farfamed
It looks like aminus.net no longer exists. Some quick web searching found references to it that only indicated it existing on aminus.net websites. Telling Anaconda Prompt conda search dowser found nothing. I would conclude that Dowser is no longer easily available, and is surely not being maintained.Gimlet
this doesn't work in python 3. I get an obvious StringIO error.Papiamento
Be careful: the link in the answer to Dowser is taking me to very phony websites impersonating more or less respectable news sources...Kean
70

Consider the objgraph library (see this blog post for an example use case).
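A minimal sketch of the usual objgraph workflow looks like this (it assumes objgraph is installed via pip install objgraph; the guard just keeps the snippet runnable without it):

```python
try:
    import objgraph
except ImportError:
    objgraph = None  # objgraph not installed; helper below becomes a no-op

def show_growth():
    """Print object types whose instance counts grew since the last call."""
    if objgraph is not None:
        objgraph.show_growth(limit=5)

show_growth()                              # establish a baseline
suspect = [dict(x=i) for i in range(1000)]  # code suspected of leaking
show_growth()                              # growth reported here points at candidate leaks
```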

Schoenfelder answered 27/10, 2009 at 19:41 Comment(2)
objgraph helped me solve a memory leak issue I was facing today. objgraph.show_growth() was particularly usefulPhotofluorography
I, too, found objgraph really useful. You can do things like objgraph.by_type('dict') to understand where all of those unexpected dict objects are coming from.Rilke
19

Muppy is (yet another) memory usage profiler for Python. The focus of this toolset is on the identification of memory leaks.

Muppy tries to help developers identify memory leaks in Python applications. It enables the tracking of memory usage at runtime and the identification of objects that are leaking. Additionally, it provides tools to locate the source of objects that are not released.
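Muppy now ships as part of Pympler; a minimal summary of all live objects looks roughly like this (a sketch, guarded so it runs even where pympler is not installed):

```python
try:
    from pympler import muppy, summary
except ImportError:
    muppy = summary = None  # pympler not installed; helper becomes a no-op

def print_heap_summary():
    """Print a table of live object types with counts and total sizes."""
    if muppy is None:
        return
    all_objects = muppy.get_objects()  # every object the GC can see
    summary.print_(summary.summarize(all_objects))

print_heap_summary()
```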

Greysun answered 11/3, 2013 at 14:17 Comment(0)
16

I'm developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof

This is an example of what the plots look like:

(example plot: memory usage of each tracked variable over the decorated function's execution)

The project is hosted in GitHub:

https://github.com/jmdana/memprof
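Putting the two steps above together, a usage sketch looks like this (the function body is hypothetical, and the try/except fallback only lets the snippet run where memprof is not installed):

```python
try:
    from memprof import memprof
except ImportError:
    def memprof(func):  # no-op fallback when memprof is absent
        return func

@memprof
def compute():
    a = list(range(10 ** 5))      # per-variable memory is tracked by memprof
    b = [x * 2 for x in a]
    return sum(b)

compute()
```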

Politic answered 3/7, 2013 at 12:12 Comment(2)
How do I use it? What is a,b,c?Restitution
@Restitution a, b and c are the names of the variables. You can find the documentation at github.com/jmdana/memprof. If you have any questions please feel free to submit an issue in github or send an email to the mailing list that can be found in the documentation.Politic
12

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a WSGI web app, then Dozer is a nice middleware wrapper of Dowser.

Millham answered 25/10, 2011 at 21:31 Comment(0)
7

Also try the pytracemalloc project, which provides memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.
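Since Python 3.4, tracemalloc has been part of the standard library, so a minimal per-line snapshot needs no third-party install:

```python
import tracemalloc

tracemalloc.start()

data = [bytes(1000) for _ in range(1000)]  # allocate roughly 1 MB

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # top allocation sites, grouped by file and line number

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
```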

Welltimed answered 4/9, 2013 at 22:56 Comment(1)
tracemalloc is now part of the python standard library. See docs.python.org/3/library/tracemalloc.htmlBeelzebub
