What is the correct way to use cProfile with asyncio code?

I'm trying to determine how to correctly use cProfile and pstats with asyncio code. I am profiling my script by running cProfile.run('loop.run_until_complete(main())', 'profile.stats').
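
For reference, the harness looks roughly like this (main() stands in for my actual entry-point coroutine):

import asyncio
import cProfile
import pstats
from pstats import SortKey

loop = asyncio.get_event_loop()

# Profile the whole event loop run and dump the raw stats to a file.
cProfile.run('loop.run_until_complete(main())', 'profile.stats')

# Load the dump and sort by internal time.
sorted_stats = pstats.Stats('profile.stats')
sorted_stats.sort_stats(SortKey.TIME)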

After using pstats to sort by SortKey.TIME, I get the following output:

In [9]: sorted_stats.print_stats()
Fri Feb 15 14:40:18 2019    profile.txt

         67411199 function calls (67230882 primitive calls) in 305.899 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3617  273.162    0.076  273.162    0.076 {method 'poll' of 'select.epoll' objects}
14094092/14093931    1.527    0.000    1.904    0.000 {built-in method builtins.isinstance}
    44901    1.166    0.000    1.166    0.000 {built-in method posix.stat}
   784052    1.028    0.000    1.479    0.000 /usr/lib/python3/dist-packages/soupsieve/util.py:41(lower)
    32070    0.887    0.000    1.565    0.000 /usr/lib/python3/dist-packages/bs4/element.py:620(<genexpr>)
  4554545    0.729    0.000    0.756    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1331(descendants)
  1026560    0.726    0.000    3.358    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1502(search)
   448385    0.716    0.000    1.411    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1528(_matches)
      104    0.689    0.007    4.771    0.046 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:283(feed)
   255143    0.655    0.000    1.575    0.000 /usr/lib/python3/dist-packages/bs4/__init__.py:391(endData)
   137377    0.520    0.000    3.692    0.000 /usr/lib/python3/dist-packages/html5lib/_tokenizer.py:49(__iter__)
     8574    0.510    0.000    2.392    0.000 /usr/lib/python3.7/traceback.py:318(extract)
   167700    0.494    0.000    1.061    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:199(charsUntil)
    53607    0.487    0.000    0.589    0.000 /usr/lib/python3/dist-packages/bs4/builder/__init__.py:158(_replace_cdata_list_attribute_values)
   299357    0.485    0.000    0.485    0.000 /usr/lib/python3/dist-packages/bs4/element.py:258(setup)
    22253    0.467    0.000    5.928    0.000 /usr/lib/python3/dist-packages/bs4/element.py:592(_find_all)
   448385    0.437    0.000    2.225    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1461(search_tag)
    83147    0.424    0.000    2.365    0.000 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:176(start)
  1293015    0.418    0.000    0.556    0.000 /usr/lib/python3/dist-packages/soupsieve/css_match.py:94(is_tag)
  4824419    0.409    0.000    0.409    0.000 {method 'append' of 'list' objects}
    73068    0.404    0.000    0.634    0.000 /home/jess/.local/lib/python3.7/site-packages/rfc3986/normalizers.py:127(encode_component)
   868978    0.403    0.000    0.773    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:192(char)
   454702    0.385    0.000    0.385    0.000 {method 'match' of 're.Pattern' objects}
  2326981    0.374    0.000    0.415    0.000 {built-in method builtins.hasattr}
       52    0.334    0.006    5.732    0.110 /usr/lib/python3/dist-packages/html5lib/html5parser.py:196(mainLoop)


   ...       ...       ...      ...      ...          ... 

As you can see, almost all of the execution time is showing up as {method 'poll' of 'select.epoll' objects}.

I don't have enough experience with cProfile or asyncio to know how to interpret this result. I can't tell whether the stats show my program spending the majority of its execution time in poll() because (a) cProfile doesn't work well with asyncio and isn't giving correct stats, (b) this is normal behavior and I should just ignore the first line, or (c) this is an indicator of some sort of bug in my program (which is indeed terribly slow right now, for reasons I have yet to determine) ...

Is it normal for async processes to spend this much execution time in poll()? Is there a better way to profile async code, either with cProfile or something else, that will give me more accurate results? Or is this a sign that there is likely some sort of bug in my code causing this?


NOTE: for context, the code I was profiling is basically fetching 50 pages, processing the HTML, and then doing some text/language processing. The fetch() part could explain a good portion of the time in poll(). But this is on a fast internet connection (>5 MB/sec), so I don't think it should be spending that much of the time fetching just a few dozen text/HTML pages. It seems like something else is going on, or the stats aren't correct ...

Nealson answered 15/2, 2019 at 23:15 Comment(7)
It's entirely possible the code is spending most of its time polling, but it's impossible to tell without seeing the actual code. There's an add-on called line_profiler that might be more useful to you than cProfile in this context (although I've never tried using it with asyncio).Formosa
Have you tried running just one instance (say, download a single page) without asyncio and see how that performs?Pisciform
@Formosa the issue I'm having is that the application is several thousand lines of code, and the reason I'm trying to use the profiler in the first place is to find out where it is slow. But because the profiler is giving me the results above, I can't tell where in the program it is polling, and I didn't feel it was appropriate to share the source for the entire application here. What I'm asking is how to use the profiler to find out which part of my own code is polling, rather than getting a vague "it's in poll() somewhere" result that doesn't help me past "it's probably a function doing IO"Nealson
@Pisciform the fetch function itself needs asyncio because it is using aiohttp.ClientSession.get() to fetch the pages, so I can't really test it without asyncio. However, the unit tests for fetch() show no problems for downloading individual pages. It's only when I am concurrently fetching/processing many pages that I run into issues. It seems like the problem isn't the speed for the individual fetches, but rather something elsewhere either when pages are being queued, processed, errors handled, etc ... I have no idea really, which is why I was trying to profile :(Nealson
J. Taylor: Using the linked line-profiler would allow you to find out where it's slow at the line-of-code level, which is why I mentioned it. You can apply the profiling selectively -- i.e. to just the part of the code you're interested in.Formosa
Yes, I meant to thank you for sharing the line profiler. Unfortunately, I'm still in the same situation of not knowing which line to profile, so it won't help me here (short of randomly throwing it all over the place wherever there is IO and hoping that I find the right line) ... but it will be a useful tool in the future for sure! I feel like I need something like a "profiler for IO operations", that can tell me where specifically the poll() is happening ...Nealson
I found this link useful when trying to profile my own asyncio code: roguelynn.com/words/asyncio-profilingHugues

You can do it like this to profile an async method.

import cProfile
import pstats

async def profile_async_method():
    profiler = cProfile.Profile()
    profiler.enable()

    # async_test_method is just an example; call the coroutine you
    # actually want to profile here instead.
    response = await async_test_method()

    profiler.disable()
    profiler.print_stats(sort='cumulative')

    # Dump the stats, then also write them to a text file in human-readable form.
    profiler.dump_stats('output.prof')
    with open('output.txt', 'w') as stream:
        stats = pstats.Stats('output.prof', stream=stream)
        stats.sort_stats('cumtime')
        stats.print_stats()

Then call profile_async_method() wherever you want to profile that piece of code.
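
For example, if you're not already inside a running event loop, a minimal way to kick it off:

import asyncio

asyncio.run(profile_async_method())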

Skirret answered 24/1 at 5:40 Comment(0)

I know this question is a bit old and already has answers, but I was looking into this and found a different approach that worked for me.

import cProfile
import pstats
import contextvars
import asyncio  # Using it here for the last example

# ContextVar to store the profiler instance, which in turn allows it
# to be accessed across different asynchronous contexts.
profiler = contextvars.ContextVar('profiler')

def start_profiling():
    # Create a Profile object, stash it in the context variable, and start collecting.
    pr = cProfile.Profile()
    profiler.set(pr)
    pr.enable()

def stop_profiling():
    # Retrieve the Profile object from the context variable and stop collecting.
    pr = profiler.get()
    pr.disable()
    return pr

async def profiled_function(func, *args, **kwargs):
    start_profiling()
    try:
        return await func(*args, **kwargs)
    finally:
        pr = stop_profiling()
        ps = pstats.Stats(pr)
        ps.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10) # amount to print in the output (top 10 function calls sorted by cumulative time)

# Example function to use the profiler with
async def async_function():
    await asyncio.sleep(10)
    print("Async function executed")

asyncio.run(profiled_function(async_function))
# Call your function here --> ^^^^^^^^^^^^^^
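
If the coroutine you are profiling takes arguments, they can be passed straight through the wrapper (fetch_page and the URL here are just placeholders):

async def fetch_page(url):
    await asyncio.sleep(1)  # stand-in for real work
    return url

asyncio.run(profiled_function(fetch_page, "https://example.com"))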

After that, I installed snakeviz and used it to visualize the output. Generate a profile for the whole script and open it with:

python -m cProfile -o output_file_name.profile .\dest\to\your\file.py

snakeviz output_file_name.profile
Exertion answered 4/7 at 4:15 Comment(0)

TL;DR use better tools

I found that it's more practical to profile something when it's reproducible, because then you can verify whether any code change actually improves performance.

If you can convert your code into a test case using pytest, then please consider using https://pypi.org/project/pytest-profiling/; it's easy to set up, and the visual output is relatively easy to understand.
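
For example, a rough sketch (the test name and main() are placeholders for your own code; pytest-profiling adds the --profile and --profile-svg options, with the SVG output requiring gprof2dot and graphviz):

# test_fetch.py -- hypothetical test that wraps the slow code path so it is reproducible
import asyncio

def test_fetch_and_process():
    # main() stands in for the coroutine whose performance you want to track.
    asyncio.run(main())

Run it with pytest --profile-svg test_fetch.py and open the SVG call graph it generates.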

Another option is to use a statistical profiler, like https://pypi.org/project/pprofile/

Yet another, if you fancy profiling in production, is to use a service, like https://docs.datadoghq.com/tracing/profiler/

Finally, be clear about what you measure.

Most profilers let you collect both wall-clock time and CPU time.

A naïve asyncio program is single-threaded and, given enough CPU share, is in one of two states at any given time: either crunching or waiting. If your "problem" is too much crunching, then a profiler is the answer, and often CPU time and wall time are roughly the same. If the "problem" is actually waiting, then you probably want to look at who is waiting for what. Shameless plug: use my https://pypi.org/project/awaitwhat/ for Python 3.7~3.9.
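
A quick sanity check along those lines, before reaching for any particular tool (a minimal sketch; main() stands in for your top-level coroutine):

import asyncio
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

asyncio.run(main())

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# If cpu is close to wall, the program is mostly crunching: a profiler will help.
# If cpu is much smaller than wall, it is mostly waiting: look at who awaits what.
print(f"wall: {wall:.1f}s  cpu: {cpu:.1f}s")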

Sweatbox answered 19/10, 2021 at 3:17 Comment(0)
