I'm trying to determine how to correctly use cProfile and pstats with asyncio code. I am profiling my script by running cProfile.run('loop.run_until_complete(main())', 'profile.stats')
.
After using pstats
to sort by SortKeys.TIME
, I get the following output:
In [9]: sorted_stats.print_stats()
Fri Feb 15 14:40:18 2019 profile.txt
67411199 function calls (67230882 primitive calls) in 305.899 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
3617 273.162 0.076 273.162 0.076 {method 'poll' of 'select.epoll' objects}
14094092/14093931 1.527 0.000 1.904 0.000 {built-in method builtins.isinstance}
44901 1.166 0.000 1.166 0.000 {built-in method posix.stat}
784052 1.028 0.000 1.479 0.000 /usr/lib/python3/dist-packages/soupsieve/util.py:41(lower)
32070 0.887 0.000 1.565 0.000 /usr/lib/python3/dist-packages/bs4/element.py:620(<genexpr>)
4554545 0.729 0.000 0.756 0.000 /usr/lib/python3/dist-packages/bs4/element.py:1331(descendants)
1026560 0.726 0.000 3.358 0.000 /usr/lib/python3/dist-packages/bs4/element.py:1502(search)
448385 0.716 0.000 1.411 0.000 /usr/lib/python3/dist-packages/bs4/element.py:1528(_matches)
104 0.689 0.007 4.771 0.046 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:283(feed)
255143 0.655 0.000 1.575 0.000 /usr/lib/python3/dist-packages/bs4/__init__.py:391(endData)
137377 0.520 0.000 3.692 0.000 /usr/lib/python3/dist-packages/html5lib/_tokenizer.py:49(__iter__)
8574 0.510 0.000 2.392 0.000 /usr/lib/python3.7/traceback.py:318(extract)
167700 0.494 0.000 1.061 0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:199(charsUntil)
53607 0.487 0.000 0.589 0.000 /usr/lib/python3/dist-packages/bs4/builder/__init__.py:158(_replace_cdata_list_attribute_values)
299357 0.485 0.000 0.485 0.000 /usr/lib/python3/dist-packages/bs4/element.py:258(setup)
22253 0.467 0.000 5.928 0.000 /usr/lib/python3/dist-packages/bs4/element.py:592(_find_all)
448385 0.437 0.000 2.225 0.000 /usr/lib/python3/dist-packages/bs4/element.py:1461(search_tag)
83147 0.424 0.000 2.365 0.000 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:176(start)
1293015 0.418 0.000 0.556 0.000 /usr/lib/python3/dist-packages/soupsieve/css_match.py:94(is_tag)
4824419 0.409 0.000 0.409 0.000 {method 'append' of 'list' objects}
73068 0.404 0.000 0.634 0.000 /home/jess/.local/lib/python3.7/site-packages/rfc3986/normalizers.py:127(encode_component)
868978 0.403 0.000 0.773 0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:192(char)
454702 0.385 0.000 0.385 0.000 {method 'match' of 're.Pattern' objects}
2326981 0.374 0.000 0.415 0.000 {built-in method builtins.hasattr}
52 0.334 0.006 5.732 0.110 /usr/lib/python3/dist-packages/html5lib/html5parser.py:196(mainLoop)
... ... ... ... ... ...
As you can see, almost all of the execution time is showing up as {method 'poll' of 'select.epoll' objects}
.
I don't have enough experience with cProfile
or asyncio
to know how to interpret this result. I can't tell if stats are showing my program spending the majority of execution time in poll()
because (a) cProfile doesn't work well with asyncio
and isn't giving correct stats, (b) this is normal behavior, and I should just ignore the first line or (c) this is an indicator of some sort of bug in my program (which is indeed terribly slow right now for reasons I have yet to determine) ...
Is it normal for async processes to spend this much execution time in poll()
? Is there a better way to profile async code, either with cProfile or something else, that will give me more accurate results? Or is this a sign that there is likely some sort of bug in my code causing this?
NOTE: for context, the code I was profiling is basically fetching 50 pages, processing the HTML, and then doing some text/language processing. The fetch()
part could explain a good portion of the time in poll()
. But this on a fast internet connection (> 5MB /sec) so I don't think it should be spending that % of the time fetching just a few dozen text/html
pages. It seems like something else is going on, or the stats aren't correct ...
cProfile
in this context (although I've never tried using it withasyncio
). – Formosaasyncio
and see how that performs? – Pisciformasyncio
code: roguelynn.com/words/asyncio-profiling – Hugues