Profiling Python in a docker container is very slow (cProfile and pyinstrument)

I'm trying to profile some very simple code (using both cProfile and pyinstrument). The code is:

sum(1 for e in range(1533939))

Without the profiler active, the code runs quickly (~85 ms). Under a profiler, however, the same code takes almost 13 seconds.

I'm doing this (in a Jupyter notebook):

%%prun

sum(1 for e in range(1533939))
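The same comparison could be reproduced outside the notebook so that the container and the host run identical code. This is only a minimal sketch: it assumes wall-clock timing with `time.perf_counter` and profiling through `cProfile.Profile`, and the helper names `workload`, `time_plain` and `time_profiled` are hypothetical, not part of the original setup.

```python
import cProfile
import time

N = 1533939  # same range size as in the question


def workload():
    # The expression being profiled.
    return sum(1 for e in range(N))


def time_plain():
    # Wall-clock time with no profiler attached.
    start = time.perf_counter()
    result = workload()
    return result, time.perf_counter() - start


def time_profiled():
    # Wall-clock time with cProfile collecting events.
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    result = workload()
    profiler.disable()
    return result, time.perf_counter() - start


plain_result, plain_seconds = time_plain()
profiled_result, profiled_seconds = time_profiled()
print(f"plain:    {plain_seconds:.3f}s")
print(f"profiled: {profiled_seconds:.3f}s")
```

Running this script once inside the container and once on the host would give two pairs of numbers that can be compared directly.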

I figured the problem was the overhead caused by the numerous calls to "next" inside the generator expression. However, running the same experiment on my host machine (not inside the container) shows no slowdown when profiling.
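One way to check how many events the profiler actually handles for this expression is to read the call counts back from the collected statistics. The following is a rough sketch, assuming `cProfile` together with `pstats`; the variable names are illustrative, and it only counts calls without explaining where the time goes.

```python
import cProfile
import pstats

N = 1533939  # same range size as in the question

profiler = cProfile.Profile()
profiler.enable()
total = sum(1 for e in range(N))
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
genexpr_calls = sum(
    nc
    for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in stats.stats.items()
    if funcname == "<genexpr>"
)

print(f"profiled calls in total: {stats.total_calls}")
print(f"of which <genexpr> resumptions: {genexpr_calls}")
```

If the count is the same on the host and in the container, the number of profiled calls alone would not account for the difference between the two environments.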

Any idea why the profiler might be slowing this code down so much?

For the record, I'm using Jupyter's "jupyter/scipy-notebook" image as the base for my container.

Thanks!

Furmenty answered 5/6, 2020 at 17:20 Comment(5)
Any volumes attached to this container? And what is your host OS? – Rina
Host OS is macOS. The code directory is connected to my host. – Furmenty
I would guess you are hitting the infamous thread forums.docker.com/t/… or issue github.com/docker/for-mac/issues/77. The profiler might need some extra IO compared to the normal execution of your script, so you are hitting this when profiling. Have you tried the instructions on the performance tuning for volumes page yet? – Rina
I am not sure, but this is probably because the selected execution environment consumes extra resources. For example, a lambda function with a simple arithmetic problem executes within seconds, but the same logic written as a Glue job or in a Jupyter notebook will run PySpark jobs and Hadoop execution with mappers, which is unnecessary for a small dataset. That is why Hadoop/big data is preferable for large datasets only, and your situation is similar. – Virulence
Are you running the same version of Python both inside and outside the container? Are you saying that running something like python -c 'import cProfile; cProfile.run("sum(1 for e in range(1533939))")' directly is dramatically different between the two? – Teeth
