ipyparallel's LoadBalancedView bloats memory, how can I avoid that?
This issue may be related to https://github.com/ipython/ipyparallel/issues/207, which has not been marked as solved yet.
I have also opened an issue for it at https://github.com/ipython/ipyparallel/issues/286

I want to execute many tasks in parallel with Python and ipyparallel from a Jupyter notebook, using 4 local engines started by running ipcluster start in a local console. DirectView would also be an option, but I use LoadBalancedView to map the set of tasks. Each task takes around 0.2 seconds (this can vary); it runs a MySQL query to load some data and then processes it.
Running ~45000 tasks works, but memory usage grows very high. That is a problem because I want to run another experiment with over 660000 tasks, which I currently cannot do: it exhausts my 16 GB of memory and the machine starts swapping to the local drive. With DirectView, memory grows only slightly and never fills up. But I actually need LoadBalancedView.
This happens even with a minimal working example that makes no database query (see below).

I am not very familiar with the ipyparallel library, but I have read that the ipcontroller keeps logs and caches, which may be the cause. I am still not sure whether this is a bug or whether I can change some settings to avoid the problem.

Running a MWE

For my Python 3.5.3 environment running on Windows 10 I use the following (recent) packages:

  • ipython 6.1.0
  • ipython_genutils 6.1.0
  • ipyparallel 6.0.2
  • jupyter 1.0.0
  • jupyter_client 4.4.0
  • jupyter_console 5.0.0
  • jupyter_core 4.2.0

I would like the following example to work for LoadBalancedView without the immense memory growth (if possible at all):

  • Run ipcluster start in a console
  • Run a Jupyter notebook with the following three cells:

    <1st cell>
    import ipyparallel as ipp
    rc = ipp.Client()
    lview = rc.load_balanced_view()
    
    <2nd cell>
    %%px --local
    import time
    
    <3rd cell>
    def sleep_here(i):
        time.sleep(0.2)
        return 42
    
    amr = lview.map_async(sleep_here, range(660000))
    amr.wait_interactive()
    
Berger answered 20/8, 2017 at 11:16 Comment(0)
