How to parallelize with Jupyter and sklearn?

I'm trying to parallelize scikit-learn's GridSearchCV. It's running in a Jupyter(Hub) notebook environment. After some research I found this code:

from sklearn.externals.joblib import Parallel, parallel_backend, register_parallel_backend
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend
from sklearn.model_selection import GridSearchCV

# pipeline, param_grid, X_train and Y_train are defined elsewhere in the notebook
c = Client(profile='myprofile')
print(c.ids)  # ids of the running ipcluster engines
bview = c.load_balanced_view()

register_parallel_backend('ipyparallel',
                          lambda: IPythonParallelBackend(view=bview))

grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)

with parallel_backend('ipyparallel'):
    grid.fit(X_train, Y_train)

Note that I've set the n_jobs parameter to 4, which is the number of the machine's CPU cores (it's what nproc returns).
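
For reference, the same count can be checked from inside the notebook with the Python standard library:

import os

print(os.cpu_count())  # prints 4 on this machine, matching nproc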

But it doesn't seem to work: the first import fails with ImportError: cannot import name 'register_parallel_backend', although I installed joblib with conda install joblib and also tried pip install -U joblib.

So, what's the best way to parallelize the GridSearchCV in this environment?
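
For readers on a newer stack: standalone joblib exposes parallel_backend and register_parallel_backend at the top level, and recent scikit-learn releases deprecated the vendored sklearn.externals.joblib. A minimal sketch of the same setup under that assumption (the pipeline, grid, and toy data below are placeholders, and it requires an ipcluster already running under the given profile):

from joblib import parallel_backend, register_parallel_backend
from ipyparallel import Client
from ipyparallel.joblib import IPythonParallelBackend
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# toy stand-ins for the question's pipeline, grid and data
pipeline = Pipeline([('clf', LogisticRegression())])
param_grid = {'clf__C': [0.1, 1.0]}
X_train, Y_train = load_iris(return_X_y=True)

c = Client(profile='myprofile')  # assumes an ipcluster is running with this profile
bview = c.load_balanced_view()
register_parallel_backend('ipyparallel',
                          lambda: IPythonParallelBackend(view=bview))

grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
with parallel_backend('ipyparallel'):
    grid.fit(X_train, Y_train)  # fits are dispatched to the ipyparallel engines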

UPDATE:

Without ipyparallel, just setting the n_jobs parameter:

grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
grid.fit(X_train, Y_train)

The result is the following warning message:

/opt/conda/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py:540: UserWarning:
Multiprocessing-backed parallel loops cannot be nested, setting n_jobs=1

So it seems the grid search falls back to sequential execution rather than running in parallel.
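
That warning means joblib thinks the fit is already inside a multiprocessing-backed parallel region, so it forces the inner loop down to n_jobs=1. One common fix, sketched here under the assumption that an estimator inside the pipeline sets its own n_jobs (the RandomForestClassifier step and toy data are hypothetical), is to keep the inner estimator sequential and let GridSearchCV own the parallelism:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# hypothetical pipeline; substitute whatever steps the real one uses
pipeline = Pipeline([
    ('clf', RandomForestClassifier(n_jobs=1)),  # inner estimator stays sequential
])
param_grid = {'clf__n_estimators': [50, 100]}
X_train, Y_train = load_iris(return_X_y=True)  # toy data standing in for the real split

grid = GridSearchCV(pipeline, cv=3, n_jobs=4, param_grid=param_grid)
grid.fit(X_train, Y_train)  # only the outer CV loop spawns worker processes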

Centenarian asked 14/1, 2017 at 10:53

Comments (4):

I think n_jobs=-1 would launch all the CPU cores in parallel. – Paradis
@AlexanderYau: Just setting the parameter produces the warning message above; I've updated the post. – Centenarian
How many CPU cores does your machine have? – Paradis
@AlexanderYau: Exactly 4 CPU cores. – Centenarian
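
Following Paradis's suggestion, n_jobs=-1 tells joblib to use all available cores, so the grid definition from the question could also be written as:

grid = GridSearchCV(pipeline, cv=3, n_jobs=-1, param_grid=param_grid)  # -1 = use every core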
