Parallel/multithreaded differential evolution in Python

I'm trying to model a biochemical process, and I structured my question as an optimization problem that I solve using differential_evolution from scipy.
So far, so good, I'm pretty happy with the implementation of a simplified model with 15-19 parameters.
I expanded the model and now, with 32 parameters, it is taking way too long. Not totally unexpected, but still an issue, hence the question.

I've seen:
- an almost identical question for R: Parallel differential evolution
- and a github issue https://github.com/scipy/scipy/issues/4864 on the topic

but I would like to stay in Python (the model is part of a Python pipeline), and the pull request has not led to an officially accepted solution yet, although some options have been suggested.

Also, I can't parallelize the code within the function to be optimised, because it is a series of sequential calculations, each requiring the result of the previous step. The ideal option would be something that evaluates some individuals in parallel and returns them to the population.

Summing up:
- Is there any option within scipy that allows parallelization of differential_evolution that I dumbly overlooked? (Ideal solution)
- Is there a suggestion for an alternative algorithm in scipy that is either (way) faster in serial or possible to parallelize?
- Is there any other good package that offers parallelized differential evolution functions? Or other applicable optimization methods?
- Sanity check: am I overloading DE with 32 parameters, and do I need to radically change approach?
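
For reference, here is a stripped-down sketch of the kind of serial scipy call I'm running; the objective is a stand-in sphere function, not the actual biochemical model, and the bounds/dimensions are placeholders:

```python
import numpy as np
from scipy.optimize import differential_evolution

def objective(params):
    # Stand-in for the real model's cost function:
    # sum of squares, global minimum 0 at the origin.
    return np.sum(params ** 2)

# One (low, high) pair per parameter; the real model has 32 of these.
bounds = [(-5.0, 5.0)] * 4

result = differential_evolution(objective, bounds, seed=1, tol=1e-7)
print(result.x, result.fun)
```

Each generation evaluates the whole population serially, which is where the time goes as the parameter count (and hence the required population size) grows.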

PS
I'm a biologist, formal math/statistics isn't really my strength, any formula-to-English translation would be hugely appreciated :)

PPS
As an extreme option I could try to migrate to R, but I can't code in C/C++ or other languages.

Hydrargyrum answered 13/6, 2018 at 7:35 Comment(0)

Thanks to @jp2011 for pointing to pygmo.

First, it's worth noting the difference from pygmo 1, since the first link on Google still directs to the older version.

Second, multiprocessing islands are available only for Python 3.4+.

Third, it works. The processes I started when I first asked the question are still running as I write, while the pygmo archipelago completed an extensive test of all 18 DE variants available in saDE in less than 3 h. The compiled version using Numba, as suggested in https://esa.github.io/pagmo2/docs/python/tutorials/coding_udp_simple.html, will probably finish even earlier. Chapeau.

I personally find it a bit less intuitive than the scipy version, given the need to build a new class (vs a single function in scipy) to define the problem, but that is probably just personal preference. Also, the mutation/crossover parameters are defined less clearly; for someone approaching DE for the first time they might be a bit obscure.
But, since serial DE in scipy just isn't cutting it, welcome pygmo(2).
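
To illustrate the class-based problem definition, here is a minimal user-defined problem (UDP) in the spirit of the pagmo tutorial linked above. Note the class itself needs no pygmo import, it only has to expose fitness and get_bounds; the archipelago usage in the trailing comments is a sketch based on the docs, not something tested here:

```python
class SphereProblem:
    """Minimal pygmo user-defined problem (UDP) sketch: pygmo only
    requires fitness(x) -> list of objective values and
    get_bounds() -> (lower, upper) sequences."""

    def __init__(self, dim=3):
        self.dim = dim

    def fitness(self, x):
        # Single objective: sphere function, minimum 0 at the origin.
        return [sum(xi ** 2 for xi in x)]

    def get_bounds(self):
        return ([-5.0] * self.dim, [5.0] * self.dim)

# With pygmo installed, the archipelago would be driven roughly like this
# (sketch based on the pagmo docs; parameters here are illustrative):
#   import pygmo as pg
#   prob = pg.problem(SphereProblem(32))
#   algo = pg.algorithm(pg.sade(gen=100))
#   archi = pg.archipelago(n=8, algo=algo, prob=prob, pop_size=20)
#   archi.evolve()
#   archi.wait()
```

Each island in the archipelago evolves its own population in a separate process, which is where the parallelism comes from.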

Additionally, I found a couple of other options claiming to parallelize DE. I didn't test them myself, but they might be useful to someone stumbling on this question.

Platypus, focused on multiobjective evolutionary algorithms: https://github.com/Project-Platypus/Platypus

Yabox
https://github.com/pablormier/yabox

From the Yabox creator, a detailed yet (IMHO) crystal clear explanation of DE: https://pablormier.github.io/2017/09/05/a-tutorial-on-differential-evolution-with-python/

Hydrargyrum answered 23/6, 2018 at 16:40 Comment(1)
Thank you for pointing to the Yabox author's article, I should definitely read it. Just one remark: to achieve parallel execution with Yabox one should use the PDE class with the processes=N parameter instead of the DE class (as in the README example), which is serial. (Gilt)

Scipy's differential_evolution can now be used in parallel extremely easily, by specifying the workers keyword:

workers int or map-like callable, optional

If workers is an int the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all available CPU cores. Alternatively supply a map-like callable, such as multiprocessing.Pool.map for evaluating the population in parallel. This evaluation is carried out as workers(func, iterable). This option will override the updating keyword to updating='deferred' if workers != 1. Requires that func be pickleable.

New in version 1.2.0.

scipy.optimize.differential_evolution documentation
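
As a minimal sketch (assuming scipy >= 1.2): here a thread pool's map is supplied as the map-like callable, which sidesteps the pickling requirement; for CPU-bound objectives, an integer workers value (process-based, e.g. workers=-1) is usually the better choice. The objective is a toy stand-in:

```python
import numpy as np
from multiprocessing.pool import ThreadPool
from scipy.optimize import differential_evolution

def objective(params):
    # Toy objective: sphere function, minimum 0 at the origin.
    return np.sum(params ** 2)

bounds = [(-5.0, 5.0)] * 4

with ThreadPool(4) as pool:
    # Any map-like callable works; scipy invokes it as workers(func, iterable),
    # evaluating the whole population per generation in one batch.
    result = differential_evolution(
        objective, bounds,
        workers=pool.map,
        updating='deferred',  # required whenever workers is used
        seed=1, tol=1e-7,
    )

print(result.x, result.fun)
```

With updating='deferred', the best-solution update happens once per generation rather than immediately per trial vector, which is what makes batch-parallel evaluation possible.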

Contemporize answered 10/9, 2019 at 15:37 Comment(3)
This should really be the answer now! (Scrivenor)
Freezes for me unfortunately. (Omnidirectional)
@Omnidirectional you need to be careful how you use it; for example, I tried using a parallel method for a task with lots of disk writes, which was the bottleneck and crashed everything! What are you trying to do? (Contemporize)

I've been having exactly the same problem. Perhaps you could try pygmo, which supports different optimisation algorithms (including DE) and has a model for parallel computation. However, I'm finding that the community is not as big as it is for scipy. Their tutorials, documentation, and examples are good quality, and one can get things to work from them.

Concuss answered 15/6, 2018 at 9:8 Comment(4)
Somewhat less intuitive than scipy, but I'll give it a try! Thanks! (Hydrargyrum)
I tried so hard to make pygmo work, but as you say, the community is very small. I hardly find my questions answered and keep finding the same documentation whatever I search for. Generally speaking, I still don't know why someone would prefer scipy over pygmo or vice versa. Do you have any thoughts? I was able to simply parallelize my DE in scipy, but after 2 weeks I am still tweaking my pygmo code to get something out of it, no success yet. It seems much more complicated compared to scipy, as you need to explicitly define classes and deeply understand which one does what. (Antinode)
@user3015729, I totally agree on the more user-friendly interface of scipy; as a biologist with almost only self-taught coding skills, the first approach to pygmo was a bit daunting. Additionally, pygmo only allows random initialization of the initial population, while I find the Latin hypercube option of scipy very nice. However, when I posted the question I couldn't find a parallelization option in scipy, and that was a clear no-go given the size of my search space. I've seen your question, though; I'll give it a shot as soon as I have a moment. (Hydrargyrum)
@Antinode, as part of the pygmo community (and one of the two main developers) I can attest that we are rather quick in answering questions via our gitter channel: gitter.im/pagmo2/Lobby. Also, pygmo can initialize populations arbitrarily, not only randomly, via the push_back method of the population. As a last update, since version 2.10 parallelization happens both in fine-grained and in coarse-grained mode (island model). (Tami)

I suggest the batch mode of PyFDE: https://pythonhosted.org/PyFDE/tutorial.html#batch-mode. In batch mode, the fitness function is called only once per iteration to evaluate the fitness of the whole population.

The example without batch mode:

import pyfde
from math import cos, pi
import time
import numpy

t1 = time.time()

def fitness(p):
    # Rastrigin function (global minimum 0 at the origin),
    # negated because PyFDE maximises fitness.
    x, y = p[0], p[1]
    val = 20 + (x**2 - 10*cos(2*pi*x)) + (y**2 - 10*cos(2*pi*y))
    return -val

solver = pyfde.ClassicDE(fitness, n_dim=2, n_pop=40, limits=(-5.12, 5.12))
solver.cr, solver.f = 0.9, 0.45  # crossover rate and differential weight
best, fit = solver.run(n_it=150)
t2 = time.time()
print("Estimates: ", best)
print("Normal mode elapsed time (s): ", t2 - t1)

The batch mode example:

t1 = time.time()

def vec_fitness(p, fit):
    # p holds the whole population (one row per individual);
    # results are written into the preallocated fit array in place.
    x, y = numpy.array(p[:, 0]), numpy.array(p[:, 1])
    val = 20 + (x**2 - 10*numpy.cos(2*pi*x)) + (y**2 - 10*numpy.cos(2*pi*y))
    fit[:] = -val

solver = pyfde.ClassicDE(vec_fitness, n_dim=2, n_pop=40, limits=(-5.12, 5.12), batch=True)
solver.cr, solver.f = 0.9, 0.45
best, fit = solver.run(n_it=150)
t2 = time.time()
print("Estimates: ", best)
print("Batch mode elapsed time (s): ", t2 - t1)

The output is:

Estimates: [1.31380987e-09 1.12832169e-09]
Normal mode elapsed time (s): 0.015959978103637695

Estimates: [2.01733383e-10 1.23826873e-10]
Batch mode elapsed time (s): 0.006017446517944336


It's 1.5x faster here, but only for a simple problem; you can see >10x speedups for complex ones. The code runs on a single CPU core (no multiprocessing); the performance improvement comes from vectorization (SIMD: single instruction, multiple data). Combining vectorization with parallel/multiprocessing will compound the improvement.
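
The same vectorization principle can be demonstrated with plain numpy, independent of PyFDE: evaluating the whole population in one array expression gives results identical to a per-individual Python loop, with no multiprocessing involved:

```python
import numpy as np

def rastrigin_single(p):
    # Per-individual evaluation: one call per population member.
    x, y = p[0], p[1]
    return 20 + (x**2 - 10*np.cos(2*np.pi*x)) + (y**2 - 10*np.cos(2*np.pi*y))

def rastrigin_batch(pop):
    # Whole-population evaluation in one vectorized call:
    # each arithmetic op runs over all individuals at once.
    x, y = pop[:, 0], pop[:, 1]
    return 20 + (x**2 - 10*np.cos(2*np.pi*x)) + (y**2 - 10*np.cos(2*np.pi*y))

rng = np.random.default_rng(0)
pop = rng.uniform(-5.12, 5.12, size=(40, 2))  # population of 40, 2 params each

loop_vals = np.array([rastrigin_single(ind) for ind in pop])
batch_vals = rastrigin_batch(pop)
assert np.allclose(loop_vals, batch_vals)
```

The batch version replaces 40 Python-level calls with one pass through compiled numpy code, which is exactly what PyFDE's batch mode exploits.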

Scintillate answered 3/5, 2021 at 22:26 Comment(0)
