Difference between Evolutionary Strategies and Reinforcement Learning?

I am learning about the approaches used in Reinforcement Learning for robotics, and I came across the concept of Evolutionary Strategies. But I couldn't understand how RL and ES differ. Can anyone please explain?

Carpetbag asked 14/11, 2018 at 19:36

To my understanding, there are two main differences.

1) Reinforcement learning uses the concept of a single agent, and the agent learns by interacting with the environment in different ways. Evolutionary algorithms usually start with many "agents", and only the "strong ones survive" (the agents with characteristics that yield the lowest loss).

2) A reinforcement learning agent learns from both positive and negative actions, whereas evolutionary algorithms learn only from the optimal solutions; the information from negative or suboptimal solutions is discarded and lost.

Example

You want to build an algorithm to regulate the temperature in the room.

The room is 15 °C, and you want it to be 23 °C.

Using Reinforcement learning, the agent will try a bunch of different actions to increase and decrease the temperature. Eventually, it learns that increasing the temperature yields a good reward. But it also learns that reducing the temperature will yield a bad reward.
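
To make this concrete, here is a minimal sketch of the RL side using tabular Q-learning (a standard RL algorithm; the reward function, episode length, and hyperparameters below are my own made-up choices for illustration):

```python
import random

ACTIONS = [+1, -1]                        # heat up or cool down by 1 degree C
TARGET = 23
alpha, gamma, epsilon = 0.5, 0.9, 0.1     # learning rate, discount, exploration rate
q = {}                                    # Q-table: (temperature, action) -> value

def q_value(temp, action):
    return q.get((temp, action), 0.0)

for episode in range(500):
    temp = 15
    for _ in range(20):
        # epsilon-greedy: usually exploit the best known action, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q_value(temp, a))
        new_temp = temp + action
        reward = -abs(TARGET - new_temp)  # the closer to 23 C, the higher the reward
        best_next = max(q_value(new_temp, a) for a in ACTIONS)
        # TD update: values of BOTH good and bad actions are stored and kept
        q[(temp, action)] = q_value(temp, action) + alpha * (
            reward + gamma * best_next - q_value(temp, action))
        temp = new_temp
```

Note that after training, `q` also holds value estimates for the bad "decrease" actions; that is exactly the information an evolutionary algorithm throws away.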

For evolutionary algorithms, the algorithm starts with a bunch of random agents, each with a preprogrammed set of actions it is going to perform. The agents that have the "increase temperature" action survive and move on to the next generation. Eventually, only agents that increase the temperature survive and are deemed the best solution. However, the algorithm never learns what happens if you decrease the temperature.
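
And a matching sketch of the evolutionary side, where each "agent" is nothing more than a preprogrammed 8-step action plan (again a toy of my own; the population size and mutation scheme are arbitrary):

```python
import random

def fitness(plan, start=15, target=23):
    return -abs(target - (start + sum(plan)))   # end temperature near 23 C => fit

def mutate(plan):
    child = list(plan)
    child[random.randrange(len(child))] = random.choice([+1, -1])
    return child

# population of random preprogrammed plans: 8 heat/cool actions each
population = [[random.choice([+1, -1]) for _ in range(8)] for _ in range(50)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                  # only the "strong ones survive"
    # discarded agents take their (possibly informative) failures with them
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

best = max(population, key=fitness)              # ends up as all "+1" actions
```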

TL;DR: RL is usually a single agent trying different actions, learning and remembering all info (positive or negative). ES uses many agents that guess many actions; only the agents with the optimal actions survive. It is basically a brute-force way to solve a problem.

Plaintiff answered 17/11, 2018 at 5:51

Evolution Strategies optimization happens at the population level. An evolution strategy algorithm iteratively (i) samples a batch of candidate solutions from the search space, (ii) evaluates them, and (iii) discards the ones with low fitness values. The sampling for a new iteration (or generation) happens around the mean of the best-scoring candidate solutions from the previous iteration. This enables evolution strategies to direct the search towards a promising region of the search space.
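
A minimal sketch of that sample-evaluate-discard loop on a toy 2-D objective (my own illustration; real ES variants such as CMA-ES also adapt the step size, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):                            # toy objective with optimum at (3, -2)
    return -np.sum((x - np.array([3.0, -2.0])) ** 2)

mean, sigma = np.zeros(2), 1.0             # search distribution: mean and fixed step size
for generation in range(50):
    # (i) sample a batch of candidate solutions around the current mean
    candidates = mean + sigma * rng.standard_normal((100, 2))
    # (ii) evaluate their fitness
    scores = np.array([fitness(c) for c in candidates])
    # (iii) keep the elite and re-center the search on their mean
    elite = candidates[np.argsort(scores)[-20:]]
    mean = elite.mean(axis=0)

print(mean)                                # converges near [3, -2]
```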

Reinforcement learning requires the problem to be formulated as a Markov Decision Process (MDP). An RL agent optimizes its behavior (or policy) by maximizing a cumulative reward signal received on transitions from one state to another. Since the problem is abstracted as an MDP, learning can happen at the step or episode level. Learning per step (or per N steps) is done via temporal-difference (TD) learning, and learning per episode is done via Monte Carlo methods. So far I have been talking about learning via action-value functions (learning the values of actions). Another way of learning is to directly optimize the parameters of a neural network representing the agent's policy via gradient ascent. This approach was introduced in the REINFORCE algorithm, and the general family is known as policy-based RL.
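
As an illustration of the policy-based approach, here is a minimal REINFORCE sketch on a toy two-armed bandit, using a softmax policy over two parameters instead of a full neural network (the bandit and hyperparameters are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                       # one preference parameter per action
true_means = np.array([1.0, 2.0])         # action 1 pays more on average
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for episode in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = rng.normal(true_means[action], 1.0)  # one-step episode: return = reward
    # REINFORCE: grad of log pi(action) under a softmax is one_hot(action) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += lr * reward * grad_log_pi            # gradient ascent on expected reward

print(softmax(theta))   # most of the probability mass ends up on the better action
```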

For a comprehensive comparison, check out this paper: https://arxiv.org/pdf/2110.01411.pdf

Donetta answered 19/7, 2022 at 10:41

I think the biggest difference between Evolutionary Strategies and Reinforcement Learning is that ES is a global optimization technique while RL is a local optimization technique. So RL can converge quickly to a local optimum, while ES converges more slowly towards a global optimum.

Expiate answered 18/11, 2018 at 20:4
Hi Shunyo, I just had a quick question regarding this. I was wondering why you say that ES converges to global solutions while RL converges to local solutions. To my understanding, ES can only "guarantee" a global solution in infinite time, but it is good for discrete problems where the loss function is non-differentiable. RL solves the dynamic programming problem in optimal control, which guarantees a global optimum given that the objective function is convex. - Plaintiff
Hi Rui, I think that is a pertinent question. If the objective function is convex, the solution is unique and of course RL would converge to the global solution. However, the problem arises when the objective function is non-convex (which it might be in many practical problems), where RL can get stuck in a local optimum. ES, on the other hand, just by virtue of sampling from a large population, would converge to the global solution more easily (though of course this is not guaranteed). The workaround is reward shaping, which is cumbersome and more of an art than a science. - Expiate
