Reinforcement Learning With Variable Actions

All the reinforcement learning algorithms I've read about are usually applied to a single agent that has a fixed number of actions. Are there any reinforcement learning algorithms for making a decision while taking into account a variable number of actions?

For example, how would you apply an RL algorithm in a computer game where a player controls N soldiers, and each soldier has a variable number of actions based on its condition? You can't formulate a fixed number of actions for a global decision maker (i.e. "the general"), because the available actions are continually changing as soldiers are created and killed. And you can't formulate a fixed number of actions at the soldier level either, since a soldier's actions are conditional on its immediate environment. If a soldier sees no opponents, it might only be able to walk, whereas if it sees 10 opponents, it has 10 new possible actions: attacking one of the 10 opponents.

Spinode answered 7/3, 2011 at 4:34 Comment(2)
Please, next time that you have an RL question, ask it on Artificial Intelligence SE. Similar questions to this one were also asked there. See e.g. this.Ulita
There are situations where the agents can face a set of possible actions, and where the sequence of actions matters. How should we proceed in these cases?Possum

What you describe is nothing unusual. Reinforcement learning is a way of finding the value function of a Markov Decision Process (MDP). In an MDP, every state has its own set of actions. To apply reinforcement learning, you have to clearly define what the states, actions, and rewards are in your problem.
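As a minimal sketch of this, here is tabular Q-learning where the action set depends on the state. The available_actions(state) hook is hypothetical (it stands in for however your environment enumerates legal actions); the update itself is standard, it just takes the max over A(s') instead of over a fixed action set:

```python
import random
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)] -> estimated value
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def choose_action(state, available_actions):
    actions = available_actions(state)        # legal actions in this state
    if random.random() < epsilon:
        return random.choice(actions)         # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

def update(state, action, reward, next_state, available_actions):
    # Max over A(next_state), which may have a different size than A(state).
    best_next = max((Q[(next_state, a)] for a in available_actions(next_state)),
                    default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```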

Cephalization answered 28/7, 2011 at 21:46 Comment(0)

If each soldier has a number of actions that are available or not depending on some condition, then you can still model this as selection from a fixed set of actions. For example (a sketch follows the list):

  • Create a "utility value" for each action in the full action set, for each soldier
  • Choose the highest-valued action, ignoring those actions that are not available at the given time
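A minimal sketch of that masking step, assuming a hypothetical utility(state, action) scorer over the full action set and a set of currently available actions:

```python
def select_action(state, all_actions, available, utility):
    # Score every action in the fixed full set, but only consider
    # the ones that are actually available right now.
    candidates = [a for a in all_actions if a in available]
    return max(candidates, key=lambda a: utility(state, a))
```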

If you have multiple possible targets, the same principle applies, except this time you model your utility function to take the target designation as an additional parameter and run the evaluation function multiple times (once for each target). You pick the target with the highest "attack utility".
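Sketching that idea with a hypothetical attack_utility(state, target) function, evaluated once per visible opponent:

```python
def select_target(state, visible_opponents, attack_utility):
    # The same fixed evaluation function is re-run per candidate target;
    # only the target argument varies, so the action set stays fixed.
    if not visible_opponents:
        return None   # no attack action is available
    return max(visible_opponents, key=lambda t: attack_utility(state, t))
```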

Harbor answered 7/3, 2011 at 11:15 Comment(2)
Like I said, the soldiers have a variable number of actions as well. What do you mean by making the attack target a parameter?Spinode
I mean: make the RL algorithm take some information about the target or specific action you are considering as an extra input. Then you can apply it to multiple targets and/or actions as needed. You just re-run the algorithm with different target and/or action information for each one that you are considering.Harbor

In continuous action spaces, the policy network often outputs the mean and/or the variance of an assumed distribution (e.g., a Gaussian), from which you then sample the action.
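A minimal sketch of such a Gaussian policy head, assuming PyTorch (state_dim, action_dim, and the hidden layer size are placeholders):

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    # Outputs the mean of a diagonal Gaussian; log-std is a learned parameter.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh())
        self.mean = nn.Linear(64, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        mean = self.mean(self.body(state))
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = dist.sample()                        # sample a continuous action
        return action, dist.log_prob(action).sum(-1)  # log-prob for the policy gradient
```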

Homans answered 7/5, 2020 at 7:0 Comment(0)
