I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradient methods and genetic algorithms for solving decision problems. I have no prior experience in this topic, and I'm having trouble understanding when one technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain, or point me to a source that describes the typical situations in which each method should be used? As far as I understand:
- Dynamic Programming and Linear Programming should be used only when the MDP has few states and actions and the model is known, since they are very expensive. But when is DP better than LP?
- Monte Carlo methods are used when I don't have a model of the problem but I can generate samples. They are unbiased but have high variance.
- Temporal Difference methods should be used when MC methods need too many samples to have low variance. But when should I use TD and when Q-Learning?
- Policy Gradient and Genetic algorithms are good for continuous MDPs. But when is one better than the other?
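To make the MC / TD / Q-Learning items above concrete, here is a minimal sketch of how I currently understand their update targets in the tabular case. The function names, the toy rewards and the state/action labels are all hypothetical and only there for illustration; this is my own reading, not code taken from the book.

```python
from collections import defaultdict


def mc_return(rewards, gamma):
    """Monte Carlo target: the full discounted return of a finished episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward, next_value, gamma):
    """TD(0) target: one sampled reward plus the current estimate of the next state."""
    return reward + gamma * next_value


def q_learning_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Q-Learning step: bootstrap from the greedy (max) action in the next state."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Toy data (assumed, not from any real problem).
episode_rewards = [0.0, 0.0, 1.0]
print(mc_return(episode_rewards, gamma=0.5))

# TD(0) needs only one transition and a value estimate for the next state.
print(td0_target(reward=0.0, next_value=0.5, gamma=0.5))

# Q-Learning also needs only one transition, and learns action values.
q = defaultdict(float)
q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.5)
print(q[("s0", "right")])
```

As I understand it, the MC target has to wait for the end of the episode and uses only sampled rewards, while the TD(0) and Q-Learning targets can be computed after a single step because they reuse the current estimates.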
More precisely, I think that to choose a learning method a programmer should ask themselves the following questions:
- does the agent learn online or offline?
- can we separate exploring and exploiting phases?
- can we perform enough exploration?
- is the horizon of the MDP finite or infinite?
- are states and actions continuous?
But I don't know how these details of the problem affect the choice of a learning method. I hope that someone who already has experience with RL methods can help me better understand their applications.