I'm using pybrain to build an agent that learns chemotaxis (moving toward food based on a chemical signal). The agent is controlled by a neural network whose weights should be adjusted according to the agent's distance from the food. The inputs are two sensor neurons and the outputs are two motor neurons that move the agent, so I have continuous states and actions. The reward is the inverse of the distance from the food.
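Concretely, the per-step reward is computed like this (a condensed sketch of what's in my Gist; distanceToFood is a made-up stand-in for my environment's actual distance helper):

from pybrain.rl.environments import EpisodicTask

class ChemotaxisTask(EpisodicTask):
    def getReward(self):
        # the reward is the inverse of the current distance to the food,
        # so it grows as the agent closes in
        d = self.env.distanceToFood()  # hypothetical helper on my environment
        return 1.0 / max(d, 1e-6)     # guard against division by zero at the food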
This is the essence of my main loop:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.optimization import HillClimber

# ChemotaxisTask and ChemotaxisEnv are my own classes (full code in the Gist below)
task = ChemotaxisTask(ChemotaxisEnv(), MAX_STEPS)
module = buildNetwork(2, 2, 2)  # 2 sensor inputs -> 2 hidden units -> 2 motor outputs
learner = HillClimber(task, module, maxEvaluations=MAX_TRIALS, mustMinimize=True, storeAllEvaluations=True, storeAllEvaluated=True, verbose=False)
learner.learn()
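The fitness-over-time curve I mention below comes from the stored evaluations, read back roughly like this (I'm assuming storeAllEvaluations=True fills learner._allEvaluations, which is what I gather from the pybrain source):

import matplotlib.pyplot as plt

# each entry is one episode's total fitness, in evaluation order
plt.plot(learner._allEvaluations)
plt.xlabel('evaluation')
plt.ylabel('fitness')
plt.show()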
The approaches I've tried are:
- Experiment with Q (doesn't work: Q-learning needs discrete states and actions, and mine are continuous)
- Experiment with Reinforce/ENAC (the gradient update computes no change in the weights)
- ContinuousExperiment with Reinforce/ENAC (same problem as above)
- EpisodicExperiment with HillClimber (network weights do not change)
I've decided to try to work with the EpisodicExperiment, as it seems best suited to my experiment.
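For reference, my EpisodicExperiment wiring follows the pattern from pybrain's CartPole example, roughly like this (task, module, and constants as above; the exact version is in the Gist below):

from pybrain.rl.agents import OptimizationAgent
from pybrain.rl.experiments import EpisodicExperiment

# OptimizationAgent pairs the network with a black-box optimizer;
# EpisodicExperiment then treats each episode as one fitness evaluation
agent = OptimizationAgent(module, HillClimber())
experiment = EpisodicExperiment(task, agent)
experiment.doEpisodes(MAX_TRIALS)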
I can finally see the network weights changing, but my average fitness over time doesn't increase. What could I be doing wrong?
Here is a Gist with all my code: https://gist.github.com/4477624
The pybrain documentation is at http://pybrain.org/docs/index.html and the learner documentation (e.g. Q, Reinforce, HillClimber) is at http://pybrain.org/docs/api/rl/learners.html.
The code itself is at https://github.com/pybrain/pybrain. The learners are in https://github.com/pybrain/pybrain/tree/master/pybrain/rl/learners and the experiments are in https://github.com/pybrain/pybrain/tree/master/pybrain/rl/experiments.
Note that I'm using optimization learners with EpisodicExperiment; those are located in https://github.com/pybrain/pybrain/tree/master/pybrain/optimization.
I'm sure you can find your way through the documentation and code from there. Everything else I'm working with is in https://github.com/pybrain/pybrain/tree/master/pybrain/rl.