Q-Learning

Q-learning is a model-free reinforcement learning algorithm that teaches an agent to assign values to each action it might take, conditioned on the agent being in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. [wiki]
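
The value assignment described above comes down to one update per observed transition: Q(s, a) is moved toward r + gamma * max over a' of Q(s', a'). Below is a minimal sketch of that tabular update, assuming a dictionary-backed table. The names q_update, alpha (learning rate) and gamma (discount factor) are illustrative and not taken from these notes.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Value of the best action in the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(q, state=0, action="right", reward=1.0,
                     next_state=1, actions=actions)
```

With an empty table the first update gives 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1. No transition model is consulted, only the sampled (state, action, reward, next state) tuple.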

Try different discount factors, 5k epochs
Try different learning rates, 5k epochs
Try different epsilons, 5k epochs
Try different discount factors, 10k epochs
Try different learning rates, 10k epochs
Try different epsilons, 10k epochs
Cross-validation on all parameters, 20k epochs
Cross-validation, reward unlimited
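
The experiments listed above could be driven by a small sweep harness like the one sketched here. Several things are assumptions, since the notes do not specify them: the environment is a hypothetical six-state chain, "epochs" is read as training episodes, and the sweep over all parameters is implemented as a plain grid search. The function names (step, train, evaluate, sweep) and the parameter grids are illustrative only.

```python
import itertools
import random

N_STATES = 6      # hypothetical chain: states 0..5, goal at state 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
MAX_STEPS = 50    # per-episode step cap (assumption)


def step(state, action):
    """Toy deterministic transition: reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(alpha, gamma, epsilon, episodes, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit; break ties randomly so an all-zero table is unbiased.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def evaluate(q):
    """Follow the greedy policy once; return steps to goal (MAX_STEPS if not reached)."""
    state = 0
    for t in range(1, MAX_STEPS + 1):
        action = 1 if q[state][1] > q[state][0] else 0
        state, _, done = step(state, action)
        if done:
            return t
    return MAX_STEPS


def sweep(episodes, alphas=(0.1, 0.5), gammas=(0.5, 0.9, 0.99),
          epsilons=(0.1, 0.3)):
    """Grid search: train once per (alpha, gamma, epsilon) and score the result."""
    results = {}
    for alpha, gamma, epsilon in itertools.product(alphas, gammas, epsilons):
        results[(alpha, gamma, epsilon)] = evaluate(
            train(alpha, gamma, epsilon, episodes))
    return results


# 500 episodes keeps the sketch fast; the notes use 5k, 10k and 20k.
results = sweep(episodes=500)
```

To vary one parameter at a time, as in the first six experiments, pass single-element tuples for the other two, for example sweep(5000, alphas=(0.1,), epsilons=(0.1,)). The score here is steps to the goal under the greedy policy; a cumulative-reward metric would be a drop-in replacement for evaluate.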