Q-Learning
Q-learning is a model-free reinforcement learning algorithm
that teaches an agent to assign values to each action it
might take, conditioned on the agent being in a
particular state. It does not require a model of the environment
(hence "model-free"), and it can handle problems with stochastic
transitions and rewards without requiring adaptations. [wiki]
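A minimal sketch of the tabular update behind that description, for illustration only. The function names, the dictionary-based table, and the parameters alpha (learning rate), gamma (discount) and epsilon (exploration rate) are assumptions made for this example; they are not taken from the experiments listed below and may not correspond to the L, E and D settings those runs refer to.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table q and return the new value.

    q maps (state, action) pairs to estimated values. alpha and gamma are
    hypothetical defaults chosen for this sketch.
    """
    # Value of the best action available from the next state (off-policy max).
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Usage: a single update on an empty table with two hypothetical actions.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
greedy = epsilon_greedy(q, "s0", actions, epsilon=0.0, rng=random.Random(0))
```

No transition model appears anywhere in the update: it only uses the observed reward and next state, which is what "model-free" means here.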
QMAP01 Cross validation mapping
QMAP02 Cross validation mapping
QMAP03 Cross validation mapping 2k epochs
QMAP04 Cross validation mapping 10k epochs
QMAP05 Cross validation mapping 2k epochs
QMAP06 Cross validation mapping 5k epochs
QRW07 Cross validation for speed-bonus (L,E) take 2
QLOW00 Very low values for L and E test
QLOW03 Very low values for L and E 30k epochs
QLOW04 Very low values for L and E 1k epochs
QLOW05 Very low values for L and E with D 1k epochs
QSEE00 Zero reward for 'seeing' as a baseline 1k epochs