Q-Learning

Q-learning is a model-free reinforcement learning algorithm that teaches an agent to assign values to each action it might take, conditioned on the agent being in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. [wiki]
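The core of the algorithm is the tabular update rule Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)). A minimal sketch on a hypothetical 5-state chain environment (the chain, reward of 1 at the rightmost state, and epsilon-greedy exploration are illustrative assumptions, not part of the experiments listed below):

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def greedy(Q_s, rng):
    # Greedy action with random tie-breaking.
    best = max(Q_s)
    return rng.choice([a for a, v in enumerate(Q_s) if v == best])

# Toy chain of 5 states: action 1 moves right, action 0 moves left;
# reaching state 4 gives reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
rng = random.Random(0)
for episode in range(500):
    s = 0
    while s != 4:
        # Epsilon-greedy behavior policy (epsilon = 0.1).
        a = rng.randrange(n_actions) if rng.random() < 0.1 else greedy(Q[s], rng)
        s_next = min(s + 1, 4) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 4 else 0.0
        q_update(Q, s, a, r, s_next)
        s = s_next
```

After training, the greedy policy moves right in every state, and Q[3][1] approaches the true value of 1.0 (the terminal state's values stay zero, so the bootstrapped target there is exactly the reward).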

Q-CV4 Cross validation on all parameters 20k epochs
Q-CV5 Cross validation reward unlimited
Q-CV7 Cross validation reward at end
QMAP01 Cross validation mapping
QMAP02 Cross validation mapping
QMAP03 Cross validation mapping 2k epochs
QMAP04 Cross validation mapping 10k epochs
QMAP05 Cross validation mapping 2k epochs
QMAP06 Cross validation mapping 5k epochs
QMAP08 Check variance of equal parameters
QRW04 Compare reward handler, reduced push reward
QRW05 Compare reward handlers with speed-bonus
QRW06 Cross validation for speed-bonus (L,E)
QRW07 Cross validation for speed-bonus (L,E) take 2
QLOW00 Very low values for L and E test
QLOW03 Very low values for L and E 30k epochs
QLOW04 Very low values for L, E 1k epochs
QLOW05 Very low values for L, E with D 1k epochs
QSEE00 Zero reward for 'seeing' as a baseline 1k epochs
QSEE02 Can see. 1k epochs
QSEE03 Can see. 5k epochs
QSEE06 Can see. 20k epochs
QSEE07 Can see. 10k epochs small E
QFETCH04 Lazy fetch 20k
QFETCH05 Lazy fetch top values 20k
QED5 Epsilon decay 10k
QED6 Epsilon decay long decay 10k
QED7 Epsilon decay big decay 10k
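The QED runs vary the epsilon-decay schedule (short, long, and big decay). A common exponential schedule looks like the sketch below; the parameter names and the exponential form are assumptions for illustration, not necessarily the schedule these runs used:

```python
def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.999):
    # Exponential decay: explore heavily early on, then settle at eps_min.
    # "Long decay" corresponds to a decay factor closer to 1;
    # "big decay" to a smaller factor (faster drop toward eps_min).
    return max(eps_min, eps_start * decay ** episode)
```

The floor eps_min keeps some exploration alive for the whole run instead of letting the policy become fully greedy.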