Q-Learning stand-still

Q-learning is a model-free reinforcement learning algorithm that teaches an agent to assign values to each action it might take, conditioned on the agent being in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. [wiki]
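To make the description above concrete, here is a minimal sketch of a tabular Q-learning update with epsilon-greedy action selection. It is not taken from the experiments listed on this page. The function names, the dictionary-based Q-table, and the parameter names `learning_rate`, `discount` and `epsilon` are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    # Hypothetical helper: unseen states start with a value of 0.0 per action.
    return defaultdict(lambda: [0.0] * n_actions)


def choose_action(q_table, state, n_actions, epsilon, rng=random):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q_table[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done,
             learning_rate, discount):
    # Standard Q-learning target: reward plus discounted best next value.
    # No bootstrapping from the next state when the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + discount * best_next
    q_table[state][action] += learning_rate * (td_target - q_table[state][action])
    return q_table[state][action]
```

The update only needs the observed transition (state, action, reward, next state), which is why no model of the environment is required.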

The agent is trained against the 'stand-still' opponent, which neither reacts to the agent nor moves on its own.

The agent has to learn to:

- Stay in the field.
- Push the opponent out of the field as fast as possible.
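The two goals above could be expressed as a reward signal along the following lines. This is a hypothetical sketch, not one of the reward handlers compared in the experiments. The function name, the flags `agent_in_field` and `opponent_in_field`, and all numeric values are assumptions.

```python
def stand_still_reward(agent_in_field, opponent_in_field, step, max_steps,
                       leave_penalty=-1.0, push_reward=1.0, speed_bonus=1.0):
    """Return (reward, done) for one step. All values are illustrative."""
    if not agent_in_field:
        # Goal 1: the agent must stay in the field.
        return leave_penalty, True
    if not opponent_in_field:
        # Goal 2: pushing the opponent out earns a reward, plus a bonus
        # that shrinks linearly with the number of steps already used.
        remaining = max(0, max_steps - step) / max_steps
        return push_reward + speed_bonus * remaining, True
    # Nothing decisive happened yet.
    return 0.0, False
```

For example, with the default values, pushing the opponent out at step 5 of 10 yields a reward of 1.5, while doing so at the very last step yields only the base reward of 1.0.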

Q-CV4 Cross validation on all Parameters 20k epochs
Q-CV5 Cross validation reward unlimited
Q-CV7 Cross validation reward at end
QMAP01 Cross validation mapping
QMAP02 Cross validation mapping
QMAP03 Cross validation mapping 2k epochs
QMAP04 Cross validation mapping 10k epochs
QMAP05 Cross validation mapping 2k epochs
QMAP06 Cross validation mapping 5k epochs
QMAP08 Check variance of equal parameters
QRW04 Compare reward handler, reduced push reward
QRW05 Compare reward handlers with speed-bonus
QRW06 Cross validation for speed-bonus (L,E)
QRW07 Cross validation for speed-bonus (L,E) take 2
QLOW00 Very low values for L and E test
QLOW03 Very low values for L and E 30k epochs
QLOW04 Very low values for L, E 1k epochs
QLOW05 Very low values for L, E with D 1k epochs
QSEE00 Zero reward for 'seeing' as a baseline 1k epochs
QSEE02 Can see. 1k epochs
QSEE03 Can see. 5k epochs
QSEE06 Can see. 20k epochs
QSEE07 Can see. 10k epochs small E
QFETCH04 Lazy fetch 20k
QFETCH05 Lazy fetch top values 20k
QED5 Epsilon decay 10k
QED6 Epsilon decay long decay 10k
QE7 Epsilon decay big decay 10k
QEDEXP1 Epsilon exponential decay 10k epochs
QEDEXP2 Epsilon exponential decay 20k epochs
QEDC1 Cross validation learning rate, epsilon and discount
QEDC2 Cross validation learning rate, epsilon and discount. Based on QEDC1
QRW104 Cross validation reward handler. Based on QEDC2 2k
QRW105 Cross validation reward handler. Based on QEDC2 20k
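
Several of the experiments listed above vary how epsilon decays over training (for example QED5, QED6, QEDEXP1 and QEDEXP2). As a rough illustration of the difference between a linear and an exponential schedule, here is a minimal sketch. The function names, the start and end values, and the decay factor are assumptions and do not reflect the settings used in those experiments.

```python
def linear_epsilon(epoch, total_epochs, start=1.0, end=0.05):
    # Epsilon falls in a straight line from start to end over total_epochs.
    frac = min(max(epoch / total_epochs, 0.0), 1.0)
    return start + (end - start) * frac


def exponential_epsilon(epoch, start=1.0, end=0.05, decay=0.999):
    # Epsilon approaches end geometrically; a smaller decay factor
    # means exploration is reduced faster.
    return end + (start - end) * decay ** epoch
```

A linear schedule reaches its final value at a fixed epoch, whereas the exponential schedule drops quickly at first and then flattens out, never quite reaching `end`.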