QRW06 Cross validation for speed-bonus (L,E)

Problem from QRW5: With the 'speed-bonus' reward handler, good solutions still get lost after some time.

Goal: Find out whether a more constant increase can be achieved.

Run multiple trainings with the 'speed-bonus' reward handler and cross-validate learning rate and epsilon. Use smaller values for both, as they promise more stability.
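Why smaller values might stabilise training can be sketched with a standard tabular Q-learning update and epsilon-greedy action selection. This is only an illustration: the source does not show the update rule used by the trainer, and the names `q_update` and `choose_action` are hypothetical.

```python
import random


def q_update(q, reward, best_next_q, learning_rate, discount):
    """Standard Q-learning update (assumed, not taken from the trainer)."""
    target = reward + discount * best_next_q
    # The learning rate scales how far one sample moves the stored value,
    # so a smaller rate means a single unlucky episode changes less.
    return q + learning_rate * (target - q)


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy selection over a list of q-values (assumed)."""
    # With probability epsilon a random action is explored; a smaller
    # epsilon means fewer random deviations from the learned policy.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    for learning_rate in (0.15, 0.1, 0.05):
        print(learning_rate, q_update(0.0, 1.0, 0.0, learning_rate, 0.3))
    print(choose_action([0.1, 0.9, 0.3], 0.015, rng))
```

With the same sample, the value moves three times as far at learning rate 0.15 as at 0.05, which is the sense in which the smaller settings are expected to forget good solutions more slowly.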

epoch count: 10k

training.simrunner.RewardHandlerName.speed-bonus

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when the simulation ended
max_s: Max number of steps for a simulation

This means the reward or penalty is higher the shorter the simulation ran. The agent gets a higher reward for pushing the opponent out quickly, and a higher penalty for quickly moving out of the field unforced.
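The speed bonus can be sketched as a small function. This is a minimal sketch, not the code of the 'speed-bonus' handler: the name `speed_bonus` is hypothetical, and `x` is assumed to be the base reward (positive) or penalty (negative) given at simulation end. How the bonus is combined with other rewards is not shown here.

```python
def speed_bonus(x, s, max_s):
    """Return t * x with t = 1 - (s / max_s).

    x:     base end-of-simulation reward or penalty (assumed meaning)
    s:     number of steps when the simulation ended
    max_s: max number of steps for a simulation
    """
    if max_s <= 0:
        raise ValueError("max_s must be positive")
    if not 0 <= s <= max_s:
        raise ValueError("s must be between 0 and max_s")
    t = 1 - (s / max_s)
    return t * x


if __name__ == "__main__":
    # A win after a quarter of the allowed steps keeps most of the bonus,
    # a win at the very last step gets none.
    print(speed_bonus(100.0, 250, 1000))
    print(speed_bonus(100.0, 1000, 1000))
    # A penalty is scaled the same way, so leaving the field early costs more.
    print(speed_bonus(-100.0, 250, 1000))
```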

training.parallel.ParallelConfig.q-rw-2

                  L0            L1            L2
learning rate     0.15          0.1           0.05

                  E0            E1            E2
epsilon           0.015         0.01          0.005

                  D0
discount          0.3

                  M0
mapping           non-linear-3

                  R0            R1            R2
reward handler    speed-bonus   speed-bonus   speed-bonus
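The run names used in the results are the cross product of these parameter columns. The sketch below enumerates them, assuming the third learning rate column is L2 (as in the run names) and using hypothetical dictionary and function names. It does not reproduce the actual `training.parallel.ParallelConfig` code.

```python
from itertools import product

# Values copied from the parameter table; the keys are the column IDs.
LEARNING_RATES = {"L0": 0.15, "L1": 0.1, "L2": 0.05}
EPSILONS = {"E0": 0.015, "E1": 0.01, "E2": 0.005}
DISCOUNTS = {"D0": 0.3}
MAPPINGS = {"M0": "non-linear-3"}
REWARD_HANDLERS = {"R0": "speed-bonus", "R1": "speed-bonus", "R2": "speed-bonus"}


def run_grid():
    """Return (name, values) for every parameter combination."""
    axes = [LEARNING_RATES, EPSILONS, DISCOUNTS, MAPPINGS, REWARD_HANDLERS]
    runs = []
    # Iterating a dict yields its keys, so product() combines the column IDs.
    for keys in product(*axes):
        name = " ".join(keys)
        values = tuple(axis[key] for axis, key in zip(axes, keys))
        runs.append((name, values))
    return runs


if __name__ == "__main__":
    for name, values in run_grid():
        print(name, values)
```

The grid has 3 x 3 x 1 x 1 x 3 = 27 entries. Since R0, R1 and R2 name the same handler, each learning rate and epsilon pair appears three times.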

Results: each run listed below comes with its q-values and twelve videos (video 0 to video 11).

L0 E0 D0 M0 R0
L0 E0 D0 M0 R1
L0 E0 D0 M0 R2
L0 E1 D0 M0 R0
L0 E1 D0 M0 R1
L0 E1 D0 M0 R2
L0 E2 D0 M0 R0
L0 E2 D0 M0 R1
L0 E2 D0 M0 R2
L1 E0 D0 M0 R0
L1 E0 D0 M0 R1
L1 E0 D0 M0 R2
L1 E1 D0 M0 R0
L1 E1 D0 M0 R1
L1 E1 D0 M0 R2
L1 E2 D0 M0 R0
L1 E2 D0 M0 R1
L1 E2 D0 M0 R2
L2 E0 D0 M0 R0
L2 E0 D0 M0 R1
L2 E0 D0 M0 R2
L2 E1 D0 M0 R0
L2 E1 D0 M0 R1
L2 E1 D0 M0 R2
L2 E2 D0 M0 R0
L2 E2 D0 M0 R1
L2 E2 D0 M0 R2