QRW04 Compare reward handler, reduced push reward

Run multiple trainings with different reward handlers

The agents tended to push opponents as often as possible instead of pushing them out as soon as possible. The reason was that the accumulated reward for merely pushing was higher than the reward for winning a game.

The push reward for the 'reduced-push-reward' reward handler is 0.1 times the reward for the 'continuous-consider-all' reward handler.
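The effect can be illustrated with a small sketch. The reward values and the number of push events below are made up for illustration; they are not the values used by the actual reward handlers. Only the factor 0.1 comes from the description above.

```python
# Hypothetical values, chosen only to illustrate the effect.
PUSH_REWARD = 1.0        # assumed reward per push event
WIN_REWARD = 10.0        # assumed reward for winning a game
REDUCTION_FACTOR = 0.1   # from the text: reduced push reward is 0.1 times the original


def total_push_reward(push_events: int, push_reward: float) -> float:
    """Sum of the rewards collected from push events alone."""
    return push_events * push_reward


pushes = 20  # assumed number of push events in one long simulation

full = total_push_reward(pushes, PUSH_REWARD)
reduced = total_push_reward(pushes, PUSH_REWARD * REDUCTION_FACTOR)

print(f"continuous-consider-all: pushing {full:.1f}, winning {WIN_REWARD:.1f}")
print(f"reduced-push-reward:     pushing {reduced:.1f}, winning {WIN_REWARD:.1f}")
```

With these assumed numbers, repeated pushing pays more than winning under the full push reward, but less than winning once the push reward is scaled by 0.1.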

training.simrunner.RewardHandlerName.continuous-consider-all

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when the simulation ended
max_s: Max number of steps for a simulation

That is, the shorter the simulation ran, the higher the reward or penalty. The agent gets a higher reward for pushing the opponent out quickly, and a higher penalty for quickly moving out of the field unforced.
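A minimal sketch of the speed bonus calculation follows. The function names are hypothetical and not taken from the training code. It also assumes that x is the base reward or penalty of the end event, which the text above does not state explicitly.

```python
def speed_factor(steps: int, max_steps: int) -> float:
    """Return t = 1 - (s / max_s)."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= steps <= max_steps:
        raise ValueError("steps must be between 0 and max_steps")
    return 1.0 - steps / max_steps


def speed_bonus(base_value: float, steps: int, max_steps: int) -> float:
    """Return (t * x), where x is assumed to be the base reward or penalty."""
    return speed_factor(steps, max_steps) * base_value


# An early end gives a large factor, a late end a small one.
early = speed_factor(steps=250, max_steps=1000)
late = speed_factor(steps=1000, max_steps=1000)

# A negative base value (penalty) is scaled the same way as a reward.
fast_exit_penalty = speed_bonus(base_value=-1.0, steps=100, max_steps=1000)
```

Because t falls linearly from 1 to 0 over the length of the simulation, a simulation that runs for the maximum number of steps earns no speed bonus at all.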

training.simrunner.RewardHandlerName.reduced-push-reward

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when the simulation ended
max_s: Max number of steps for a simulation

That is, the shorter the simulation ran, the higher the reward or penalty. The agent gets a higher reward for pushing the opponent out quickly, and a higher penalty for quickly moving out of the field unforced.

training.parallel.ParallelConfig.q-rw-0

learning rate    L0: 0.12                     L1: 0.12
epsilon          E0: 0.0015                   E1: 0.015
discount         D0: 0.3                      D1: 0.3
mapping          M0: non-linear-3             M1: non-linear-3
reward handler   R0: continuous-consider-all  R1: reduced-push-reward
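The 32 result sections below correspond to every combination of the two values per parameter. As a rough sketch, the combinations and their labels could be enumerated as follows. This is not the implementation of training.parallel.ParallelConfig; the dictionary and function names are hypothetical.

```python
from itertools import product

# Two candidate values per parameter, as listed in the configuration above.
PARAMETER_VALUES = {
    "L": (0.12, 0.12),                                        # learning rate
    "E": (0.0015, 0.015),                                     # epsilon
    "D": (0.3, 0.3),                                          # discount
    "M": ("non-linear-3", "non-linear-3"),                    # mapping
    "R": ("continuous-consider-all", "reduced-push-reward"),  # reward handler
}


def enumerate_runs():
    """Return (label, values) for every combination of parameter indices."""
    keys = list(PARAMETER_VALUES)
    index_ranges = [range(len(PARAMETER_VALUES[key])) for key in keys]
    runs = []
    for indices in product(*index_ranges):
        label = " ".join(f"{key}{i}" for key, i in zip(keys, indices))
        values = {key: PARAMETER_VALUES[key][i] for key, i in zip(keys, indices)}
        runs.append((label, values))
    return runs


runs = enumerate_runs()
for label, values in runs[:2]:
    print(label, values["R"])
```

The last parameter varies fastest, so the labels come out in the same order as the sections below, starting with L0 E0 D0 M0 R0 and ending with L1 E1 D1 M1 R1.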

L0 E0 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10