QRW05 Compare reward handlers with speed-bonus

Run multiple trainings with different reward handlers

The agent tends to stroll around and collect some extra reward by pushing the opponent inside the field. Increasing the 'speed-bonus' should reduce that problem.

The speed bonus was set from 50 tp 150

epoch count: 5000

training.simrunner.RewardHandlerName.continuous-consider-all

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when th simulation ended   
max_s: Max number of steps for a simulation

Means, the reward/penalty is higher the shorter the simulation ran. The agent gets a higher reward when fast pushing out the opponent, or a higher penalty when fast moving unforced out of the field.

training.simrunner.RewardHandlerName.reduced-push-reward

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when th simulation ended   
max_s: Max number of steps for a simulation

Means, the reward/penalty is higher the shorter the simulation ran. The agent gets a higher reward when fast pushing out the opponent, or a higher penalty when fast moving unforced out of the field.

training.simrunner.RewardHandlerName.speed-bonus

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when th simulation ended   
max_s: Max number of steps for a simulation

Means, the reward/penalty is higher the shorter the simulation ran. The agent gets a higher reward when fast pushing out the opponent, or a higher penalty when fast moving unforced out of the field.

training.parallel.ParallelConfig.q-rw-1

L0 L1
learning rate 0.12 0.12
E0 E1
epsilon 0.015 0.015
D0 D1
discount 0.3 0.3
M0 M1
mapping non-linear-3 non-linear-3
R0 R1 R2
reward handler continuous-consider-all reduced-push-reward speed-bonus

L0 E0 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D0 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E0 D1 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D0 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L0 E1 D1 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D0 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E0 D1 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D0 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M0 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M0 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M0 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M1 R0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M1 R1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
L1 E1 D1 M1 R2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10