Run multiple trainings with equal parameters to explore the variance of the training results.
All simulation events are considered when calculating the reward.
The possible simulation events generated for an agent are:
After every simulation step:
At simulation end:
(t * x) is the 'speed bonus', where:

- t = 1 - (s / max_s)
- s: number of steps after which the simulation ended
- max_s: maximum number of steps for a simulation
This means the reward or penalty is higher the shorter the simulation ran: the agent gets a higher reward for quickly pushing the opponent out of the field, and a higher penalty for quickly moving out of the field unforced (see the sketch below).
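A minimal sketch of the speed-scaled reward, assuming `x` is the base reward or penalty attached to the terminal event (the function name and signature are illustrative, not taken from the project code):

```python
def speed_scaled_reward(x: float, s: int, max_s: int) -> float:
    """Scale the terminal reward/penalty x by the speed bonus.

    x:     base reward (positive, e.g. opponent pushed out) or
           penalty (negative, e.g. agent left the field unforced)
    s:     number of steps after which the simulation ended
    max_s: maximum number of steps for a simulation
    """
    t = 1 - (s / max_s)  # 1.0 for an immediate end, 0.0 at the step limit
    return t * x

# Example: base reward of 10, simulation ended after 20 of 100 steps
print(speed_scaled_reward(10, 20, 100))  # t = 0.8, so the reward is 8.0
```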
|               | L0   | L1   | L2   |
|---------------|------|------|------|
| learning rate | 0.12 | 0.12 | 0.12 |
|         | E0    | E1    | E2    |
|---------|-------|-------|-------|
| epsilon | 0.015 | 0.015 | 0.015 |
|          | D0  | D1  | D2  |
|----------|-----|-----|-----|
| discount | 0.3 | 0.3 | 0.3 |
|         | M0           | M1           | M2           |
|---------|--------------|--------------|--------------|
| mapping | non-linear-3 | non-linear-3 | non-linear-3 |
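For context, a minimal sketch of how the three numeric parameters above typically enter a tabular Q-learning update; the state representation, action set, and function names are assumptions for illustration, not this project's implementation:

```python
import random

# Hypothetical tabular Q-learning step; parameter values match the tables above.
LEARNING_RATE = 0.12  # step size of the Q-value update
EPSILON = 0.015       # probability of a random (exploratory) action
DISCOUNT = 0.3        # weight given to estimated future rewards

def choose_action(q_table, state, actions):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

def update(q_table, state, action, reward, next_state, actions):
    """One Q-learning update: Q <- Q + lr * (r + discount * max_a' Q' - Q)."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + LEARNING_RATE * (
        reward + DISCOUNT * best_next - old
    )
```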