Run multiple trainings with different reward handlers
The agents tended to push opponents as often as possible instead of pushing them out as soon as possible. The reason was that the accumulated reward for repeated pushing was higher than the reward for winning a game.
The push reward of the 'reduced-push-reward' reward handler is 0.1 times the push reward of the 'continuous-consider-all' reward handler.
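The intended effect can be sketched in a few lines. The helper name `push_reward`, the full push reward of 1.0, and the comparison are illustrative assumptions; only the 0.1 scaling factor comes from the description above.

```python
# Illustration of the 'reduced-push-reward' idea: the reward for a single
# push event is scaled by 0.1 compared to 'continuous-consider-all', so
# accumulating pushes no longer outweighs the reward for winning a game.
import math

PUSH_REWARD_FULL = 1.0   # assumed per-push reward of 'continuous-consider-all'
PUSH_REWARD_SCALE = 0.1  # scaling factor of 'reduced-push-reward'


def push_reward(reduced: bool) -> float:
    """Return the reward credited for one push event."""
    return PUSH_REWARD_FULL * PUSH_REWARD_SCALE if reduced else PUSH_REWARD_FULL


# Ten pushes under the reduced handler are worth about as much as a single
# push under the original handler, so winning quickly dominates repeated pushing.
assert math.isclose(10 * push_reward(reduced=True), push_reward(reduced=False))
```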
Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:
- After every simulation step:
- At simulation end:

(t * x) is the 'speed bonus' applied to the reward or penalty x at simulation end:
- t = 1 - (s / max_s)
- s: number of steps when the simulation ended
- max_s: maximum number of steps for a simulation

This means the reward or penalty is higher the shorter the simulation ran: the agent gets a higher reward for quickly pushing the opponent out, or a higher penalty for quickly moving out of the field unforced.
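The speed bonus can be illustrated with a short sketch. Only the formula t = 1 - (s / max_s) is taken from the description above; the function name, the example step counts, and the terminal reward x = 100 are assumptions.

```python
def speed_bonus(x: float, s: int, max_s: int) -> float:
    """Scale the terminal reward/penalty x by t = 1 - (s / max_s).

    The fewer steps the simulation took, the larger |t * x| becomes.
    """
    t = 1.0 - (s / max_s)
    return t * x


# Assumed terminal reward x = 100 and max_s = 1000 steps:
print(speed_bonus(100.0, s=200, max_s=1000))   # 80.0   -> fast win, big reward
print(speed_bonus(100.0, s=900, max_s=1000))   # ~10.0  -> slow win, small reward
print(speed_bonus(-100.0, s=200, max_s=1000))  # -80.0  -> fast unforced exit, big penalty
```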
| | L0 | L1 |
|---|---|---|
| learning rate | 0.12 | 0.12 |

| | E0 | E1 |
|---|---|---|
| epsilon | 0.0015 | 0.015 |

| | D0 | D1 |
|---|---|---|
| discount | 0.3 | 0.3 |

| | M0 | M1 |
|---|---|---|
| mapping | non-linear-3 | non-linear-3 |

| | R0 | R1 |
|---|---|---|
| reward handler | continuous-consider-all | reduced-push-reward |
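Assuming the two columns of each table describe the two training runs being compared, the configurations could be written out as below; the dictionary layout and key names are illustrative, only the values come from the tables.

```python
# Illustrative summary of the two training configurations; key names are
# assumptions, the values are taken from the tables above.
TRAININGS = {
    "training-0": {
        "learning_rate": 0.12,        # L0
        "epsilon": 0.0015,            # E0
        "discount": 0.3,              # D0
        "mapping": "non-linear-3",    # M0
        "reward_handler": "continuous-consider-all",  # R0
    },
    "training-1": {
        "learning_rate": 0.12,        # L1
        "epsilon": 0.015,             # E1
        "discount": 0.3,              # D1
        "mapping": "non-linear-3",    # M1
        "reward_handler": "reduced-push-reward",      # R1
    },
}
```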