Run multiple trainings with different reward handlers
The agent tends to stroll around and collect extra reward by pushing the opponent around inside the field. Increasing the 'speed-bonus' should reduce this problem.
The speed bonus was increased from 50 to 150.
epoch count: 5000
Considers all simulation events for calculating the reward.
Possible simulation events created for an agent:
After every simulation step:
At simulation end:
(t * x) is the 'speed bonus', where:
t = 1 - (s / max_s)
s: number of steps when the simulation ended
max_s: maximum number of steps for a simulation
This means the reward or penalty is higher the shorter the simulation ran: the agent gets a higher reward for quickly pushing the opponent out, and a higher penalty for quickly leaving the field unforced.
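A minimal Python sketch of the speed-bonus formula above, to show how t scales the bonus with simulation length. The function names, the hypothetical `base` reward, and the reading of x as the configured speed-bonus value (e.g. 50 or 150) are assumptions for illustration, not the project's actual reward handler.

```python
def speed_factor(steps: int, max_steps: int) -> float:
    """t = 1 - (s / max_s): 1.0 for an immediate finish, 0.0 at the step limit."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= steps <= max_steps:
        raise ValueError("steps must be between 0 and max_steps")
    return 1.0 - steps / max_steps


def speed_bonus(steps: int, max_steps: int, x: float) -> float:
    """Speed bonus t * x; x is assumed to be the configured bonus value."""
    return speed_factor(steps, max_steps) * x


def end_reward(opponent_pushed_out: bool, steps: int, max_steps: int,
               x: float, base: float = 0.0) -> float:
    """Hypothetical end-of-simulation signal.

    Pushing the opponent out yields +(base + bonus); leaving the field
    unforced yields -(base + bonus). A shorter simulation gives a larger
    magnitude in both cases.
    """
    magnitude = base + speed_bonus(steps, max_steps, x)
    return magnitude if opponent_pushed_out else -magnitude


if __name__ == "__main__":
    max_steps = 1000  # hypothetical step limit
    for x in (50, 150):  # the two speed-bonus values mentioned above
        for steps in (100, 500, 1000):
            print(f"x={x} steps={steps} bonus={speed_bonus(steps, max_steps, x):.1f}")
```

Raising x from 50 to 150 triples the bonus for any given finishing step, which is the intended pressure towards ending the simulation early.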
| | L0 | L1 |
|---|---|---|
| learning rate | 0.12 | 0.12 |

| | E0 | E1 |
|---|---|---|
| epsilon | 0.015 | 0.015 |

| | D0 | D1 |
|---|---|---|
| discount | 0.3 | 0.3 |

| | M0 | M1 |
|---|---|---|
| mapping | non-linear-3 | non-linear-3 |

| | R0 | R1 | R2 |
|---|---|---|---|
| reward handler | continuous-consider-all | reduced-push-reward | speed-bonus |
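Because every learning-rate, epsilon, discount and mapping variant in the tables has the same value, the reward handler is the only parameter that differs between runs. The following Python sketch builds one configuration per reward handler from those values. The `TrainingRun` class and `build_runs` function are hypothetical, and applying the epoch count of 5000 to every run is an assumption.

```python
from dataclasses import dataclass

# Parameter variants copied from the tables above.
LEARNING_RATES = (0.12, 0.12)                # L0, L1
EPSILONS = (0.015, 0.015)                    # E0, E1
DISCOUNTS = (0.3, 0.3)                       # D0, D1
MAPPINGS = ("non-linear-3", "non-linear-3")  # M0, M1
REWARD_HANDLERS = (                          # R0, R1, R2
    "continuous-consider-all",
    "reduced-push-reward",
    "speed-bonus",
)


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical container for one training configuration."""
    learning_rate: float
    epsilon: float
    discount: float
    mapping: str
    reward_handler: str
    epochs: int = 5000  # assumption: the epoch count applies to every run


def build_runs() -> list[TrainingRun]:
    """Create one run per reward handler, sharing all other parameters."""
    # The shared parameters only make sense if their variants are identical.
    for variants in (LEARNING_RATES, EPSILONS, DISCOUNTS, MAPPINGS):
        if len(set(variants)) != 1:
            raise ValueError(f"expected identical variants, got {variants}")
    return [
        TrainingRun(
            learning_rate=LEARNING_RATES[0],
            epsilon=EPSILONS[0],
            discount=DISCOUNTS[0],
            mapping=MAPPINGS[0],
            reward_handler=handler,
        )
        for handler in REWARD_HANDLERS
    ]


if __name__ == "__main__":
    for run in build_runs():
        print(run)
```

Keeping everything except the reward handler fixed means any difference in training results can be attributed to the reward handler alone.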