Q-CV4 Cross validation on all Parameters 20k epochs

number of epochs: 20k

Rewards are limited by a lower and upper limit. This might cause problems.

training.simrunner.RewardHandlerName.continuous-consider-all

Considers all simulation events for calculating the reward.

Possible simulation events created for an agent:

  1. After every simulation step:

  2. At simulation end:

(t * x) is the 'speed bonus'

t = 1 - (s / max_s)

s:     Number of steps when the simulation ended
max_s: Max number of steps for a simulation

This means the reward or penalty grows the shorter the simulation ran. The agent gets a higher reward for pushing the opponent out quickly, and a higher penalty for quickly moving out of the field unforced.
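
As a minimal sketch of the speed factor described above: the function names `speed_factor` and `speed_bonus` are hypothetical and are not taken from `training.simrunner`. The value `x` is treated as an opaque base value, because the source only states that (t * x) is the speed bonus.

```python
def speed_factor(steps: int, max_steps: int) -> float:
    """Return t = 1 - (s / max_s) for a simulation that ended after `steps` steps."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not 0 <= steps <= max_steps:
        raise ValueError("steps must lie between 0 and max_steps")
    return 1.0 - steps / max_steps


def speed_bonus(steps: int, max_steps: int, x: float) -> float:
    """Return t * x, where x is an assumed base reward or penalty value."""
    return speed_factor(steps, max_steps) * x
```

With these definitions, a simulation that ends after a quarter of the allowed steps yields t = 0.75, and one that runs to the step limit yields t = 0, so no speed bonus is left.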

training.parallel.ParallelConfig.q-cross-0

                L0     L1     L2     L3
learning rate   0.5    0.7    0.1    0.2

                E0     E1     E2     E3
epsilon         0.5    0.7    0.1    0.2

                D0     D1     D2     D3     D4
discount        0.2    0.7    0.8    0.9    0.99
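
The following sketch enumerates the parameter combinations, assuming that `q-cross-0` crosses every learning rate with every epsilon and every discount. That is an assumption based on the name of the configuration, not something the source states explicitly. The function `parameter_grid` and the combined keys such as `L0E0D0` are hypothetical and only serve the illustration.

```python
from itertools import product

# Values copied from the tables above.
LEARNING_RATES = {"L0": 0.5, "L1": 0.7, "L2": 0.1, "L3": 0.2}
EPSILONS = {"E0": 0.5, "E1": 0.7, "E2": 0.1, "E3": 0.2}
DISCOUNTS = {"D0": 0.2, "D1": 0.7, "D2": 0.8, "D3": 0.9, "D4": 0.99}


def parameter_grid() -> dict[str, dict[str, float]]:
    """Build the full cross product of all parameter values (assumed layout)."""
    grid = {}
    for (l_key, lr), (e_key, eps), (d_key, disc) in product(
        LEARNING_RATES.items(), EPSILONS.items(), DISCOUNTS.items()
    ):
        # Hypothetical naming scheme: concatenate the three column labels.
        grid[l_key + e_key + d_key] = {
            "learning_rate": lr,
            "epsilon": eps,
            "discount": disc,
        }
    return grid


if __name__ == "__main__":
    print(len(parameter_grid()))  # 4 * 4 * 5 combinations
```

Under this assumption the cross validation covers 4 * 4 * 5 = 80 parameter combinations.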