Cross-validation over epsilon (5k epochs)

Number of epochs: 5000

Rewards are clamped to a lower and an upper limit. This might cause problems, since any reward outside the limits loses its magnitude.
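As a rough illustration of the limiting described above, the following sketch clamps a raw reward to a fixed interval. The function name clip_reward and the bounds -1.0 and 1.0 are assumptions made for this example; the actual limits used in the training run are not stated here.

```python
def clip_reward(reward: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Limit a raw reward to the closed interval [lower, upper].

    The default bounds are hypothetical placeholders, not the values
    used in the reported run.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(upper, reward))


# Two very different raw rewards become indistinguishable after clamping,
# which is one way the limits might cause problems.
small_win = clip_reward(1.5)
big_win = clip_reward(40.0)
```

With these assumed bounds, both small_win and big_win end up at the upper limit, so the learner can no longer tell them apart.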

training.simrunner.RewardHandlerName.consider-all

Considers all sensors.
Calculates a reward after every step.

Positively rewarded actions.

Negatively rewarded actions.

Rewards are calculated after every step and might be different from zero.
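Because a reward is available after every step, the value estimates can be updated after every step as well. The sketch below assumes a standard tabular Q-learning update, which the configuration name q-eps-0 suggests but does not confirm. It uses the learning rate 0.5 and discount 0.75 listed in the configuration. The function name q_update, the state names and the action names are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.5, discount=0.75):
    """Apply one tabular Q-learning update and return the new Q-value.

    Assumes the standard rule
    Q(s, a) += lr * (r + discount * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + discount * best_next
    q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-step episode with a non-zero reward after each step.
q = defaultdict(float)
actions = ["left", "right"]
first = q_update(q, "s0", "left", 1.0, "s1", actions)
second = q_update(q, "s1", "right", -1.0, "s0", actions)
```

The first update moves Q(s0, left) halfway from 0 towards the reward of 1.0. The second update already draws on that estimate through the discounted maximum over the next state.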

training.parallel.ParallelConfig.q-eps-0

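ID   Parameter       Value
L0   learning rate   0.5
E0   epsilon         0.0001
E1   epsilon         0.0005
E2   epsilon         0.001
E3   epsilon         0.005
E4   epsilon         0.01
E5   epsilon         0.05
E6   epsilon         0.1
E7   epsilon         0.3
E8   epsilon         0.5
E9   epsilon         0.6
D0   discount        0.75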
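The parameter values above combine into ten configurations, named after the indices of their values (for example L0E3D0). A minimal sketch of how such a grid could be enumerated follows, together with an epsilon-greedy action choice to show what epsilon controls. The names build_grid and epsilon_greedy are assumptions for illustration and are not taken from training.parallel. That the run uses epsilon-greedy exploration is also an assumption.

```python
import random
from itertools import product

LEARNING_RATES = [0.5]
EPSILONS = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.5, 0.6]
DISCOUNTS = [0.75]


def build_grid():
    """Map labels such as 'L0E3D0' to their parameter values."""
    grid = {}
    for (li, lr), (ei, eps), (di, disc) in product(
        enumerate(LEARNING_RATES), enumerate(EPSILONS), enumerate(DISCOUNTS)
    ):
        grid[f"L{li}E{ei}D{di}"] = {
            "learning_rate": lr,
            "epsilon": eps,
            "discount": disc,
        }
    return grid


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action index with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


grid = build_grid()
labels = list(grid)
greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], 0.0, random.Random(0))
```

Under this reading, a small epsilon such as E0 means the agent almost always takes the action with the highest Q-value, while E9 explores at random in more than half of its steps.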

Results for: Q0-EPS0 L0E0D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E1D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E2D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E3D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E4D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E5D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E6D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E7D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E8D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10

Results for: Q0-EPS0 L0E9D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10