Cross validation over discount 2

Number of epochs: 10000

Rewards are limited by a lower and upper limit. This might cause problems.
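
The report does not say which problems are meant. One possible issue, shown here purely as an illustration, is that clipping maps distinct raw rewards onto the same value, so the learner can no longer tell them apart. The function name `clip_reward` and the bounds -1.0 and 1.0 are assumptions and are not taken from the training code.

```python
def clip_reward(reward: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Clamp a reward into [lower, upper]. The default bounds are hypothetical."""
    return max(lower, min(upper, reward))


# Raw rewards of very different magnitude ...
raw = [-5.0, -0.5, 0.0, 0.5, 2.0, 10.0]

# ... collapse onto the bounds once they exceed them.
clipped = [clip_reward(r) for r in raw]

# 2.0 and 10.0 both become 1.0, so the difference between them is lost.
print(clipped)
```

Whether this effect matters depends on how often the raw rewards actually reach the limits.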

training.simrunner.RewardHandlerName.consider-all

Considers all sensors.
Calculates a reward after every step.

Positively rewarded actions.

Negatively rewarded actions.

Rewards are calculated after every step and might be different from zero.
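
Since a reward is available after every step, a value update can also be applied after every step. The following is a minimal sketch of a standard tabular Q-learning update, assuming that this is the kind of learner the report evaluates. The function `q_update`, the state and action names, and the reward value are hypothetical. The learning rate 0.5 and discount 0.4 correspond to L0 and D0 in the configuration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.5, discount=0.4):
    """Apply one tabular Q-learning step and return the updated q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # The discount controls how strongly future value counts.
    target = reward + discount * best_next
    # Move the current estimate towards the target.
    q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]       # hypothetical action set
q[("s1", "right")] = 1.0          # pretend the next state already has some value

# One step with a nonzero per-step reward of 0.2.
new_value = q_update(q, "s0", "left", 0.2, "s1", actions)
print(new_value)                  # approximately 0.3
```

A larger discount increases the contribution of `best_next` to the target, which is the parameter varied in this cross validation.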

training.parallel.ParallelConfig.q-disc-1

Key  Parameter      Value
L0   learning rate  0.5
E0   epsilon        0.5
D0   discount       0.4
D1   discount       0.5
D2   discount       0.6
D3   discount       0.7
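
The result labels further down (L0E0D0 to L0E0D3) look like the cross product of these parameter keys. The sketch below shows how such a grid could be expanded. It is not the actual `ParallelConfig` implementation. The function `expand_grid` and the dictionary layout are assumptions made for illustration.

```python
from itertools import product

# Parameter values as listed in the configuration.
learning_rates = {"L0": 0.5}
epsilons = {"E0": 0.5}
discounts = {"D0": 0.4, "D1": 0.5, "D2": 0.6, "D3": 0.7}


def expand_grid(learning_rates, epsilons, discounts):
    """Build one named parameter set per combination of the three axes."""
    combos = {}
    for (l_key, lr), (e_key, eps), (d_key, disc) in product(
        learning_rates.items(), epsilons.items(), discounts.items()
    ):
        combos[f"{l_key}{e_key}{d_key}"] = {
            "learning_rate": lr,
            "epsilon": eps,
            "discount": disc,
        }
    return combos


grid = expand_grid(learning_rates, epsilons, discounts)
for name, params in grid.items():
    print(name, params)
```

With one learning rate and one epsilon, only the discount varies, which gives four runs.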

Results for: Q0-DISC3 L0E0D0

q-values
videos: 0 to 27

Results for: Q0-DISC3 L0E0D1

q-values
videos: 0 to 19

Results for: Q0-DISC3 L0E0D2

q-values
videos: 0 to 27

Results for: Q0-DISC3 L0E0D3

q-values
videos: 0 to 27