Cross-validation over discount (5k epochs)

Number of epochs: 5000

Rewards are clamped between a lower and an upper limit. This might cause problems.
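A minimal sketch of what limiting rewards can look like. The function name `clip_reward` and the bounds -1.0 and 1.0 are assumptions for illustration only; the actual limits used in this run are not stated here. One possible problem (not necessarily the one meant above) is that all raw rewards beyond a limit collapse to the same value, so the learner can no longer tell them apart.

```python
def clip_reward(reward: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Clamp a raw reward into [lower, upper].

    The default bounds are hypothetical, not the ones used in this run.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(upper, reward))


raw_rewards = [-5.0, -0.3, 0.0, 0.7, 12.0]
clipped = [clip_reward(r) for r in raw_rewards]

# Rewards inside the limits pass through unchanged; rewards outside are
# replaced by the nearest limit, so 2.0 and 12.0 become indistinguishable.
same_after_clipping = clip_reward(2.0) == clip_reward(12.0)
```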

training.simrunner.RewardHandlerName.consider-all

Considers all sensors.
Calculates a reward after every step.

Positively rewarded actions.

Negatively rewarded actions.

Rewards are calculated after every step and might be different from zero.
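To illustrate how a per-step reward could feed into the q-values, here is a sketch of a standard tabular Q-learning step with epsilon-greedy action selection. This is an assumption about the learner, not the project's actual code: the function names, the state and action labels, and the dictionary-based table are all hypothetical. The parameters `learning_rate`, `epsilon` and `discount` correspond to the L, E and D settings varied in this report.

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else pick the best known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, learning_rate, discount):
    """Apply one Q-learning update for a single step and return the new q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + discount * best_next
    q[(state, action)] += learning_rate * (target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]      # hypothetical action set
q = defaultdict(float)           # unseen (state, action) pairs start at 0.0
q[("s1", "right")] = 1.0

# One step from s0 to s1 with a non-zero step reward.
new_value = q_update(
    q, "s0", "left", reward=0.5, next_state="s1",
    actions=actions, learning_rate=0.1, discount=0.9,
)
# new_value is approximately 0.1 * (0.5 + 0.9 * 1.0) = 0.14
```

A higher discount gives the estimated value of the next state more weight relative to the immediate step reward, which is the effect the D0 to D6 runs compare.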

training.parallel.ParallelConfig.q-disc-0

Parameter      ID   Value
learning rate  L0   0.1
epsilon        E0   0.1
discount       D0   0.6
discount       D1   0.65
discount       D2   0.7
discount       D3   0.75
discount       D4   0.8
discount       D5   0.85
discount       D6   0.9
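The run names used in the results (L0E0D0 to L0E0D6) follow from combining the indices of these parameter values. The sketch below shows one way such a grid could be enumerated; `build_grid` and the dictionary layout are hypothetical and not taken from training.parallel.ParallelConfig.

```python
from itertools import product

# Values taken from the parameter table above.
learning_rates = [0.1]
epsilons = [0.1]
discounts = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]


def build_grid(learning_rates, epsilons, discounts):
    """Map run names such as 'L0E0D3' to their parameter values."""
    grid = {}
    combos = product(
        enumerate(learning_rates), enumerate(epsilons), enumerate(discounts)
    )
    for (li, lr), (ei, eps), (di, disc) in combos:
        grid[f"L{li}E{ei}D{di}"] = {
            "learning_rate": lr,
            "epsilon": eps,
            "discount": disc,
        }
    return grid


grid = build_grid(learning_rates, epsilons, discounts)
run_names = list(grid)  # one entry per result section
```

With one learning rate, one epsilon and seven discounts, the grid contains seven runs, matching the seven result sections.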

Results for: Q0-DISC0 L0E0D0

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D1

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D2

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D3

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D4

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D5

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10
Results for: Q0-DISC0 L0E0D6

q-values
video 0 video 1 video 2 video 3
video 4 video 5 video 6 video 7
video 8 video 9 video 10