Cross validation over learning rate 10k

Number of epochs: 10000 Q_LR_1

Rewards are clipped to a lower and an upper limit. This might cause problems.
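One way such limits could cause problems is loss of information: rewards beyond a limit all collapse onto the same value. The following is a minimal sketch of that effect, assuming hard clipping. The function name `clip_reward` and the bounds -1.0 and 1.0 are hypothetical; the report does not state the actual limits or how they are applied.

```python
def clip_reward(reward: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Limit a reward to the closed interval [lower, upper] (hypothetical bounds)."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(upper, reward))


# Raw rewards of 3.0 and 30.0 become indistinguishable after clipping.
raw_rewards = [-5.0, -0.2, 0.4, 3.0, 30.0]
clipped = [clip_reward(r) for r in raw_rewards]
```

If the limits are tight compared to the raw reward range, the learner can no longer tell a good action from a very good one.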

training.simrunner.RewardHandlerName.consider-all

Considers all sensors.
Calculates a reward after every step.

Positive rewarded actions.

Negative rewarded actions.

Rewards are calculated after every step and might be different from zero.

training.parallel.ParallelConfig.q-lr-1

                L0    L1    L2    L3    L4
learning rate   0.1   0.3   0.5   0.7   0.9

                E0
epsilon         0.5

                D0
discount        0.55
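The parameter values listed above expand into the five configurations named in the result headings (L0E0D0 to L4E0D0). The sketch below shows one way to enumerate them, together with a standard tabular Q-learning update to illustrate where the learning rate and the discount enter. It assumes plain tabular Q-learning, which the report does not confirm; `config_grid`, `q_update` and the dictionary layout of the Q-table are hypothetical names and choices, not the project's API.

```python
from itertools import product

# Values taken from the configuration above.
LEARNING_RATES = [0.1, 0.3, 0.5, 0.7, 0.9]  # L0 .. L4
EPSILONS = [0.5]                            # E0
DISCOUNTS = [0.55]                          # D0


def config_grid():
    """Yield (name, learning_rate, epsilon, discount) for every combination."""
    for (li, lr), (ei, eps), (di, disc) in product(
        enumerate(LEARNING_RATES), enumerate(EPSILONS), enumerate(DISCOUNTS)
    ):
        yield f"L{li}E{ei}D{di}", lr, eps, disc


def q_update(q, state, action, reward, next_state, learning_rate, discount):
    """Apply one tabular Q-learning step (assumed algorithm) and return the new value.

    q is a dict mapping state -> {action: value}; unknown next states count as 0.
    """
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    old = q[state][action]
    q[state][action] = old + learning_rate * (reward + discount * best_next - old)
    return q[state][action]


names = [name for name, *_ in config_grid()]
```

With a higher learning rate each update moves the stored q-value further towards the new estimate, which is the effect this cross validation compares. Epsilon is not used in the update itself; in an epsilon-greedy setup it would only control how often a random action is chosen.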

Results for: Q0-LR3 L0E0D0

q-values
videos 0 to 10

Results for: Q0-LR3 L1E0D0

q-values
videos 0 to 10

Results for: Q0-LR3 L2E0D0

q-values
videos 0 to 10

Results for: Q0-LR3 L3E0D0

q-values
videos 0 to 10

Results for: Q0-LR3 L4E0D0

q-values
videos 0 to 10