Cross-validation over epsilon (10k epochs)

Number of epochs: 10000 (Q_EPS_1)

training.simrunner.RewardHandlerName.consider-all

Considers all sensors.
Calculates a reward after every step.

Positively rewarded actions.

Negatively rewarded actions.

Rewards are calculated after every step, so any step may yield a nonzero reward.
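Because a reward arrives after every step, a value-based learner can update its estimates immediately after each step. The sketch below illustrates this under the assumption that training uses tabular Q-learning with epsilon-greedy exploration (suggested by the "q-values" outputs and the epsilon sweep, but not confirmed by this report). The function names `choose_action` and `q_update` are hypothetical, and the defaults `alpha=0.5` and `gamma=0.55` are taken from the learning rate and discount in the configuration.

```python
import random

# Hypothetical sketch: tabular Q-learning with epsilon-greedy exploration.
# Assumes discrete states and actions; the actual simrunner code may differ.


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploit: highest current Q-value; unseen pairs default to 0.0,
    # and max() returns the first action on ties.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.55):
    """Apply one Q-learning update right after a step, using that step's reward."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Usage: a single step with reward 1.0 from an empty table.
q = {}
print(q_update(q, "s0", "left", 1.0, "s1", ["left", "right"]))  # → 0.5
```

With a per-step reward, every call to `q_update` can move the estimate; with a reward only at episode end, most updates would see a reward of zero.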

training.parallel.ParallelConfig.q-eps-1

Key  Parameter      Value
L0   learning rate  0.5
E0   epsilon        0.1
E1   epsilon        0.3
E2   epsilon        0.5
E3   epsilon        0.7
E4   epsilon        0.9
D0   discount       0.55
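The run keys in the results (L0E0D0 through L0E4D0) appear to index into each parameter list: L selects the learning rate, E the epsilon, and D the discount. A minimal sketch of that grid expansion, assuming this index-based naming; `build_grid` is a hypothetical helper, and it reproduces only the L/E/D suffix, not the "Q0-EPS3" prefix.

```python
from itertools import product

# Parameter values as listed in the q-eps-1 configuration table.
LEARNING_RATES = [0.5]
EPSILONS = [0.1, 0.3, 0.5, 0.7, 0.9]
DISCOUNTS = [0.55]


def build_grid(learning_rates, epsilons, discounts):
    """Map run keys such as 'L0E2D0' to (learning_rate, epsilon, discount)."""
    grid = {}
    # product() varies the last iterable fastest, so keys come out L-major.
    for (li, lr), (ei, eps), (di, disc) in product(
        enumerate(learning_rates), enumerate(epsilons), enumerate(discounts)
    ):
        grid[f"L{li}E{ei}D{di}"] = (lr, eps, disc)
    return grid


for key, params in build_grid(LEARNING_RATES, EPSILONS, DISCOUNTS).items():
    print(key, params)
```

Since the learning rate and discount each have a single value, the five runs differ only in epsilon, which is what makes this a sweep over exploration rate.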

Results for: Q0-EPS3 L0E0D0

q-values
videos 0 to 6

Results for: Q0-EPS3 L0E1D0

q-values
videos 0 to 10

Results for: Q0-EPS3 L0E2D0

q-values
videos 0 to 10

Results for: Q0-EPS3 L0E3D0

q-values
videos 0 to 10

Results for: Q0-EPS3 L0E4D0

q-values
videos 0 to 10