QRW104 Cross validation reward handler. Based on QEDC2 2k

Find the optimal reward handler.
epochs: 2000
The other values are based on the results of QWDC2.

training.parallel.ParallelConfig.q-rw-4

Cross validation for reward handler

                L0                       L1                   L2       L3
learning rate   0.8                      0.8                  0.8      0.8
                E0
epsilon         0.08
                ED0
epsilon decay   decay-exp-1000
                D0
discount        0.25
                M0
mapping         non-linear-3
                R0                       R1                   R2       R3
reward handler  continuous-consider-all  reduced-push-reward  can-see  can-see
                F0
fetch mode      eager
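
The result sections that follow are labelled with one ID per parameter (for example L0 E0 ED0 D0 M0 R0 F0). As a minimal sketch of how such labels can be enumerated from the table, the Python below builds the Cartesian product of all parameter IDs. The names GRID and combinations are hypothetical and not taken from training.parallel.ParallelConfig; the values are copied from the table, and the dictionary layout is an assumption made for illustration only.

```python
from itertools import product

# Parameter grid copied from the table; the dict layout itself is an assumption.
GRID = {
    "learning rate": {"L0": 0.8, "L1": 0.8, "L2": 0.8, "L3": 0.8},
    "epsilon": {"E0": 0.08},
    "epsilon decay": {"ED0": "decay-exp-1000"},
    "discount": {"D0": 0.25},
    "mapping": {"M0": "non-linear-3"},
    "reward handler": {
        "R0": "continuous-consider-all",
        "R1": "reduced-push-reward",
        "R2": "can-see",
        "R3": "can-see",
    },
    "fetch mode": {"F0": "eager"},
}


def combinations(grid):
    """Yield (label, settings) pairs; the first parameter varies slowest."""
    names = list(grid)
    # Iterating a dict yields its keys, so product() runs over the IDs.
    for ids in product(*(grid[name] for name in names)):
        label = " ".join(ids)
        settings = {name: grid[name][id_] for name, id_ in zip(names, ids)}
        yield label, settings


if __name__ == "__main__":
    for label, settings in combinations(GRID):
        print(label, "->", settings["reward handler"])
```

With the values in the table this yields 4 x 4 = 16 combinations, in the same order as the result sections: the learning-rate ID changes slowest and the reward-handler ID changes fastest.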

L0 E0 ED0 D0 M0 R0 F0
[figure: q-values] [videos 0 to 11]

L0 E0 ED0 D0 M0 R1 F0
[figure: q-values] [videos 0 to 11]

L0 E0 ED0 D0 M0 R2 F0
[figure: q-values] [videos 0 to 11]

L0 E0 ED0 D0 M0 R3 F0
[figure: q-values] [videos 0 to 11]

L1 E0 ED0 D0 M0 R0 F0
[figure: q-values] [videos 0 to 11]

L1 E0 ED0 D0 M0 R1 F0
[figure: q-values] [videos 0 to 11]

L1 E0 ED0 D0 M0 R2 F0
[figure: q-values] [videos 0 to 11]

L1 E0 ED0 D0 M0 R3 F0
[figure: q-values] [videos 0 to 11]

L2 E0 ED0 D0 M0 R0 F0
[figure: q-values] [videos 0 to 11]

L2 E0 ED0 D0 M0 R1 F0
[figure: q-values] [videos 0 to 11]

L2 E0 ED0 D0 M0 R2 F0
[figure: q-values] [videos 0 to 11]

L2 E0 ED0 D0 M0 R3 F0
[figure: q-values] [videos 0 to 11]

L3 E0 ED0 D0 M0 R0 F0
[figure: q-values] [videos 0 to 11]

L3 E0 ED0 D0 M0 R1 F0
[figure: q-values] [videos 0 to 11]

L3 E0 ED0 D0 M0 R2 F0
[figure: q-values] [videos 0 to 11]

L3 E0 ED0 D0 M0 R3 F0
[figure: q-values] [videos 0 to 11]