QEDC1: Cross-validation of learning rate, epsilon and discount
Find the optimal learning rate, epsilon start value and discount, using exponential decay for epsilon.
epochs: 20000
epsilon decay: exponential, half-life: 1000 epochs
training.parallel.ParallelConfig.q-edc-1
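
The exponential epsilon decay with a half time (half-life) of 1000 epochs can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes that epsilon starts at the configured start value and halves every 1000 epochs, and the function name `epsilon_at` is hypothetical.

```python
def epsilon_at(epoch: int, epsilon_start: float, half_life: float = 1000.0) -> float:
    """Exploration rate after `epoch` epochs of exponential decay.

    Assumption: epsilon halves every `half_life` epochs, i.e.
    epsilon(t) = epsilon_start * 0.5 ** (t / half_life).
    """
    return epsilon_start * 0.5 ** (epoch / half_life)


if __name__ == "__main__":
    # Decay of the largest start value from the grid over the 20000 epochs.
    for epoch in (0, 1000, 2000, 5000, 20000):
        print(epoch, epsilon_at(epoch, epsilon_start=0.5))
```

Under this assumption, 20000 epochs correspond to 20 half-lives, so epsilon is practically zero by the end of training for every start value in the grid.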
Cross-validation grid for the epsilon start value, learning rate and discount:

| | L0 | L1 | L2 | L3 |
|---|---|---|---|---|
| learning rate | 0.01 | 0.05 | 0.1 | 0.5 |

| | E0 | E1 | E2 | E3 |
|---|---|---|---|---|
| epsilon | 0.01 | 0.05 | 0.1 | 0.5 |

| | ED0 |
|---|---|
| epsilon decay | decay-exp-1000 |

| | D0 | D1 | D2 | D3 |
|---|---|---|---|---|
| discount | 0.2 | 0.3 | 0.5 | 0.8 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | can-see |
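
The parameter grid above can be enumerated with a short script. This is only a sketch: it assumes a full factorial design (every learning rate combined with every epsilon and every discount), which the source does not state explicitly. The dictionary layout, the function `enumerate_runs` and the run-id format are hypothetical and not taken from `training.parallel.ParallelConfig`.

```python
from itertools import product

# Values copied from the tables above, keyed by their short labels.
GRID = {
    "learning_rate": {"L0": 0.01, "L1": 0.05, "L2": 0.1, "L3": 0.5},
    "epsilon": {"E0": 0.01, "E1": 0.05, "E2": 0.1, "E3": 0.5},
    "epsilon_decay": {"ED0": "decay-exp-1000"},
    "discount": {"D0": 0.2, "D1": 0.3, "D2": 0.5, "D3": 0.8},
    "mapping": {"M0": "non-linear-3"},
    "reward_handler": {"R0": "can-see"},
}


def enumerate_runs(grid):
    """Return (run_id, params) for every combination of labels in the grid."""
    names = list(grid)
    runs = []
    # Iterating over a dict yields its keys, so product() combines the labels.
    for labels in product(*(grid[name] for name in names)):
        run_id = "-".join(labels)  # hypothetical id, e.g. "L0-E0-ED0-D0-M0-R0"
        params = {name: grid[name][label] for name, label in zip(names, labels)}
        runs.append((run_id, params))
    return runs


runs = enumerate_runs(GRID)

if __name__ == "__main__":
    print(len(runs))  # 4 * 4 * 1 * 4 * 1 * 1 combinations
    print(runs[0])
```

With four values each for learning rate, epsilon and discount and a single value for the remaining parameters, a full grid contains 64 runs.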