Goal: Find better values for the discount (D) based on the results of QLOW04.

| | L0 | L1 | L2 |
|---|---|---|---|
| learning rate | 0.01 | 0.005 | 0.001 |

| | E0 | E1 | E2 |
|---|---|---|---|
| epsilon | 0.01 | 0.005 | 0.001 |

| | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 |
|---|---|---|---|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.5 | 0.5 | 0.5 | 0.8 | 0.8 | 0.8 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | speed-bonus |
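
The parameter tables above can be written down as a small grid and enumerated in code. The following is only a sketch: it assumes that every label combination (for example `L0E0D0M0R0`) is run as a full Cartesian product, which these notes do not state. The dictionary keys, the `enumerate_runs` helper, and the run-id format are illustrative names, not part of the original setup.

```python
from itertools import product

# Values copied from the tables above; key names are illustrative assumptions.
GRID = {
    "learning_rate": {"L0": 0.01, "L1": 0.005, "L2": 0.001},
    "epsilon": {"E0": 0.01, "E1": 0.005, "E2": 0.001},
    "discount": {
        "D0": 0.3, "D1": 0.3, "D2": 0.3,
        "D3": 0.5, "D4": 0.5, "D5": 0.5,
        "D6": 0.8, "D7": 0.8, "D8": 0.8,
    },
    "mapping": {"M0": "non-linear-3"},
    "reward_handler": {"R0": "speed-bonus"},
}


def enumerate_runs(grid):
    """Return (run_id, params) for every label combination in the grid.

    Assumes a full Cartesian product over all parameter labels.
    """
    names = list(grid)
    labelled_values = [list(grid[name].items()) for name in names]
    runs = []
    for combo in product(*labelled_values):
        # Hypothetical run id: the labels concatenated in table order.
        run_id = "".join(label for label, _ in combo)
        params = {name: value for name, (_, value) in zip(names, combo)}
        runs.append((run_id, params))
    return runs


runs = enumerate_runs(GRID)
print(len(runs))
print(runs[0])
```

Keeping the labels next to the values makes it easy to map a result back to its table column when comparing discount settings.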