Goal: Find better values for the discount (D), based on the results of QLOW04.

| | L0 | L1 | L2 |
|---|---|---|---|
| learning rate | 0.01 | 0.005 | 0.001 |

| | E0 | E1 | E2 |
|---|---|---|---|
| epsilon | 0.01 | 0.005 | 0.001 |

| | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 |
|---|---|---|---|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.5 | 0.5 | 0.5 | 0.8 | 0.8 | 0.8 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | speed-bonus |
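
The tables above can be turned into a list of named run configurations. The following is a minimal sketch, not the project's actual tooling: it assumes every labelled value is crossed with every other (a full grid), which the tables do not state, and the helper `build_runs` and the dictionary keys are hypothetical names chosen for illustration.

```python
from itertools import product

# Values copied from the tables above, keyed by their column labels.
LEARNING_RATES = {"L0": 0.01, "L1": 0.005, "L2": 0.001}
EPSILONS = {"E0": 0.01, "E1": 0.005, "E2": 0.001}
DISCOUNTS = {
    "D0": 0.3, "D1": 0.3, "D2": 0.3,
    "D3": 0.5, "D4": 0.5, "D5": 0.5,
    "D6": 0.8, "D7": 0.8, "D8": 0.8,
}
MAPPINGS = {"M0": "non-linear-3"}
REWARD_HANDLERS = {"R0": "speed-bonus"}


def build_runs():
    """Hypothetical helper: cross all labelled values into run configs.

    Assumption: the experiment is a full grid over L, E, D, M and R.
    """
    runs = []
    for (l, lr), (e, eps), (d, disc), (m, mapping), (r, handler) in product(
        LEARNING_RATES.items(),
        EPSILONS.items(),
        DISCOUNTS.items(),
        MAPPINGS.items(),
        REWARD_HANDLERS.items(),
    ):
        runs.append({
            "name": f"{l}{e}{d}{m}{r}",  # e.g. "L0E0D0M0R0"
            "learning_rate": lr,
            "epsilon": eps,
            "discount": disc,
            "mapping": mapping,
            "reward_handler": handler,
        })
    return runs


runs = build_runs()
print(len(runs))                        # 81 (3 * 3 * 9 * 1 * 1)
print(sorted(set(DISCOUNTS.values())))  # [0.3, 0.5, 0.8]
```

Under this assumption the nine D labels cover only three distinct discount values, so runs that differ only in a D label with the same value share identical hyperparameters.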