Problem QRW07: Results are still too unstable.
Goal: Find better values for the learning rate (L) and epsilon (E) based on the results of QRW07.

| | L0 | L1 |
|---|---|---|
| learning rate | 0.001 | 0.0001 |

| | E0 | E1 |
|---|---|---|
| epsilon | 0.001 | 0.0001 |

| | D0 |
|---|---|
| discount | 0.3 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 | R1 | R2 |
|---|---|---|---|
| reward handler | speed-bonus | speed-bonus | speed-bonus |
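
If the tables are read as a full-factorial sweep (an assumption; the notes do not say how the values are combined), the runs can be enumerated as in the sketch below. The `GRID` layout, the `enumerate_runs` helper, and the run-ID format are hypothetical names chosen for illustration. Only the labels and values come from the tables. R0, R1, and R2 all list `speed-bonus`, so they are treated here as three distinct labelled variants whose differences are not recorded in this section.

```python
from itertools import product

# Labels and values copied from the tables; the dict layout is an assumption.
GRID = {
    "learning_rate": {"L0": 0.001, "L1": 0.0001},
    "epsilon": {"E0": 0.001, "E1": 0.0001},
    "discount": {"D0": 0.3},
    "mapping": {"M0": "non-linear-3"},
    "reward_handler": {
        "R0": "speed-bonus",
        "R1": "speed-bonus",
        "R2": "speed-bonus",
    },
}


def enumerate_runs(grid):
    """Yield (run_id, params) for every combination of labelled values."""
    names = list(grid)
    # Each axis contributes (label, value) pairs; product() crosses all axes.
    for combo in product(*(grid[name].items() for name in names)):
        run_id = "-".join(label for label, _ in combo)
        params = {name: value for name, (_, value) in zip(names, combo)}
        yield run_id, params


runs = list(enumerate_runs(GRID))
for run_id, params in runs:
    print(run_id, params)
```

With two learning rates, two epsilon values, and three reward-handler variants, this yields 2 × 2 × 1 × 1 × 3 = 12 runs. Keeping the labels in the run ID makes it easy to trace an unstable result back to the L and E values that produced it.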