Problem QRW07: Results are still too unstable.
Goal: Find better values for L (learning rate) and E (epsilon) based on the results of QRW07.

| | L0 | L1 |
|---|---|---|
| learning rate | 0.001 | 0.0001 |

| | E0 | E1 |
|---|---|---|
| epsilon | 0.001 | 0.0001 |

| | D0 |
|---|---|
| discount | 0.3 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 | R1 | R2 |
|---|---|---|---|
| reward handler | speed-bonus | speed-bonus | speed-bonus |
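
The parameter tables above can be read as a small grid of runs. The following is a minimal sketch that enumerates it, assuming every labelled value is combined with every other (a full cross product, which these notes do not state explicitly). The names `GRID` and `enumerate_runs` and the run-id format are hypothetical and not part of the original setup.

```python
from itertools import product

# Values copied from the parameter tables; the dictionary layout is an assumption.
GRID = {
    "learning_rate": {"L0": 0.001, "L1": 0.0001},
    "epsilon": {"E0": 0.001, "E1": 0.0001},
    "discount": {"D0": 0.3},
    "mapping": {"M0": "non-linear-3"},
    "reward_handler": {"R0": "speed-bonus", "R1": "speed-bonus", "R2": "speed-bonus"},
}


def enumerate_runs(grid):
    """Return one (run_id, params) pair per combination of labelled values."""
    names = list(grid)
    runs = []
    # product() varies the last parameter fastest, so reward handlers cycle first.
    for combo in product(*(grid[name].items() for name in names)):
        run_id = "-".join(label for label, _ in combo)  # hypothetical id format
        params = {name: value for name, (_, value) in zip(names, combo)}
        runs.append((run_id, params))
    return runs


runs = enumerate_runs(GRID)
print(len(runs))
for run_id, params in runs:
    print(run_id, params)
```

Under the cross-product assumption this yields 2 x 2 x 1 x 1 x 3 = 12 runs. If only selected combinations are meant to be run, the list can be filtered by run id afterwards.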