Goal: create a baseline for cross-validation of see rewards, based on the values from QRW07.

|  | L0 | L1 | L2 | L3 | L4 |
|---|---|---|---|---|---|
| learning rate | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |

|  | E0 | E1 | E2 | E3 | E4 |
|---|---|---|---|---|---|
| epsilon | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |

|  | D0 | D1 | D2 | D3 | D4 |
|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

|  | M0 |
|---|---|
| mapping | non-linear-3 |

|  | R0 |
|---|---|
| reward handler | speed-bonus |

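
The tables read as per-fold settings for a tabular Q-learning agent: columns L0 to L4, E0 to E4 and D0 to D4 appear to index five folds, all sharing the same values, which is what makes this a baseline. The sketch below assumes that reading. `FoldConfig`, `baseline_folds`, `epsilon_greedy` and `q_update` are hypothetical names, not part of QRW07, and the standard Q-learning update is an assumption about how learning rate, epsilon and discount are used. `mapping` and `reward handler` are carried through as opaque labels because their behaviour is not defined here.

```python
import random
from dataclasses import dataclass

# Values copied from the tables above; every fold uses the same settings.
LEARNING_RATE = 0.2
EPSILON = 0.02
DISCOUNT = 0.3
MAPPING = "non-linear-3"         # opaque label; its behaviour is not specified here
REWARD_HANDLER = "speed-bonus"   # opaque label; its behaviour is not specified here
N_FOLDS = 5


@dataclass(frozen=True)
class FoldConfig:
    """Hypothetical container for one cross-validation fold's settings."""
    fold: int
    learning_rate: float
    epsilon: float
    discount: float
    mapping: str
    reward_handler: str


def baseline_folds(n_folds: int = N_FOLDS) -> list:
    """Build one identical config per fold (columns 0 to 4 in the tables)."""
    return [
        FoldConfig(i, LEARNING_RATE, EPSILON, DISCOUNT, MAPPING, REWARD_HANDLER)
        for i in range(n_folds)
    ]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the first greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def q_update(q, state, action, reward, next_state, cfg):
    """Standard tabular Q-learning step (assumed):
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = reward + cfg.discount * max(q[next_state])
    q[state][action] += cfg.learning_rate * (target - q[state][action])


if __name__ == "__main__":
    folds = baseline_folds()
    q = {0: [0.0, 0.0], 1: [1.0, 0.5]}
    q_update(q, state=0, action=1, reward=1.0, next_state=1, cfg=folds[0])
    # q[0][1] is now about 0.26, i.e. 0.2 * (1.0 + 0.3 * 1.0 - 0.0)
    print(q[0][1])
```

Keeping every fold on identical settings means any spread in the cross-validated rewards comes from the data split and exploration randomness, not from the hyperparameters.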