Goal: Create a baseline for cross-validation of see rewards, based on the values from QRW07.

| | L0 | L1 | L2 | L3 | L4 |
|---|---|---|---|---|---|
| learning rate | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |

| | E0 | E1 | E2 | E3 | E4 |
|---|---|---|---|---|---|
| epsilon | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |

| | D0 | D1 | D2 | D3 | D4 |
|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | speed-bonus |
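
For reference, the baseline above could be captured in code roughly as follows. This is only a sketch under stated assumptions: it reads the five columns as five levels (0 to 4), and it assumes the learning rate, epsilon, and discount are used in a standard epsilon-greedy tabular Q-learning update, which the notes do not confirm. The names `LevelParams`, `BASELINE_LEVELS`, `choose_action`, and `q_update` are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class LevelParams:
    """Hyperparameters for one level (hypothetical container)."""
    learning_rate: float
    epsilon: float
    discount: float


# Values copied from the tables above. Assumption: columns 0..4 are levels,
# and every level shares the same baseline settings.
BASELINE_LEVELS = [
    LevelParams(learning_rate=0.2, epsilon=0.02, discount=0.3)
    for _ in range(5)
]
BASELINE_MAPPING = "non-linear-3"        # M0
BASELINE_REWARD_HANDLER = "speed-bonus"  # R0


def choose_action(q_row, params, rng):
    """Epsilon-greedy choice: explore with probability epsilon,
    otherwise take the action with the highest value in q_row."""
    if rng.random() < params.epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q_sa, reward, best_next_q, params):
    """One tabular Q-learning step (assumed update rule):
    Q <- Q + lr * (reward + discount * max_next_Q - Q)."""
    target = reward + params.discount * best_next_q
    return q_sa + params.learning_rate * (target - q_sa)


if __name__ == "__main__":
    params = BASELINE_LEVELS[0]
    rng = random.Random(0)
    action = choose_action([0.1, 0.5, 0.3], params, rng)
    print("action:", action)
    print("updated value:", q_update(0.0, 1.0, 0.0, params))
```

With a discount of 0.3, the update weights the next state's value lightly, so under this reading the baseline mostly reflects immediate rewards.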