Usage of the 'can-see' reward handler with 20k epochs.
Small epsilon

| | L0 | L1 | L2 | L3 | L4 |
|---|---|---|---|---|---|
| learning rate | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |

| | E0 | E1 | E2 | E3 | E4 |
|---|---|---|---|---|---|
| epsilon | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |

| | D0 | D1 | D2 | D3 | D4 |
|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | can-see |
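
The learning rate, epsilon, and discount values listed above are the kind of parameters used in epsilon-greedy tabular Q-learning. The sketch below shows how they would enter such an update, assuming that is the algorithm in use; the source does not confirm this. The function names `choose_action` and `q_update` are hypothetical, and the 'can-see' reward handler and 'non-linear-3' mapping are not modeled here because their definitions are not given.

```python
import random

# Values taken from the configuration tables.
LEARNING_RATE = 0.2
EPSILON = 0.01
DISCOUNT = 0.3


def choose_action(q_row, epsilon=EPSILON, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Index of the action with the highest estimated value.
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state,
             alpha=LEARNING_RATE, gamma=DISCOUNT):
    """One tabular Q-learning step (assumed update rule, not from the source)."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

With epsilon at 0.01 the agent would explore on roughly 1% of steps, and a discount of 0.3 weights immediate reward far more heavily than future reward.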