Uses the 'can-see' reward handler, trained for 20k epochs.
Same configuration as 'q-see-1', but with a smaller epsilon.

| | L0 | L1 | L2 | L3 | L4 |
|---|---|---|---|---|---|
| learning rate | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |

| | E0 | E1 | E2 | E3 | E4 |
|---|---|---|---|---|---|
| epsilon | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |

| | D0 | D1 | D2 | D3 | D4 |
|---|---|---|---|---|---|
| discount | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |

| | M0 |
|---|---|
| mapping | non-linear-3 |

| | R0 |
|---|---|
| reward handler | can-see |
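
For orientation, here is a minimal sketch of how a learning rate, epsilon, and discount of this kind typically enter a tabular Q-learning step. It assumes the run is a standard epsilon-greedy Q-learner, which the tables do not state. The names `choose_action` and `q_update` are hypothetical and not taken from the project. The sketch uses a single value per parameter, since the meaning of the column indices (L0..L4, E0..E4, D0..D4) is not given here, and it does not model the 'non-linear-3' mapping or the 'can-see' reward handler.

```python
import random

# Values copied from the tables above; all names here are hypothetical.
LEARNING_RATE = 0.2
EPSILON = 0.01
DISCOUNT = 0.3


def choose_action(q_row, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Index of the highest Q-value (first one wins on ties).
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state,
             alpha=LEARNING_RATE, gamma=DISCOUNT):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    q = [[0.0, 0.0], [1.0, 0.0]]  # toy table: 2 states x 2 actions
    action = choose_action(q[0])
    print(action, q_update(q, 0, action, reward=1.0, next_state=1))
```

With epsilon at 0.01 the agent explores on roughly one step in a hundred, and with a discount of 0.3 rewards more than a few steps ahead contribute very little to the update target.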