Number of epochs: 5000
Rewards are restricted to a range between a lower and an upper limit. This might cause problems.
Considers all sensors.
Calculates a reward after every step.
Positively rewarded actions.
Negatively rewarded actions.
Rewards are calculated after every step and might be different from zero.
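
As an illustration of the reward limits mentioned above, here is a minimal sketch of a per-step reward being clamped into a fixed range. The function name `clip_reward` and the bounds `-1.0` and `1.0` are assumptions made for this example; the actual limits used in the run are not stated here.

```python
def clip_reward(raw_reward: float, lower: float = -1.0, upper: float = 1.0) -> float:
    """Clamp a per-step reward into the closed interval [lower, upper].

    The default bounds are placeholders, not the values used in the experiment.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return max(lower, min(upper, raw_reward))


# Two different raw rewards above the upper limit become indistinguishable.
print(clip_reward(5.0), clip_reward(50.0))
```

One possible source of the problems noted above: after clamping, every raw reward beyond a limit maps to the same value, so the agent can no longer tell a slightly bad step from a very bad one.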
| | L0 |
|---|---|
| learning rate | 0.1 |

| | E0 |
|---|---|
| epsilon | 0.1 |

| | D0 | D1 | D2 | D3 | D4 | D5 | D6 |
|---|---|---|---|---|---|---|---|
| discount | 0.6 | 0.65 | 0.7 | 0.75 | 0.8 | 0.85 | 0.9 |
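
The labelled values in the tables can be combined into one configuration per run. The sketch below shows one way to do that, together with how a learning rate, an epsilon and a discount factor are typically used. It assumes tabular Q-learning with epsilon-greedy exploration, which these notes do not confirm; `configurations`, `choose_action` and `q_update` are hypothetical names introduced for illustration.

```python
import random
from itertools import product

# Values copied from the tables; the labels (L0, E0, D0..D6) are the column names.
NUM_EPOCHS = 5000
LEARNING_RATES = {"L0": 0.1}
EPSILONS = {"E0": 0.1}
DISCOUNTS = {"D0": 0.6, "D1": 0.65, "D2": 0.7, "D3": 0.75,
             "D4": 0.8, "D5": 0.85, "D6": 0.9}


def configurations():
    """Yield a (name, settings) pair for every combination of labelled values."""
    grid = product(LEARNING_RATES.items(), EPSILONS.items(), DISCOUNTS.items())
    for (l_name, lr), (e_name, eps), (d_name, gamma) in grid:
        name = f"{l_name}-{e_name}-{d_name}"
        yield name, {"learning_rate": lr, "epsilon": eps, "discount": gamma}


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, learning_rate, discount):
    """Apply one tabular Q-learning update in place (assumed algorithm)."""
    target = reward + discount * max(q[next_state])
    q[state][action] += learning_rate * (target - q[state][action])


if __name__ == "__main__":
    for name, settings in configurations():
        print(name, settings)
```

With a single learning rate and a single epsilon, the grid reduces to seven runs that differ only in the discount factor, so any difference between them can be attributed to how strongly future rewards are weighted.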