Number of epochs: 5000
Rewards are bounded by a lower and an upper limit. This might cause problems, since reward magnitudes beyond the limits are lost.
Considers all sensors.
Calculates a reward after every step.
Positively rewarded actions.
Negatively rewarded actions.
Rewards are calculated after every step and might be different from zero.
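The per-step reward with a lower and an upper limit, as described in the list above, can be sketched as follows. This is only an illustration: the function name `clip_reward` and the limits -1.0 and 1.0 are assumptions, since the notes do not state the actual limits or how they are applied.

```python
def clip_reward(raw_reward, lower=-1.0, upper=1.0):
    """Limit a per-step reward to [lower, upper] (the limits are hypothetical)."""
    return max(lower, min(upper, raw_reward))


# One reward per step; values outside the limits lose their magnitude.
raw_rewards = [0.3, -2.5, 0.0, 4.0]
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)  # [0.3, -1.0, 0.0, 1.0]
```

Here -2.5 and 4.0 are mapped to the limits, so after limiting a very bad step and a slightly bad step can look the same.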
| | L0 | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 | L9 |
| learning rate | 0.0001 | 0.0005 | 0.001 | 0.005 | 0.01 | 0.05 | 0.1 | 0.2 | 0.3 | 0.5 |
| | E0 |
| epsilon | 0.1 |
| | D0 |
| discount | 0.75 |
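The settings listed above can be collected into a small sweep: one run per learning rate, with epsilon and discount held fixed. The sketch below is an illustration under assumptions: all names (`LEARNING_RATES`, `build_configs`, `choose_action`, `q_update`) are hypothetical, and the tabular Q-learning update with epsilon-greedy action selection is assumed only to show where the three parameters would enter. The notes do not name the learning algorithm.

```python
import random

# Values taken from the settings above; the names are hypothetical.
EPOCHS = 5000
LEARNING_RATES = {
    "L0": 0.0001, "L1": 0.0005, "L2": 0.001, "L3": 0.005, "L4": 0.01,
    "L5": 0.05, "L6": 0.1, "L7": 0.2, "L8": 0.3, "L9": 0.5,
}
EPSILON = {"E0": 0.1}
DISCOUNT = {"D0": 0.75}


def build_configs():
    """Return one configuration per learning rate / epsilon / discount combination."""
    configs = []
    for l_name, lr in LEARNING_RATES.items():
        for e_name, eps in EPSILON.items():
            for d_name, gamma in DISCOUNT.items():
                configs.append({
                    "name": f"{l_name}-{e_name}-{d_name}",
                    "epochs": EPOCHS,
                    "learning_rate": lr,
                    "epsilon": eps,
                    "discount": gamma,
                })
    return configs


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, learning_rate, discount):
    """Assumed tabular Q-learning step; the notes do not name the algorithm."""
    target = reward + discount * max(q[next_state])
    q[state][action] += learning_rate * (target - q[state][action])


configs = build_configs()
print(len(configs), configs[0]["name"])  # 10 L0-E0-D0
```

With a single epsilon and a single discount value, the sweep contains exactly ten configurations that differ only in the learning rate.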