Q_EPS_1
Number of epochs: 10000
The agent considers all sensors.
A reward is calculated after every step, and it may be nonzero at any step.
Actions can be rewarded positively or negatively.
| | L0 |
| learning rate | 0.5 |
| | E0 | E0 | E0 | E0 | E0 |
| epsilon | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
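The epsilon values above control how often the agent explores. A minimal sketch, assuming the experiment uses ordinary epsilon-greedy action selection (the notes do not spell out the exploration rule): with probability epsilon a random action is taken, otherwise the action with the highest estimated value. The function name `epsilon_greedy` and the example Q-values are hypothetical.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return an action index: random with probability epsilon, else greedy."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng.random() < epsilon:
        # Explore: any action, chosen uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: action with the highest estimate (the first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.2, 1.0, -0.5]  # hypothetical Q-values for three actions
    for eps in (0.1, 0.3, 0.5, 0.7, 0.9):
        picks = [epsilon_greedy(q, eps, rng) for _ in range(10_000)]
        share = picks.count(1) / len(picks)
        print(f"epsilon={eps}: greedy action chosen {share:.1%} of the time")
```

With three actions the greedy action is expected roughly `(1 - epsilon) + epsilon / 3` of the time, because random exploration can also land on it; higher epsilon therefore means more exploration and less exploitation.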
| | D0 |
| discount | 0.55 |
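Since a reward is calculated after every step, the learning rate and discount from the tables above can be applied once per step. A minimal sketch, assuming standard one-step tabular Q-learning (the notes do not state the update rule), i.e. Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)] with α = 0.5 and γ = 0.55. The names `q_update`, `LEARNING_RATE`, `DISCOUNT` and the states `"s0"`/`"s1"` are illustrative, not taken from the experiment.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

# Values from the tables above; the variable names are illustrative.
LEARNING_RATE = 0.5  # alpha
DISCOUNT = 0.55      # gamma

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Iterable[Hashable],
    alpha: float = LEARNING_RATE,
    gamma: float = DISCOUNT,
    terminal: bool = False,
) -> float:
    """Apply one tabular Q-learning step and return the new Q(state, action)."""
    # Bootstrap from the best action in the next state (0 if the episode ended).
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = defaultdict(float)  # unseen state-action pairs start at 0.0
    actions = ("left", "right")
    q[("s1", "right")] = 2.0        # pretend the agent already values s1/right
    new_value = q_update(q, "s0", "left", reward=1.0, next_state="s1", actions=actions)
    # target = 1 + 0.55 * 2 = 2.1; new value = 0 + 0.5 * (2.1 - 0) = 1.05
    print(round(new_value, 2))  # prints 1.05
```

A negative reward works the same way: the estimate is pulled down by half of the gap between it and the (lower) target, so with α = 0.5 each step moves the estimate halfway toward what was just observed.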