
Learning performance
Solved after 5910 episodes. Best 100-episode average reward was 209.13 ± 6.00. (LunarLander-v2 is considered "solved" when the agent obtains an average reward of at least 200 over 100 consecutive episodes.)
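The "solved" criterion quoted above can be checked mechanically from a list of per-episode rewards. The following is a minimal sketch, not the code used to produce the numbers reported here: the function names `rolling_means`, `first_solved_episode`, and `best_window_average` are hypothetical, and it assumes that "solved after N episodes" counts up to the last episode of the first qualifying 100-episode window (some reports count from the start of that window instead).

```python
from collections import deque


def rolling_means(rewards, window=100):
    """Yield (episode, mean) for every full window.

    `episode` is 1-based and marks the last episode in the window.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for episode, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            total -= recent[0]  # oldest reward is evicted by the append below
        recent.append(reward)
        total += reward
        if len(recent) == window:
            yield episode, total / window


def first_solved_episode(rewards, window=100, threshold=200.0):
    """Return the first episode whose trailing-window mean reaches the
    threshold, or None if that never happens (assumed counting convention)."""
    for episode, mean in rolling_means(rewards, window):
        if mean >= threshold:
            return episode
    return None


def best_window_average(rewards, window=100):
    """Return the highest trailing-window mean, or None if there are
    fewer than `window` episodes."""
    return max((mean for _, mean in rolling_means(rewards, window)), default=None)
```

Given the per-episode rewards of a training run, `first_solved_episode(rewards)` would give the episode count and `best_window_average(rewards)` the best 100-episode average. The spread reported after the ± sign is not computed here, since the text does not say how it was obtained.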