Learning performance
Solved after 13937 episodes. Best 100-episode average reward was 31.20 ± 0.08. (Copy-v0 is considered "solved" when the agent obtains an average reward of at least 25.0 over 100 consecutive episodes.)
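The "solved" criterion above can be checked with a trailing-window average over the per-episode rewards. The following is a minimal sketch, not the evaluation code that produced the numbers above: the function name `first_solved_episode` and its parameters are hypothetical, and it assumes that the reported episode count is the number of episodes completed when the 100-episode mean first reaches the threshold.

```python
from collections import deque


def first_solved_episode(rewards, threshold=25.0, window=100):
    """Return the 1-based episode count at which the mean reward over the
    last `window` episodes first reaches `threshold`, or None if it never does.

    Hypothetical helper: the counting convention (end of the window) is an
    assumption, not something stated in the results above.
    """
    recent = deque(maxlen=window)  # rewards of the most recent episodes
    total = 0.0                    # running sum of the values in `recent`
    for episode, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            total -= recent[0]     # this value is evicted by the append below
        recent.append(reward)
        total += reward
        # Only a full window counts as "100 consecutive episodes".
        if len(recent) == window and total / window >= threshold:
            return episode
    return None
```

For example, calling `first_solved_episode(rewards)` on the list of per-episode rewards from a training run returns `None` until the run contains at least 100 consecutive episodes averaging 25.0 or more.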