
Learning performance
Did not solve the environment. Best 100-episode average reward was 108.69 ± 0.49. (CartPole-v0 is considered "solved" when the agent obtains an average reward of at least 195.0 over 100 consecutive episodes.)
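The "solved" criterion above can be checked with a rolling average over per-episode rewards. The following is a minimal sketch, not the scoring code that produced the figure above; the function names `best_window_average` and `is_solved` are hypothetical, and it assumes the per-episode rewards are available as a plain list. It does not reproduce the ± term, since the source does not state how that was computed.

```python
SOLVED_THRESHOLD = 195.0  # CartPole-v0 criterion quoted above
WINDOW = 100              # number of consecutive episodes averaged


def best_window_average(episode_rewards, window=WINDOW):
    """Return the highest mean reward over any `window` consecutive episodes.

    Returns None when fewer than `window` episodes are available.
    (Hypothetical helper, named for illustration only.)
    """
    if len(episode_rewards) < window:
        return None
    # Sum of the first window, then slide one episode at a time.
    running = sum(episode_rewards[:window])
    best = running
    for i in range(window, len(episode_rewards)):
        running += episode_rewards[i] - episode_rewards[i - window]
        best = max(best, running)
    return best / window


def is_solved(episode_rewards, threshold=SOLVED_THRESHOLD, window=WINDOW):
    """True if some run of `window` consecutive episodes averages >= threshold."""
    best = best_window_average(episode_rewards, window)
    return best is not None and best >= threshold
```

For example, calling `is_solved(rewards)` on a run whose best 100-episode average is about 108.69 returns `False`, matching the result reported above.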