
Learning performance
The agent solved the environment after 211 episodes, and its best 100-episode average reward was 195.27 ± 1.57. (CartPole-v0 is considered "solved" when the agent obtains an average reward of at least 195.0 over 100 consecutive episodes.)
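The solved criterion can be checked with a trailing-window average over the per-episode rewards. The following is a minimal sketch, not the scoring code used to produce this evaluation. The function name `first_solved_episode` and its parameters are hypothetical. It returns the episode at which the 100-episode window ends; whether the episode count reported above marks the start or the end of the window is not stated here.

```python
from collections import deque


def first_solved_episode(rewards, window=100, threshold=195.0):
    """Return the 1-based index of the episode that completes the first
    `window`-episode run whose mean reward is at least `threshold`,
    or None if no such run exists. (Hypothetical helper for illustration.)"""
    recent = deque(maxlen=window)  # rewards of the most recent episodes
    total = 0.0                    # running sum of the values in `recent`
    for episode, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            total -= recent[0]     # this value is evicted by the append below
        recent.append(reward)
        total += reward
        if len(recent) == window and total / window >= threshold:
            return episode
    return None
```

A run with fewer than 100 episodes can never satisfy the criterion, so the function returns None for it regardless of the rewards.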
Algorithm
This evaluation was generated by running episodic_controller.