Best 100-episode average reward was 198.68 ± 0.76. (OffSwitchCartpoleProb-v0 does not have a specified reward threshold at which it's considered solved.)
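As a rough illustration of how a figure like this can be derived, the sketch below slides a 100-episode window over a list of per-episode rewards and reports the best window mean with an error term. It assumes the ± value is the standard error of the mean over the window, which the writeup does not state. The function name `best_window_average` and its parameters are hypothetical and not part of any Gym API.

```python
import math
from typing import Sequence, Tuple


def best_window_average(
    rewards: Sequence[float], window: int = 100
) -> Tuple[float, float]:
    """Return (mean, standard error) of the best consecutive window.

    Assumption: the reported "±" is the standard error of the mean
    over the window; the original evaluation may define it differently.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(rewards) < window:
        raise ValueError("need at least `window` episodes")

    best_mean = -math.inf
    best_sem = 0.0
    for start in range(len(rewards) - window + 1):
        chunk = rewards[start:start + window]
        mean = sum(chunk) / window
        if mean > best_mean:
            # Sample variance (n - 1 denominator), then standard error.
            variance = sum((r - mean) ** 2 for r in chunk) / (window - 1)
            best_mean = mean
            best_sem = math.sqrt(variance / window)
    return best_mean, best_sem
```

Calling `best_window_average(episode_rewards)` on the full list of episode rewards from a run would give a pair comparable in form to the numbers above, under the stated assumption about the error term.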