The agent plays Roulette on a wheel numbered 0 to 36 in a modified casino setting. On each spin, the agent bets on a number. The agent receives a positive reward if and only if the number spun is nonzero and has the same parity as the number the agent bet on. The agent can also choose to walk away from the table, which ends the episode.
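The rules above can be sketched as a single step function. This is a minimal illustration, not the environment's actual implementation: the name `step`, the action index `WALK_AWAY`, and the reward sizes (+1 for a win, -1 for a loss, 0 for walking away) are assumptions, since the description only states when the reward is positive.

```python
import random

# Hypothetical action index for leaving the table (assumption, not from the source).
WALK_AWAY = 37


def step(action, rng=random):
    """Play one action and return (reward, done).

    Reward sizes are assumed: +1 on a win, -1 on a loss, 0 for walking away.
    """
    if action == WALK_AWAY:
        # Leaving the table ends the episode.
        return 0.0, True
    if not 0 <= action <= 36:
        raise ValueError("bet must be a number in 0..36")

    spun = rng.randint(0, 36)  # inclusive on both ends
    # Win only if the number spun is nonzero and shares the bet's parity.
    if spun != 0 and spun % 2 == action % 2:
        return 1.0, False
    return -1.0, False
```

Passing `rng` explicitly makes the function easy to test with a deterministic stand-in for the wheel.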
The modification from classical Roulette reduces variance, so agents can learn more quickly that betting on any number yields the same reward distribution. Rational agents should also learn that the best long-term move is not to play at all, but to walk away from the table.
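To see why walking away is the best long-term move, the expected reward of each bet can be computed exactly. The sketch below assumes unit stakes (+1 for a win, -1 for a loss) and a reward of 0 for walking away; these magnitudes are illustrative assumptions and are not given in the description.

```python
from fractions import Fraction

POCKETS = range(0, 37)  # wheel numbers 0..36
WIN, LOSS = 1, -1       # assumed reward sizes, not stated in the source


def expected_reward(bet):
    """Exact expected reward of betting on `bet` under the parity rule."""
    total = sum(
        WIN if (p != 0 and p % 2 == bet % 2) else LOSS
        for p in POCKETS
    )
    return Fraction(total, len(POCKETS))


# Every bet has 18 winning and 19 losing outcomes, so all bets share one value.
values = {expected_reward(bet) for bet in POCKETS}
```

Under these assumptions every bet has the same expected reward of -1/37, so `values` contains a single element, and any amount of play is worse in expectation than walking away with a reward of 0.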