n-Chain environment

In this game the agent moves along a linear chain of states, with two actions:
  1. forward, which moves one state along the chain but returns no reward
  2. backward, which returns to the beginning of the chain and yields a small reward

The end of the chain, however, yields a large reward: taking 'forward' while already at the end of the chain keeps the agent there and collects this large reward again each time.

With each action, there is a small probability that the agent 'slips' and the transition for the opposite action is taken instead.

The observed state is the current state in the chain (0 to n-1).
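
The dynamics described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the class name NChain, the action encoding (0 for forward, 1 for backward), and the default chain length, slip probability and reward values are all assumptions chosen for the example.

```python
import random

# Hypothetical action encoding, chosen for this sketch only.
FORWARD, BACKWARD = 0, 1


class NChain:
    """Illustrative n-Chain dynamics; all default values are assumptions."""

    def __init__(self, n=5, slip=0.2, small=2.0, large=10.0, seed=None):
        self.n = n          # number of states in the chain (0 to n-1)
        self.slip = slip    # probability that the opposite transition is taken
        self.small = small  # reward for returning to the beginning
        self.large = large  # reward for moving forward at the end of the chain
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        """Put the agent back at the beginning of the chain."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply one action and return (observed state, reward)."""
        # With probability `slip`, the agent slips and the action is flipped.
        if self.rng.random() < self.slip:
            action = BACKWARD if action == FORWARD else FORWARD

        if action == BACKWARD:
            # Return to the beginning and collect the small reward.
            self.state = 0
            return self.state, self.small

        if self.state < self.n - 1:
            # Move one state along the chain; no reward.
            self.state += 1
            return self.state, 0.0

        # Already at the end of the chain: stay there, collect the large reward.
        return self.state, self.large
```

With slipping disabled (slip=0.0), a chain of n states needs n-1 forward moves to reach the end, after which every further forward move pays the large reward. In this sketch a slip at the end of the chain sends the agent back to state 0 with only the small reward.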

This environment is described in section 6.1 of: A Bayesian Framework for Reinforcement Learning by Malcolm Strens (2000)