The goal is to reverse a sequence of symbols on the input tape. We provide a special character \(r\) to indicate the end of the sequence. The model must learn to move right repeatedly until it reaches the \(r\) symbol, then move left, copying each symbol it passes to the output tape.
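The two-phase behavior the model has to discover can be written out as a scripted policy. The sketch below is a hypothetical simplification, not the environment's actual API: it models the input tape as a plain Python list of string symbols terminated by the marker `"r"`, assumes the marker is always present, and collects writes to the output tape in a list. The function name `reverse_with_tape` is illustrative only.

```python
from typing import List

END = "r"  # end-of-sequence marker from the task description


def reverse_with_tape(input_tape: List[str]) -> List[str]:
    """Hypothetical scripted policy: scan right to the marker, then walk left copying symbols.

    Assumes the tape contains END; otherwise indexing past the end raises IndexError.
    """
    head = 0
    # Phase 1: move the read head right until it sits on the end marker.
    while input_tape[head] != END:
        head += 1

    output: List[str] = []
    # Phase 2: move left one cell at a time, writing each symbol to the output tape.
    while head > 0:
        head -= 1
        output.append(input_tape[head])
    return output
```

For example, `reverse_with_tape(["1", "0", "2", "r"])` yields `["2", "0", "1"]`. A learned agent has to recover this same right-then-left pattern from reward alone, without being told where the marker is.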
Reverse-v0 defines "solving" as getting an average reward of 25.0 over 100 consecutive trials.
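One way to turn that criterion into an "episodes before solve" count is a rolling 100-episode window over per-episode rewards. This is a sketch under stated assumptions, not the scoreboard's actual code: it assumes "reaching" the threshold means a window mean of at least 25.0, and that the reported count is the number of episodes that preceded the first qualifying window. The function name `episodes_before_solve` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def episodes_before_solve(
    rewards: Iterable[float], threshold: float = 25.0, window: int = 100
) -> Optional[int]:
    """Return how many episodes preceded the first `window`-long run whose mean
    reward is >= threshold (assumed comparison), or None if it never happens."""
    recent: Deque[float] = deque(maxlen=window)
    for i, reward in enumerate(rewards):
        recent.append(reward)  # once full, the deque drops its oldest reward automatically
        if len(recent) == window and sum(recent) / window >= threshold:
            # The qualifying window spans episodes i - window + 1 .. i.
            return i - window + 1
    return None
```

With 50 zero-reward episodes followed by 100 episodes scoring 30.0, the first qualifying window starts at episode 34, so the function returns 34. Recomputing `sum(recent)` each step keeps the sketch simple and avoids floating-point drift from a running total, at the cost of O(window) work per episode.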
| Algorithm | Episodes before solve | Submitted |
|---|---|---|
| colinmorris's algorithm writeup | 1247.0 | |
| wojzaremba's algorithm writeup | 31826.0 | |
| blole's algorithm writeup | N/A | |
| gdisneyleugers's algorithm writeup | N/A | |
| gdb's algorithm writeup | N/A | |