The goal is to add two multi-digit sequences provided on an input grid. The sequences occupy two adjacent rows, with their right edges aligned. The read head starts on the last digit of the top number (i.e. the upper-right corner). The model has to: (i) memorize an addition table for pairs of digits; (ii) learn how to move over the input grid; and (iii) discover the concept of a carry.
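As a rough illustration of what the model has to learn, the sketch below performs the same right-to-left, column-by-column addition with a carry. It is a minimal sketch, not the environment's own code: the function name `add_rows`, the digit-list representation, and the `base` parameter (defaulting to 10 here) are assumptions made for illustration.

```python
def add_rows(top, bottom, base=10):
    """Hypothetical helper: add two digit rows whose right edges are aligned.

    Each row is a list of digits, most significant first. Returns the sum's
    digits in the order a right-to-left sweep produces them (least significant first).
    """
    width = max(len(top), len(bottom))
    # Left-pad the shorter row with zeros so the right edges line up, as on the grid.
    top = [0] * (width - len(top)) + list(top)
    bottom = [0] * (width - len(bottom)) + list(bottom)

    result = []
    carry = 0
    # Start at the rightmost column (where the read head begins) and move left.
    for col in range(width - 1, -1, -1):
        total = top[col] + bottom[col] + carry  # "addition table" lookup plus carry
        result.append(total % base)             # digit written for this column
        carry = total // base                   # carry passed to the next column
    if carry:
        result.append(carry)                    # a final carry adds one extra digit
    return result
```

For example, `add_rows([4, 5, 6], [7, 8])` computes 456 + 78 = 534 and returns `[4, 3, 5]`, the digits in the order they are produced.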
ReversedAddition-v0 defines "solving" as getting an average reward of 25.0 over 100 consecutive trials.
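The solving criterion is a rolling average, which can be checked as sketched below. This is a hypothetical helper, not part of the environment. It assumes "average reward of 25.0" means a mean of at least 25.0 over a window of 100 consecutive trials, and it reports the first trial at which that holds.

```python
from collections import deque


def first_solved_trial(rewards, threshold=25.0, window=100):
    """Hypothetical helper: return the 1-based trial index at which the mean
    reward over the last `window` trials first reaches `threshold`, else None.

    Assumes "solving" means a rolling mean >= threshold (not strictly greater).
    """
    recent = deque(maxlen=window)
    total = 0.0
    for i, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            total -= recent[0]  # this oldest reward is evicted by the append below
        recent.append(reward)
        total += reward
        # Only judge once a full window of consecutive trials is available.
        if len(recent) == window and total / window >= threshold:
            return i
    return None
```

Keeping a running sum makes each step O(1), so the check stays cheap even over long training logs.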