We present a neural virtual machine that can be trained to perform algorithmic tasks. Rather than combining a neural controller with non-neural memory storage, as has been done in the past, this architecture is purely neural and emulates tape-based memory via fast associative weights (one-step learning).
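To make the one-step learning concrete, the sketch below shows a generic fast associative (Hebbian outer-product) weight update of the kind commonly used to emulate addressable memory. The function names, dimensions, and normalization here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative one-step (Hebbian) associative learning: a key-value pair
# is stored in a single outer-product update and retrieved by multiplying
# the fast-weight matrix with the key. This is a generic sketch, not the
# paper's precise update rule.

def store(W, key, value):
    """Bind `value` to `key` in one step via an outer-product update."""
    key = key / np.linalg.norm(key)   # normalize the key for clean retrieval
    return W + np.outer(value, key)   # one-step fast-weight update

def retrieve(W, key):
    """Recall the value previously bound to `key`."""
    key = key / np.linalg.norm(key)
    return W @ key

rng = np.random.default_rng(0)
d = 64
W = np.zeros((d, d))
k, v = rng.standard_normal(d), rng.standard_normal(d)
W = store(W, k, v)
print(np.allclose(retrieve(W, k), v, atol=1e-6))  # recovers v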
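```

With mutually orthogonal keys, multiple pairs can be stored in the same matrix and retrieved independently, which is what lets a single weight matrix play the role of a tape.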
Here we formally define the architecture, and then extend the system to learn programs using recurrent policy-gradient reinforcement learning from examples of program inputs labeled with corresponding output targets; the targets are compared against the actual outputs to generate a sparse reward signal. We describe the policy-gradient training procedure and report its empirical performance on a number of small-scale list-processing tasks, such as finding the maximum list element, filtering out certain elements, and reversing the order of the elements. These results show that program induction via reinforcement learning is possible using sparse rewards and solely neural computations.
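As a rough illustration of this training setup, the following sketch implements generic recurrent REINFORCE with a sparse terminal reward obtained by comparing a sampled program's output to its target. The RecurrentPolicy network, the hypothetical run_program executor, and the 0/1 reward are assumptions for exposition, not the paper's exact procedure or action space.

```python
import torch
import torch.nn as nn

# Generic recurrent policy-gradient (REINFORCE) sketch with a sparse
# terminal reward, in the spirit of the procedure described above.
# Network sizes and the reward definition are illustrative assumptions.

class RecurrentPolicy(nn.Module):
    def __init__(self, n_obs, n_act, n_hid=64):
        super().__init__()
        self.n_hid = n_hid
        self.rnn = nn.GRUCell(n_obs, n_hid)
        self.head = nn.Linear(n_hid, n_act)

    def forward(self, obs, h):
        h = self.rnn(obs, h)
        return torch.distributions.Categorical(logits=self.head(h)), h

def episode_loss(policy, observations, target, run_program):
    """Sample one episode and return its REINFORCE loss.
    `run_program` is a hypothetical executor that runs the sampled
    action sequence and returns the program's actual output."""
    h = torch.zeros(1, policy.n_hid)
    log_probs, actions = [], []
    for obs in observations:
        dist, h = policy(obs.unsqueeze(0), h)
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        actions.append(a.item())
    output = run_program(actions)
    reward = 1.0 if output == target else 0.0  # sparse: only exact matches pay off
    return -reward * torch.stack(log_probs).sum()
```

In practice one would average this loss over a batch of episodes and subtract a baseline from the reward to reduce gradient variance.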
Revised: November 4, 2020 | Published: September 28, 2020
Citation
Katz G.E., K. Gupta, and J.A. Reggia. 2020. Reinforcement-based Program Induction in a Neural Virtual Machine. In International Joint Conference on Neural Networks (IJCNN 2020), July 19-24, 2020, Glasgow, UK, 1-8. Piscataway, New Jersey: IEEE. PNNL-SA-155294. doi:10.1109/IJCNN48605.2020.9207671