This paper proposes a framework for pedagogically inspired reinforcement learning that can be used to train both students and agents. It also draws analogies between theories of language acquisition and those of reinforcement learning, citing Elman (1993) and Bengio (2009) to motivate curriculum learning and then demonstrating how it can be applied to vocabulary acquisition. There is an interesting reference to the zone of proximal development (ZPD), which has not previously been invoked in the context of curriculum learning. ZPD formalises the distinction between what a learner already knows, what they can learn with some help, and what is beyond their current understanding. This is a well-motivated concept and can be applied to train agents on any particular task.
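To make the three-band ZPD idea concrete, here is a minimal sketch (my own illustration, not the paper's method) of a curriculum selector that keeps only the tasks a learner can currently benefit from: neither already mastered nor out of reach. The band thresholds 0.25 and 0.75 are arbitrary placeholders.

```python
def zpd_tasks(success_rates, lower=0.25, upper=0.75):
    """Return the tasks inside the learner's zone of proximal development:
    not yet mastered (success rate below `upper`) but not beyond current
    understanding (success rate above `lower`)."""
    return [task for task, rate in success_rates.items()
            if lower < rate < upper]

# Hypothetical per-task success rates for a learner.
rates = {"easy": 0.95, "medium": 0.55, "hard": 0.40, "impossible": 0.05}
print(zpd_tasks(rates))  # -> ['medium', 'hard']
```

A curriculum built this way would resample the bands as the learner improves, so tasks graduate out of the ZPD once mastered.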
One of the main reasons for the success of DeepMind's AlphaZero was having optimal opposition: an opponent that was neither too strong nor too weak. This allowed the system to use self-play to improve learning. There are clear parallels between ZPD and optimal opposition, in that ZPD can determine the optimal strength of an opponent so as to encourage transfer learning. The same idea could also be used to control the discriminator in a GAN. An interesting extension would be to infer the ZPD using a Bayesian framework.
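The link between ZPD and optimal opposition could be sketched as follows (again my own illustration, with hypothetical checkpoint names): among past self-play checkpoints, pick the one whose win rate against the current agent is closest to even, i.e. the opponent sitting inside the agent's ZPD.

```python
def pick_opponent(win_rates, target=0.5):
    """Given each past checkpoint's win rate against the current agent,
    return the most evenly matched one (win rate closest to `target`).
    An opponent near 50% is neither too strong nor too weak."""
    return min(win_rates, key=lambda ckpt: abs(win_rates[ckpt] - target))

# Hypothetical win rates of stored checkpoints versus the current agent.
history = {"ckpt_100": 0.92, "ckpt_500": 0.61, "ckpt_900": 0.48}
print(pick_opponent(history))  # -> ckpt_900
```

The same band-matching logic could in principle throttle a GAN discriminator, updating it only while its accuracy against the generator stays near the target.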