* Semi-supervised method
* There is a teacher net and a student net, with identical architecture.
* The teacher makes predictions on unlabeled data, which are used as ground-truth for training the student net.
* After each gradient descent update on the student, the teacher's weights are updated so that it becomes an exponential moving average of the weights of the student at previous timesteps. It's called a "mean teacher" because of this moving average.