Summary by Udibr 6 years ago
I love the format of this summary. Thanks! The historical averaging idea is interesting. This is basically just a momentum update rule, right?
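For comparison with momentum, here is what historical averaging computes. As described in the paper, it adds a term $\lVert\theta - \frac{1}{t}\sum_{i=1}^{t}\theta[i]\rVert^2$ to each player's cost, where $\theta[i]$ is the parameter value at past step $i$. Below is a minimal NumPy sketch of that penalty. It assumes the parameters are flattened into one vector and keeps an incremental running mean instead of storing the full history. The class name `HistoricalAverage` is hypothetical.

```python
import numpy as np

class HistoricalAverage:
    """Hypothetical helper: tracks the mean of past parameter vectors theta[1..t]."""

    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.t = 0

    def update(self, theta):
        # Incremental mean: mean_t = mean_{t-1} + (theta_t - mean_{t-1}) / t
        self.t += 1
        self.mean += (theta - self.mean) / self.t

    def penalty(self, theta):
        # ||theta - (1/t) * sum_i theta[i]||^2, added to the player's cost
        return float(np.sum((theta - self.mean) ** 2))

    def grad(self, theta):
        # Gradient of the penalty w.r.t. theta: pulls theta toward the historical mean
        return 2.0 * (theta - self.mean)


ha = HistoricalAverage(2)
ha.update(np.array([0.0, 0.0]))
ha.update(np.array([2.0, 2.0]))
theta = np.array([3.0, 1.0])
print(ha.penalty(theta))  # 4.0
```

The gradient of this term, $2(\theta - \bar\theta)$, pulls the parameters toward the average of their own past values. Momentum, by contrast, accumulates past gradients into the update.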

That's what I understood (I don't have first-hand experience).

In minibatch discrimination, we get the $M$ matrices by multiplying with the $T$ tensor. What is the $T$ tensor? In the code it looks like you initialise it like a weight matrix. Does that mean it is learned?
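To make the role of $T$ concrete, here is a minimal NumPy sketch of the layer as the paper describes it. Each feature vector $f(x_i) \in \mathbb{R}^A$ is multiplied by $T \in \mathbb{R}^{A \times B \times C}$ to give $M_i \in \mathbb{R}^{B \times C}$. Then $c_b(x_i, x_j) = \exp(-\lVert M_{i,b} - M_{j,b}\rVert_{L_1})$ and $o(x_i)_b = \sum_j c_b(x_i, x_j)$. Finally, $o(x_i)$ is concatenated with $f(x_i)$. In this sketch `T` is simply passed in as an array. Treating it as a trainable weight, as the question suggests, is an assumption here. The function name is hypothetical.

```python
import numpy as np

def minibatch_discrimination(f, T):
    """Hypothetical sketch. f: (N, A) features, T: (A, B, C) tensor -> (N, A + B)."""
    A, B, C = T.shape
    # M_i = f(x_i) T, reshaped so each row M_{i,b} has length C
    M = (f @ T.reshape(A, B * C)).reshape(-1, B, C)                  # (N, B, C)
    # L1 distance between M_{i,b} and M_{j,b} for every pair (i, j) and every b
    dist = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)   # (N, N, B)
    c = np.exp(-dist)                                                # c_b(x_i, x_j)
    # o(x_i)_b = sum over j of c_b(x_i, x_j); this follows the formula and
    # includes j = i, which contributes exp(0) = 1
    o = c.sum(axis=1)                                                # (N, B)
    # Concatenate the minibatch features with the original features
    return np.concatenate([f, o], axis=1)


rng = np.random.default_rng(0)
T = rng.normal(size=(4, 5, 2))   # in a real model: a learned parameter (assumption)
f = np.ones((3, 4))              # three identical samples in the minibatch
out = minibatch_discrimination(f, T)
print(out.shape)  # (3, 9)
```

Because the three samples are identical, every $c_b$ equals 1 and each $o(x_i)_b$ equals the batch size. This is the signal the discriminator can use to detect a collapsed generator.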

