Summary by Udibr 7 years ago
I love the format of this summary. Thanks! The historical averaging idea is interesting. This is basically just a momentum update rule, right?

That's what I understood (I don't have first-hand experience).
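
For what it's worth, here is a minimal sketch of how I read the historical averaging term: a penalty $||\theta - \frac{1}{t}\sum_{i=1}^{t}\theta[i]||^2$ added to each player's cost, which pulls the parameters toward their running average. That is related to, but not the same as, momentum, which averages gradients rather than penalising distance from past parameters. The class name `HistoricalAverage` and its methods are hypothetical, made up for illustration.

```python
import numpy as np

class HistoricalAverage:
    """Running mean of past parameter values (hypothetical helper)."""

    def __init__(self, shape):
        self.avg = np.zeros(shape)
        self.t = 0

    def update(self, theta):
        # Incremental mean: avg_t = avg_{t-1} + (theta - avg_{t-1}) / t
        self.t += 1
        self.avg += (theta - self.avg) / self.t

    def penalty(self, theta):
        # Squared distance between current parameters and their history
        return float(np.sum((theta - self.avg) ** 2))

    def grad(self, theta):
        # Gradient of the penalty w.r.t. theta, treating the average as fixed
        return 2.0 * (theta - self.avg)

hist = HistoricalAverage(2)
hist.update(np.array([1.0, 1.0]))
hist.update(np.array([3.0, 3.0]))
extra_cost = hist.penalty(np.array([3.0, 3.0]))
```

In training, `extra_cost` (scaled by some weight) would be added to the usual loss, and `grad` to the usual gradient.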

In minibatch discrimination, we get these $M$ matrices by multiplying the features with the $T$ tensor. What is the $T$ tensor? In the code it looks like you initialise it like a weight matrix. Does that mean it is learned?
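
As far as I understand the paper, $T \in \mathbb{R}^{A \times B \times C}$ is indeed a learned parameter: each feature vector $f(x_i) \in \mathbb{R}^A$ is multiplied by $T$ to give a matrix $M_i \in \mathbb{R}^{B \times C}$, and the rows of $M_i$ and $M_j$ are compared with an L1 distance. Below is a rough NumPy sketch of the forward pass only. The function name `minibatch_features` and the shapes in the example are my own assumptions, and here $T$ is just randomly initialised rather than trained.

```python
import numpy as np

def minibatch_features(f, T):
    """f: (n, A) features, T: (A, B, C) tensor. Returns o with shape (n, B)."""
    n, A = f.shape
    _, B, C = T.shape
    # M_i = f(x_i) multiplied by T: one B x C matrix per sample
    M = (f @ T.reshape(A, B * C)).reshape(n, B, C)
    # L1 distance between row b of M_i and row b of M_j: shape (n, n, B)
    l1 = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)
    # c_b(x_i, x_j) = exp(-L1 distance)
    c = np.exp(-l1)
    # o(x_i)_b = sum over j of c_b(x_i, x_j); includes j == i, which adds 1
    return c.sum(axis=1)

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 8))             # 4 samples, A = 8 (assumed sizes)
T = rng.normal(size=(8, 5, 3)) * 0.1    # B = 5, C = 3 (assumed sizes)
o = minibatch_features(f, T)            # shape (4, 5)
```

The output `o` would be concatenated with `f` and passed to the next discriminator layer, so gradients flow back into `T` like any other weight.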
