Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov
arXiv e-Print archive - 2019
Keywords: cs.CL
First published: 2019/05/23
Abstract: Multi-head self-attention is a key component of the Transformer, a
state-of-the-art architecture for neural machine translation. In this work we
evaluate the contribution made by individual attention heads in the encoder to
the overall performance of the model and analyze the roles played by them. We
find that the most important and confident heads play consistent and often
linguistically-interpretable roles. When pruning heads using a method based on
stochastic gates and a differentiable relaxation of the L0 penalty, we observe
that specialized heads are last to be pruned. Our novel pruning method removes
the vast majority of heads without seriously affecting performance. For
example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads
results in a drop of only 0.15 BLEU.
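
The pruning relies on a stochastic gate per head combined with a differentiable relaxation of the L0 penalty. The PyTorch sketch below is not the authors' code: it assumes a Hard Concrete-style gate (in the spirit of Louizos et al.'s L0 relaxation), and the hyperparameters (beta, gamma, zeta), the penalty weight, and the tensor shapes are illustrative choices. It shows how one gate per head can be sampled, used to scale that head's output, and regularized toward exact zero.

```python
import math
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    """One stochastic gate per encoder attention head; gate values scale head outputs."""
    def __init__(self, num_heads, beta=0.33, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_heads))  # learned gate locations
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # Reparameterized sample from the stretched, hard-clipped concrete distribution.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        s = s * (self.zeta - self.gamma) + self.gamma
        return s.clamp(0.0, 1.0)  # values in [0, 1]; exact zeros switch heads off

    def l0_penalty(self):
        # Expected number of open gates: a differentiable surrogate for the L0 norm.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()

# Usage sketch: multiply each head's output by its gate and add the penalty to the loss.
gate = HardConcreteGate(num_heads=8)
head_outputs = torch.randn(2, 8, 10, 64)            # (batch, head, position, dim) dummy tensor
gated = head_outputs * gate()[None, :, None, None]  # broadcast one gate per head
task_loss = gated.pow(2).mean()                     # stands in for the translation loss
loss = task_loss + 1e-2 * gate.l0_penalty()         # penalty weight is an illustrative choice
loss.backward()
```

During training the penalty pushes gates toward zero, so unimportant heads are switched off first; at the end, heads whose gates are exactly zero can be removed from the model entirely.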