Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov
arXiv e-Print archive - 2019 via Local arXiv
Keywords: cs.CL

