Big Bird: Transformers for Longer Sequences
Zaheer, Manzil and Guruganesh, Guru and Dubey, Avinava and Ainslie, Joshua and Alberti, Chris and Ontañón, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and Ahmed, Amr
arXiv e-Print archive - 2020 via Local Bibsonomy
Keywords: transfer-learning, pre-trained, transformer, bert