VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong
arXiv e-Print archive - 2021
Keywords:
cs.CV, cs.AI, cs.LG, cs.MM, eess.IV
Abstract:
We present a framework for learning multimodal representations from unlabeled
data using convolution-free Transformer architectures. Specifically, our
Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts
multimodal representations that are rich enough to benefit a variety of
downstream tasks. We train VATT end-to-end from scratch using multimodal
contrastive losses and evaluate its performance by the downstream tasks of
video action recognition, audio event classification, image classification, and
text-to-video retrieval. Furthermore, we study a modality-agnostic
single-backbone Transformer by sharing weights among the three modalities. We
show that the convolution-free VATT outperforms state-of-the-art ConvNet-based
architectures in the downstream tasks. In particular, VATT's vision Transformer
achieves top-1 accuracies of 82.1% on Kinetics-400, 83.6% on Kinetics-600, and
41.1% on Moments in Time, setting new records while avoiding supervised
pre-training. Transferring to image classification yields 78.7% top-1 accuracy on
ImageNet, compared to 64.7% when training the same Transformer from scratch,
showing the generalizability of our model despite the domain gap between videos
and images.
VATT's audio Transformer also sets a new record on waveform-based audio event
recognition, achieving an mAP of 39.4% on AudioSet without any supervised
pre-training. VATT's source code is publicly available.
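
To make the training setup concrete, below is a minimal sketch (not the authors' released code) of how tokens from the three modalities could be encoded by Transformers and aligned with pairwise contrastive losses, as the abstract describes. The names (ModalityEncoder, info_nce), the projection dimension, the temperature, and the mean-pooling step are illustrative assumptions; VATT's actual tokenization, loss variants, and hyperparameters are given in the paper.

```python
# Minimal sketch (assumptions, not VATT's exact implementation):
# three tokenized streams -> Transformer encoder(s) -> common embedding space,
# trained with InfoNCE-style losses on video-audio and video-text pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Projects pre-tokenized inputs of one modality and encodes them with a Transformer."""

    def __init__(self, input_dim, d_model=512, num_layers=4, nhead=8, backbone=None):
        super().__init__()
        # Stands in for the modality-specific patch/waveform/word tokenization.
        self.tokenizer = nn.Linear(input_dim, d_model)
        if backbone is None:  # per-modality backbone (default setting)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            backbone = nn.TransformerEncoder(layer, num_layers)
        self.backbone = backbone  # pass a shared module for the modality-agnostic variant
        self.proj = nn.Linear(d_model, 256)  # projection into the common contrastive space

    def forward(self, x):                      # x: (batch, seq_len, input_dim)
        h = self.backbone(self.tokenizer(x))   # (batch, seq_len, d_model)
        return F.normalize(self.proj(h.mean(dim=1)), dim=-1)  # mean-pool, then L2-normalize


def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between two batches of aligned embeddings."""
    logits = z_a @ z_b.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0))    # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    video = torch.randn(8, 32, 768)   # dummy video tokens
    audio = torch.randn(8, 48, 128)   # dummy waveform tokens
    text = torch.randn(8, 16, 300)    # dummy word embeddings

    enc_v, enc_a, enc_t = ModalityEncoder(768), ModalityEncoder(128), ModalityEncoder(300)
    zv, za, zt = enc_v(video), enc_a(audio), enc_t(text)

    loss = info_nce(zv, za) + info_nce(zv, zt)  # video-audio and video-text pairs
    loss.backward()
    print(loss.item())
```

For the modality-agnostic single-backbone variant mentioned in the abstract, one shared TransformerEncoder instance could be passed as the backbone to all three ModalityEncoder objects, so that the Transformer weights are reused across modalities while the tokenizers stay modality-specific.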