Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer
Eleftheria Briakou, Sweta Agrawal, Joel Tetreault, and Marine Carpuat
arXiv e-Print archive - 2021 via Local arXiv
Keywords: cs.CL
First published: 2024/12/22
Abstract: While the field of style transfer (ST) has been growing rapidly, it has been
hampered by a lack of standardized practices for automatic evaluation. In this
paper, we evaluate leading ST automatic metrics on the oft-researched task of
formality style transfer. Unlike previous evaluations, which focus solely on
English, we expand our focus to Brazilian-Portuguese, French, and Italian,
making this work the first multilingual evaluation of metrics in ST. We outline
best practices for automatic evaluation in (formality) style transfer and
identify several models that correlate well with human judgments and are robust
across languages. We hope that this work will help accelerate development in
ST, where human evaluation is often challenging to collect.