First published: 2024/12/21 (just now) Abstract: We present ComSum, a data set of 7 million commit messages for text
summarization. When documenting commits, software code changes, both a message
and its summary are posted. We gather and filter those to curate developers'
work summarization data set. Along with its growing size, practicality and
challenging language domain, the data set benefits from the living field of
empirical software engineering. As commits follow a typology, we propose to not
only evaluate outputs by Rouge, but by their meaning preservation.