First published: 2022/05/28
Abstract: Grammatical error correction (GEC) is the task of detecting and correcting
errors in a written text. The idea of combining multiple system outputs has
been successfully used in GEC. To achieve successful system combination,
multiple component systems need to produce corrected sentences that are both
diverse and of comparable quality. However, most existing state-of-the-art GEC
approaches are based on similar sequence-to-sequence neural networks, so the
gains are limited from combining the outputs of component systems similar to
one another. In this paper, we present Diversity-Driven Combination (DDC) for
GEC, a system combination strategy that encourages diversity among component
systems. We evaluate our system combination strategy on the CoNLL-2014 shared
task and the BEA-2019 shared task. On both benchmarks, DDC achieves significant
performance gain with a small number of training examples and outperforms the
component systems by a large margin. Our source code is available at
Average ensembling is practical - but naive.
Combining while considering each network's strengths is much better!
Moreover, let's make the networks diverse so they will have different strengths.
Wenjuan Han & Hwee Tou Ng (no twitters?)
The basic idea is quite simple:
Given some models, why would we want the average? We want to rely on each one (or group) when it is more likely to be the correct one.
This was actually introduced in our previous work (as admitted by the authors) in
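To make the basic idea concrete, here is a minimal sketch of relying on each model where it is strongest instead of averaging. It is an illustration only, not the method of either paper: the edit representation `(start, end, replacement, error_type)`, the function names `estimate_precision` and `combine`, the `threshold` parameter and the toy data are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical edit format: (start, end, replacement, error_type).

def estimate_precision(dev_outputs, gold_edits):
    """Per-system, per-error-type precision on a development set.

    dev_outputs: {system_name: set of edits}
    gold_edits:  set of reference edits
    """
    precision = {}
    for system, edits in dev_outputs.items():
        proposed = defaultdict(int)
        correct = defaultdict(int)
        for edit in edits:
            err_type = edit[3]
            proposed[err_type] += 1
            if edit in gold_edits:
                correct[err_type] += 1
        precision[system] = {t: correct[t] / proposed[t] for t in proposed}
    return precision

def combine(test_outputs, precision, threshold=0.5):
    """Keep an edit if some system proposing it is trusted for that error type."""
    kept = set()
    for system, edits in test_outputs.items():
        for edit in edits:
            if precision.get(system, {}).get(edit[3], 0.0) >= threshold:
                kept.add(edit)
    return sorted(kept)

# Toy data: system A is reliable on ORTH, system B on DET.
gold = {(0, 1, "The", "ORTH"), (3, 4, "went", "VERB"), (5, 6, "the", "DET")}
dev_outputs = {
    "A": {(0, 1, "The", "ORTH"), (3, 4, "went", "VERB"), (5, 6, "a", "DET")},
    "B": {(0, 1, "the", "ORTH"), (3, 4, "went", "VERB"), (5, 6, "the", "DET")},
}
test_outputs = {
    "A": {(2, 3, "An", "ORTH"), (4, 5, "a", "DET")},
    "B": {(2, 3, "an", "ORTH"), (4, 5, "the", "DET")},
}
precision = estimate_precision(dev_outputs, gold)
combined = combine(test_outputs, precision)
```

On the toy data, the combination keeps A's ORTH edit and B's DET edit, which a plain average over the two systems would have no reason to prefer.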
The paper's addition:
1. Given a set of black-box models, we may train at least one of them to be different from the rest with RL.
2. We can use more sophisticated NNs to combine the outputs.
3. We can ignore domain knowledge for the combination (I am not sure this is a bonus).
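As a rough illustration of the first point, the sketch below shows one way a reward could encourage a system to differ from the others. This is not the reward used in the paper: the Jaccard distance over edit sets, the names `jaccard_distance` and `diversity_reward`, and the `weight` parameter are all assumptions chosen for this example.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diversity_reward(candidate, others, quality, weight=0.5):
    """Hypothetical RL reward: a quality score plus a bonus for
    proposing edits that differ from those of the other systems.

    candidate: set of edits proposed by the system being trained
    others:    list of edit sets from the remaining (fixed) systems
    quality:   any scalar quality score for the candidate (e.g. F0.5)
    """
    if not others:
        return quality
    bonus = sum(jaccard_distance(candidate, o) for o in others) / len(others)
    return quality + weight * bonus

# A system that copies its peers gets no bonus; one that differs does.
same = diversity_reward({1, 2}, [{1, 2}], quality=0.8)
different = diversity_reward({1, 2}, [{3, 4}], quality=0.8)
```

The quality term matters: without it, the reward would favour outputs that are merely different, not different and good.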
Results are very strong. It is especially nice that they show that the diversity training indeed helps.
The comparisons are always to SoTA, which is meaningless, since the authors propose several separate parts (the diversity, the combination and the combined models) and such a comparison does not isolate the contribution of any of them.
It is unclear whether plain ensembling after the diversity training would be preferable to their combination method or not.
Similarly, they compare to Kantor et al., but Kantor et al. provided a combination method. Why not compare on the same models, or apply Kantor et al.'s method to the models after the diversity training?
To conclude, I really like the direction. Ensembling is a very practical tool that for some reason has not been improved in a long time.