The authors investigate the benefit of different task combinations when performing multi-task learning.
https://i.imgur.com/VmD2ioS.png
They experiment with all possible pairs of 10 sequence labeling datasets, alternating between the two datasets during training. They find that multi-task learning helps most when the main task's learning curve plateaus quickly while the auxiliary task's does not, suggesting that the auxiliary task helps the model escape local minima.
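A minimal sketch of such an alternating multi-task setup is below, assuming hard parameter sharing (a shared BiLSTM encoder with one task-specific output layer per dataset) and random task selection at each step; the class and function names are illustrative, not the authors' actual code.

```python
import random
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Hard parameter sharing: one shared BiLSTM encoder, one classifier per task."""
    def __init__(self, vocab_size, num_labels_per_task, emb_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # one output layer per sequence labeling task (e.g. POS tagging, chunking)
        self.heads = nn.ModuleList(
            [nn.Linear(2 * hidden_dim, n) for n in num_labels_per_task]
        )

    def forward(self, tokens, task_id):
        hidden, _ = self.encoder(self.embed(tokens))
        return self.heads[task_id](hidden)  # per-token label scores for the chosen task

# Toy setup: two tasks with different label sets, batches drawn from separate datasets.
vocab_size = 1000
label_counts = [5, 12]
model = MultiTaskTagger(vocab_size, num_labels_per_task=label_counts)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

def random_batch(num_labels, batch=8, seq_len=20):
    """Stand-in for a real data loader: random token ids and labels."""
    x = torch.randint(0, vocab_size, (batch, seq_len))
    y = torch.randint(0, num_labels, (batch, seq_len))
    return x, y

for step in range(100):
    # At each step, pick one task at random and update on a batch from its dataset,
    # so training alternates between the main and auxiliary task.
    task_id = random.randrange(len(label_counts))
    num_labels = label_counts[task_id]
    x, y = random_batch(num_labels)
    logits = model(x, task_id)
    loss = loss_fn(logits.reshape(-1, num_labels), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```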
No single auxiliary task helps across all main tasks, but chunking and semantic tagging appear to be the most broadly useful auxiliary tasks.