Women also Snowboard: Overcoming Bias in Captioning Models
Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach
arXiv e-Print archive, 2018
Keywords:
cs.CV
First published: 2018/03/26
Abstract: Most machine learning methods are known to capture and exploit biases of the
training data. While some biases are beneficial for learning, others are
harmful. Specifically, image captioning models tend to exaggerate biases
present in training data (e.g., if a word is present in 60% of training
sentences, it might be predicted in 70% of sentences at test time). This can
lead to incorrect captions in domains where unbiased captions are desired, or
required, due to over-reliance on the learned prior and image context. In this
work we investigate generation of gender-specific caption words (e.g. man,
woman) based on the person's appearance or the image context. We introduce a
new Equalizer model that ensures equal gender probability when gender evidence
is occluded in a scene and confident predictions when gender evidence is
present. The resulting model is forced to look at a person rather than use
contextual cues to make gender-specific predictions. The losses that comprise
our model, the Appearance Confusion Loss and the Confident Loss, are general,
and can be added to any description model in order to mitigate impacts of
unwanted bias in a description dataset. Our proposed model has lower error than
prior work when describing images with people and mentioning their gender, and
more closely matches the ground truth ratio of sentences including women to
sentences including men. We also show that unlike other approaches, our model
is indeed more often looking at people when predicting their gender.
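The two losses can be sketched in code. The following is a minimal, illustrative PyTorch sketch, not the authors' implementation: it assumes the captioning model already exposes per-timestep probabilities for woman-words and man-words (`p_woman`, `p_man`), computed on occluded images for the Appearance Confusion Loss and on original images for the Confident Loss, together with masks marking gendered ground-truth words. All tensor names and the averaging scheme are assumptions for clarity.

```python
import torch

def appearance_confusion_loss(p_woman, p_man, gendered_mask):
    """Appearance Confusion Loss (sketch).

    On images whose person regions are occluded, the model should be unsure
    about gender, i.e. woman-word and man-word probabilities should be close.

    p_woman, p_man: (batch, T) probabilities at each decoding step, computed
                    on the *masked* images.
    gendered_mask:  (batch, T) 1.0 where the ground-truth word is gendered.
    """
    confusion = torch.abs(p_woman - p_man)  # small when the model is undecided
    return (confusion * gendered_mask).sum() / gendered_mask.sum().clamp(min=1.0)

def confident_loss(p_woman, p_man, woman_mask, man_mask, eps=1e-3):
    """Confident Loss (sketch).

    On the original (unmasked) images, the probability of the correct gendered
    word should dominate the incorrect one at gendered time steps.

    woman_mask / man_mask: (batch, T) 1.0 where the ground truth is a
                           woman-word / man-word, respectively.
    """
    # Quotient-style penalty: large when the wrong gender is as likely
    # as the correct one, near zero when the correct gender dominates.
    penalty_w = p_man / (p_woman + eps)    # where ground truth is a woman-word
    penalty_m = p_woman / (p_man + eps)    # where ground truth is a man-word
    total = (woman_mask + man_mask).sum().clamp(min=1.0)
    return ((penalty_w * woman_mask) + (penalty_m * man_mask)).sum() / total
```

In training, one would presumably add these terms, with their own weights, to the standard cross-entropy captioning loss, e.g. `loss = ce + alpha * acl + beta * conf`; the weights here are placeholders, not values from the paper.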