On Calibration of Modern Neural Networks
Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG
First published: 2017/06/14
Abstract: Confidence calibration -- the problem of predicting probability estimates
representative of the true correctness likelihood -- is important for
classification models in many applications. We discover that modern neural
networks, unlike those from a decade ago, are poorly calibrated. Through
extensive experiments, we observe that depth, width, weight decay, and Batch
Normalization are important factors influencing calibration. We evaluate the
performance of various post-processing calibration methods on state-of-the-art
architectures with image and document classification datasets. Our analysis and
experiments not only offer insights into neural network learning, but also
provide a simple and straightforward recipe for practical settings: on most
datasets, temperature scaling -- a single-parameter variant of Platt Scaling --
is surprisingly effective at calibrating predictions.
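The recipe the abstract refers to fits in a few lines. Below is a minimal sketch (not the authors' reference implementation), assuming held-out validation logits and labels are available as NumPy arrays; the function and variable names (fit_temperature, val_logits, val_labels) are illustrative. It fits the single temperature T by minimizing validation negative log-likelihood, then rescales test logits before the softmax, and includes a bin-based confidence-vs-accuracy gap in the spirit of the expected calibration error used to measure miscalibration.

# Minimal temperature-scaling sketch (illustrative, not the paper's code).
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels):
    # Negative log-likelihood of the true labels under softmax probabilities.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels):
    # Find T > 0 minimizing validation NLL. Dividing logits by a positive
    # scalar preserves the argmax, so accuracy is unchanged.
    res = minimize_scalar(lambda t: nll(val_logits / t, val_labels),
                          bounds=(0.05, 10.0), method="bounded")
    return res.x

def calibration_gap(probs, labels, n_bins=15):
    # Average |accuracy - confidence| over equally spaced confidence bins,
    # weighted by bin size (an ECE-style summary of miscalibration).
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return gap

# Usage sketch:
#   T = fit_temperature(val_logits, val_labels)
#   calibrated_probs = softmax(test_logits / T)
#   print(calibration_gap(calibrated_probs, test_labels))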