In a binary classification task on an imbalanced dataset, we often report the *area under the curve* (AUC) of the *receiver operating characteristic* (ROC) as a measure of the classifier's ability to distinguish the two classes.
If there are $k$ errors, accuracy is the same irrespective of how those $k$ errors are distributed, i.e. whether they are misclassified positive samples or misclassified negative samples.
AUC-ROC is a metric that treats these misclassifications asymmetrically, making it an appropriate statistic for classification tasks on imbalanced datasets.
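
To make the point about accuracy concrete, here is a small sketch. The dataset (10 positives, 90 negatives), the choice $k = 5$, and the helper names `accuracy` and `recall` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(y_true == y_pred))

def recall(y_true, y_pred, label):
    """Fraction of samples of class `label` that are predicted as `label`."""
    mask = y_true == label
    return float(np.mean(y_pred[mask] == label))

# Hypothetical imbalanced dataset: 10 positives (1) followed by 90 negatives (0).
y_true = np.array([1] * 10 + [0] * 90)

# Case A: all k = 5 errors fall on positive samples.
pred_a = y_true.copy()
pred_a[:5] = 0

# Case B: all k = 5 errors fall on negative samples.
pred_b = y_true.copy()
pred_b[10:15] = 1

acc_a, acc_b = accuracy(y_true, pred_a), accuracy(y_true, pred_b)
pos_recall_a = recall(y_true, pred_a, 1)
pos_recall_b = recall(y_true, pred_b, 1)
```

Both cases have the same accuracy, yet case A misses half of the positive class while case B misses none of it, which is the distinction accuracy cannot see.
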
However, until this paper, AUC-ROC was hard to express in a form that could be differentiated and optimized with gradient descent.
This paper approximates AUC-ROC by a Wilcoxon-Mann-Whitney statistic, which counts the "number of wins" across all pairwise comparisons and normalizes by the number of pairs:
$
U = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}I(x_i, x_j)}{mn},
$
where $m$ is the number of positive samples, $n$ is the number of negative samples, and $I(x_i, x_j)$ is $1$ if the positive sample $x_i$ is ranked higher than the negative sample $x_j$, and $0$ otherwise.
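
The statistic $U$ can be computed directly from classifier scores. The following is a minimal sketch, assuming that "ranked higher" means a strictly larger score (so ties count as losses); the function name `wmw_statistic` and the example scores are hypothetical.

```python
import numpy as np

def wmw_statistic(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # wins[i, j] is True when positive sample i is scored strictly above negative sample j.
    wins = pos[:, None] > neg[None, :]
    # The mean over the m x n matrix equals the double sum divided by m * n.
    return float(wins.mean())

# Hypothetical scores: m = 3 positives, n = 4 negatives.
pos_scores = [0.9, 0.8, 0.4]
neg_scores = [0.7, 0.3, 0.2, 0.1]
u = wmw_statistic(pos_scores, neg_scores)
```

Here the positives win 4, 4, and 3 of their comparisons respectively, so 11 of the 12 pairs are ordered correctly.
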
Figure 1 in the paper shows the variance of this statistic with an increasing imbalance in the dataset, justifying the close correspondence with AUC-ROC.
Further, to make this metric smooth and differentiable, the step function of pairwise comparison is replaced by sigmoid or hinge functions.
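
As a sketch of the sigmoid variant, the step function can be replaced by $\sigma(\beta(x_i - x_j))$, which has a closed-form gradient with respect to the scores. The steepness parameter `beta`, the function names, and the example scores are assumptions made for illustration; this is not necessarily the exact objective used in the paper.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    return np.exp(-np.logaddexp(0.0, -z))

def soft_wmw(pos_scores, neg_scores, beta=10.0):
    """Sigmoid-smoothed pairwise statistic and its gradient w.r.t. the scores.

    `beta` (hypothetical parameter) controls how closely the sigmoid
    approximates the step function.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    diff = pos[:, None] - neg[None, :]          # shape (m, n)
    s = sigmoid(beta * diff)
    value = float(s.mean())
    # d/dd sigmoid(beta * d) = beta * s * (1 - s); divide by m * n for the mean.
    local = beta * s * (1.0 - s) / s.size
    grad_pos = local.sum(axis=1)                # raising a positive score helps
    grad_neg = -local.sum(axis=0)               # raising a negative score hurts
    return value, grad_pos, grad_neg

# Hypothetical scores: 3 positives, 4 negatives.
value, grad_pos, grad_neg = soft_wmw([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

A gradient-ascent step on `value` pushes positive scores up and negative scores down, with the largest updates going to pairs whose scores are close together. As `beta` grows, `value` approaches the hard pairwise count, at the cost of vanishing gradients for well-separated pairs.
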
Further extensions apply this to multi-class classification tasks and focus on top-K predictions, i.e. optimizing the lower-left part of the area under the ROC curve.