Summary by elbaro 4 years ago
## Task
Add a '**rejection**' output to an existing classification model that ends in a softmax layer.
## Method
1. Choose a threshold $\delta$, a temperature $T$, and a perturbation magnitude $\epsilon$
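2. Add a small perturbation to the input $x$ (eq. 2):
let $\tilde x = x - \epsilon \, \text{sign}(-\nabla_x \log S_{\hat y}(x;T))$, where $S_{\hat y}(x;T)$ is the temperature-scaled softmax probability of the predicted class $\hat y$
3. If $p(\tilde x;T)\le \delta$, reject the input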
4. If not, return the output of the original classifier
$p(\tilde x;T)$ is the maximum softmax probability with temperature scaling for input $\tilde x$.
$\delta$ and $T$ are manually chosen.
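
The steps above can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: it assumes a toy linear classifier (logits $Wx + b$) so that the gradient of $\log S_{\hat y}(x;T)$ has a closed form, and the names `classify_with_rejection`, `grad_log_max_prob`, `W`, `b`, `eps`, and `delta` are hypothetical. A real network would obtain the gradient by backpropagation instead.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax S(x; T) over a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(W, b, x):
    """Assumed toy model: logits = W x + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def grad_log_max_prob(W, b, x, T):
    """Gradient of log S_yhat(x; T) w.r.t. x for the linear model.

    For logits z = W x + b: d/dx log S_y = (W[y] - sum_k S_k W[k]) / T.
    """
    probs = softmax(linear_logits(W, b, x), T)
    y_hat = max(range(len(probs)), key=probs.__getitem__)
    return [
        (W[y_hat][j] - sum(probs[k] * W[k][j] for k in range(len(W)))) / T
        for j in range(len(x))
    ]

def sign(v):
    return (v > 0) - (v < 0)

def classify_with_rejection(W, b, x, T, eps, delta):
    """Return the predicted class index, or None when the input is rejected."""
    g = grad_log_max_prob(W, b, x, T)
    # Step 2: x_tilde = x - eps * sign(-grad_x log S_yhat(x; T))
    x_tilde = [xj - eps * sign(-gj) for xj, gj in zip(x, g)]
    # Step 3: reject if the max temperature-scaled probability is <= delta
    score = max(softmax(linear_logits(W, b, x_tilde), T))
    if score <= delta:
        return None
    # Step 4: otherwise return the original classifier's prediction on x
    probs = softmax(linear_logits(W, b, x))
    return max(range(len(probs)), key=probs.__getitem__)
```

With the identity weights `W = [[1, 0], [0, 1]]` and zero bias, an input far from the decision boundary such as `[3.0, 0.0]` is accepted, while an input near the boundary such as `[0.1, 0.0]` is rejected under a strict threshold like `delta = 0.9`.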
