First published: 2017/11/06. Abstract: Many deployed learned models are black boxes: given input, returns output.
Internal information about the model, such as the architecture, optimisation
procedure, or training data, is not disclosed explicitly as it might contain
proprietary information or make the system more vulnerable. This work shows
that such attributes of neural networks can be exposed from a sequence of
queries. This has multiple implications. On the one hand, our work exposes the
vulnerability of black-box neural networks to different types of attacks -- we
show that the revealed internal information helps generate more effective
adversarial examples against the black box model. On the other hand, this
technique can be used for better protection of private content from automatic
recognition models using adversarial examples. Our paper suggests that it is
actually hard to draw a line between white box and black box models.
Oh et al. propose two different approaches for "whitening" black-box neural networks, i.e., predicting details of their internals such as the architecture or the training procedure. In particular, they consider attributes regarding the architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers etc.), attributes concerning optimization (batch size and optimization algorithm), and attributes regarding the data (data split and size). To create a dataset of models, they train roughly 11k models on MNIST; they ensure that these models reach at least 98% accuracy on the validation set, and they also consider ensembles.
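To make the prediction task concrete, the following is a minimal sketch of how such an attribute space could be enumerated; the specific value sets below are illustrative assumptions, not the exact choices used in the paper.

```python
# Illustrative attribute space for the model zoo; the concrete value sets
# below are assumptions for illustration, not the exact ones from the paper.
attributes = {
    # architecture
    "activation":    ["relu", "tanh", "prelu"],
    "dropout":       [True, False],
    "max_pooling":   [True, False],
    "kernel_size":   [3, 5],
    "n_conv_layers": [2, 3, 4],
    "n_fc_layers":   [2, 3, 4],
    # optimization
    "optimizer":     ["sgd", "adam", "rmsprop"],
    "batch_size":    [64, 128, 256],
    # data
    "data_split":    ["all", "half_0", "half_1"],
    "data_size":     ["all", "quarter"],
}

# Each combination of values defines one candidate configuration; the model
# zoo is obtained by training such configurations on MNIST and keeping only
# models that reach the validation-accuracy threshold (98% as stated above).
n_configurations = 1
for values in attributes.values():
    n_configurations *= len(values)
print(n_configurations)  # size of the raw configuration grid
```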
For predicting model attributes, they propose two models, called kennen-o and kennen-i, see Figure 1. Kennen-o takes as input a set of $100$ predictions of the black-box model (i.e., its final probability distributions on $100$ query inputs) and directly learns to predict the attributes using an MLP with two fully connected layers. Kennen-i instead crafts a single input whose predicted label reveals a specific model attribute. An example for kennen-i is shown in Figure 2. In experiments, they demonstrate that both models predict model attributes significantly better than chance. For details, I refer to the paper.
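Below is a minimal PyTorch sketch of a kennen-o style metamodel; the layer sizes, attribute encoding, and training details are my own assumptions for illustration, not taken from the paper. The black box is queried with a fixed set of $100$ inputs, the resulting MNIST probability vectors are concatenated, and a small MLP predicts one attribute (here, hypothetically, an attribute with 4 possible values).

```python
import torch
import torch.nn as nn

class KennenO(nn.Module):
    """Sketch of a kennen-o style metamodel (sizes are assumptions)."""
    def __init__(self, n_queries=100, n_classes=10, n_attr_choices=4, hidden=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_queries * n_classes, hidden),  # concatenated query outputs
            nn.ReLU(),
            nn.Linear(hidden, n_attr_choices),         # logits over attribute values
        )

    def forward(self, query_outputs):
        # query_outputs: (batch_of_models, n_queries, n_classes) probability vectors
        return self.net(query_outputs.flatten(start_dim=1))

# Training sketch: each "sample" is one black-box model from the model zoo,
# represented by its outputs on the shared query set and labelled with the
# true attribute value; placeholder tensors stand in for real query outputs.
metamodel = KennenO()
optimizer = torch.optim.Adam(metamodel.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

outputs = torch.rand(32, 100, 10)                 # placeholder outputs for 32 models
outputs = outputs / outputs.sum(-1, keepdim=True) # normalize to probability vectors
labels = torch.randint(0, 4, (32,))               # placeholder attribute labels

loss = loss_fn(metamodel(outputs), labels)
loss.backward()
optimizer.step()
```

Kennen-i, in contrast, optimizes the query input itself rather than a metamodel over many query outputs, so a corresponding sketch would backpropagate through the zoo models into a single crafted image.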
https://i.imgur.com/YbFuniu.png
Figure 1: Illustration of the two proposed approaches, kennen-o (top) and kennen-i (bottom).
https://i.imgur.com/ZXj22zG.png
Figure 2: Illustration of the images created by kennen-i to classify different attributes. See the paper for details.
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).