The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets
Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, Dawn Song
arXiv e-Print archive, 2018
Carlini et al. propose several attacks to extract secrets from trained black-box models and show that state-of-the-art neural networks memorize such secrets early during training. Specifically, on the Penn Treebank, after inserting a secret of a known format into the training data, the authors verify that the secret can be identified from the model's output probabilities alone (i.e., with black-box access). Several metrics based on the log-perplexity of the secret show that secrets are memorized early in training and that memorization occurs across popular architectures and training strategies; it also persists when multiple secrets are inserted. Furthermore, the authors propose several extraction attacks, most notably a shortest-path search: starting from an empty secret, the characters of the secret are identified sequentially so as to minimize the log-perplexity of the completed sequence (see the sketch below). Using this attack, secrets such as credit card numbers can be extracted from models trained on email datasets.
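The extraction can be viewed as a shortest-path (Dijkstra-style) search over partial character sequences, where extending a partial secret by one character costs that character's negative log-probability under the model. Below is a minimal sketch of this idea, assuming a hypothetical `char_log_probs(prefix)` helper that queries the black-box model and returns the log-probability of each candidate next character; the secret's length and alphabet are assumed known, as in the paper's setting.

```python
import heapq

def extract_secret(char_log_probs, secret_length, alphabet):
    """Shortest-path search for the lowest log-perplexity sequence.

    `char_log_probs(prefix)` is assumed to return a dict mapping each
    character in `alphabet` to log p(char | prefix) under the model.
    """
    # Priority queue of (cost, partial_secret), where cost is the
    # accumulated negative log-likelihood (log-perplexity) so far.
    frontier = [(0.0, "")]
    while frontier:
        cost, partial = heapq.heappop(frontier)
        if len(partial) == secret_length:
            # With non-negative step costs, the first completed sequence
            # popped has minimal log-perplexity among all full sequences.
            return partial, cost
        log_probs = char_log_probs(partial)
        for ch in alphabet:
            heapq.heappush(frontier, (cost - log_probs[ch], partial + ch))
    return None, float("inf")
```

Since the step costs (negative log-probabilities) are non-negative, the first full-length sequence popped from the queue is guaranteed to have the lowest log-perplexity. For instance, `extract_secret(char_log_probs, 16, "0123456789")` would search for a 16-digit secret such as a credit card number.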
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).