First published: 2017/11/29

Abstract: Deep convolutional networks have become a popular tool for image generation
and restoration. Generally, their excellent performance is imputed to their
ability to learn realistic image priors from a large number of example images.
In this paper, we show that, on the contrary, the structure of a generator
network is sufficient to capture a great deal of low-level image statistics
prior to any learning. In order to do so, we show that a randomly-initialized
neural network can be used as a handcrafted prior with excellent results in
standard inverse problems such as denoising, super-resolution, and inpainting.
Furthermore, the same prior can be used to invert deep neural representations
to diagnose them, and to restore images based on flash-no flash input pairs.
Apart from its diverse applications, our approach highlights the inductive
bias captured by standard generator network architectures. It also bridges the
gap between two very popular families of image restoration methods:
learning-based methods using deep convolutional networks and learning-free
methods based on handcrafted image priors such as self-similarity. Code and
supplementary material are available at
https://dmitryulyanov.github.io/deep_image_prior .

Ulyanov et al. utilize untrained neural networks as a regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.
$x^\ast = \arg\min_x E(x; x_0) + R(x)$
where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:
$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$
for a fixed but random $z$. Here, the regularizer $R$ is essentially replaced by an untrained neural network $f_\theta$ – usually a convolutional encoder-decoder. The authors argue that the implicit regularizer is effectively $R(x) = 0$ if the image can be generated by the network from the fixed code $z$, and $R(x) = \infty$ if not. However, this argument alone does not explain why the approach works as well as demonstrated in the paper.
A main question addressed in the paper is why the network $f_\theta$ can act as a prior at all, given that high-capacity networks can essentially fit any image (including random noise). In my opinion, the authors do not give a convincing answer to this question. Essentially, they argue that random noise is simply harder to fit (i.e. it takes more iterations), so limiting the number of iterations is enough as regularization. Personally, I would argue that this observation is mainly due to the prior knowledge built into the encoder-decoder architecture, together with the idea that natural images (or any images with some structure) are more easily embedded into low-dimensional latent spaces than fully i.i.d. random noise.
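To make the reparameterization concrete, here is a minimal sketch of the mechanics: a fixed random code $z$, a small randomly initialized network $f_\theta$, and gradient descent on the data term only, with a capped iteration count as the sole regularizer. Note this toy uses a tiny fully-connected network on a 1-D signal, not the convolutional architecture whose inductive bias the paper actually relies on; all names and sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observation x0 of a smooth 1-D "image" (stand-in for a real image).
n = 64
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, n))
x0 = clean + 0.3 * rng.standard_normal(n)

# Fixed random code z and a randomly initialized two-layer network f_theta.
z = rng.standard_normal(16)
W1 = 0.5 * rng.standard_normal((32, 16))
W2 = 0.5 * rng.standard_normal((n, 32))

def f_theta(W1, W2, z):
    h = np.tanh(W1 @ z)
    return W2 @ h, h

x_init, _ = f_theta(W1, W2, z)
mse0 = np.mean((x_init - x0) ** 2)

# Minimize only the data term E(f_theta(z); x0) = ||f_theta(z) - x0||^2
# over theta = (W1, W2); the iteration cap plays the role of early stopping.
lr = 1e-2
for _ in range(500):
    x, h = f_theta(W1, W2, z)
    r = x - x0                               # residual f_theta(z) - x0
    gW2 = 2.0 * np.outer(r, h)               # dE/dW2
    gh = 2.0 * (W2.T @ r)                    # dE/dh
    gW1 = np.outer(gh * (1.0 - h ** 2), z)   # dE/dW1, using tanh' = 1 - h^2
    W2 -= lr * gW2
    W1 -= lr * gW1

x_star, _ = f_theta(W1, W2, z)               # restored signal f_{theta*}(z)
mse_final = np.mean((x_star - x0) ** 2)
```

Run long enough, this toy network fits the noisy target almost perfectly, which is exactly why the number of iterations matters: the restoration effect in the paper hinges on structured content being fitted earlier than noise, a property this fully-connected toy does not reproduce.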
They provide experiments on a range of tasks including denoising, image inpainting, super-resolution and neural network “inversion”. Figure 1 shows some results for image inpainting that I found quite convincing. For the remaining experiments I refer to the paper.
![Figure 1](https://i.imgur.com/BVQsaup.png)

Figure 1: Qualitative results for image inpainting.
Also see this summary at [davidstutz.de](https://davidstutz.de/category/reading/).