This paper proposes a method for obtaining a non-vacuous bound on the generalization error by optimizing the PAC-Bayes bound directly. The interesting part is that the authors leverage the black magic of neural networks themselves to bound a neural network. To find the optimal posterior $Q$, the authors minimize a loss consisting of an empirical error term plus $KL(Q\|P)$, where the prior $P$ is chosen to be $N(0, \lambda I)$; they also justify how to choose $\lambda$. Overall, this objective is similar to variational inference in Bayesian neural networks, and the authors are able to obtain a test error bound of $17\%$ on MNIST, whereas traditional bounds are mostly vacuous in this setting.
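To make the structure of the objective concrete, here is a minimal sketch of an "empirical error plus KL" loss. It rests on assumptions that are not taken from the paper: $Q$ is a diagonal Gaussian $N(\mu, \mathrm{diag}(\sigma^2))$, the complexity term takes a McAllester-style form, and the names `gaussian_kl`, `pac_bayes_objective`, `m`, and `delta` are hypothetical. The authors' exact bound and optimization procedure may differ.

```python
import math


def gaussian_kl(mean, var, lam):
    """KL(Q || P) for Q = N(mean, diag(var)) and P = N(0, lam * I).

    Closed form for Gaussians; the diagonal Q is an assumption of this sketch.
    """
    return 0.5 * sum(
        v / lam + mu * mu / lam - 1.0 - math.log(v / lam)
        for mu, v in zip(mean, var)
    )


def pac_bayes_objective(emp_err, kl, m, delta=0.05):
    """Empirical error plus a McAllester-style complexity term.

    m is the number of training examples and delta the confidence
    parameter. This particular form is an assumption, not necessarily
    the bound optimized in the paper.
    """
    return emp_err + math.sqrt((kl + math.log(m / delta)) / (2.0 * (m - 1)))


# Illustrative values only (not results from the paper).
kl = gaussian_kl(mean=[0.3, -0.1], var=[0.5, 0.5], lam=1.0)
objective = pac_bayes_objective(emp_err=0.03, kl=kl, m=55000)
```

The KL term is zero when $Q$ coincides with the prior and grows as the weights move away from it, so minimizing this objective trades training error against distance from $P$.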