Stochastic Backpropagation through Mixture Density Distributions
Alex Graves
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.NE


Summary by Hugo Larochelle 4 years ago
Thanks for the summary. Do you know why Equation 5 (`\frac{\partial F_d(x_d|x_{<d})}{\partial\theta} = ... = 0`), which exploits the Leibniz rule, is set to zero? At first glance, if the PDF `f_d` depends on `\theta`, then so should the CDF `F_d`, so wouldn't it generally have a non-zero partial derivative?

Good question! It's because $\hat{x}_d$ in Equation 5 was sampled as $\hat{x}_d = F_d^{-1}(u_d|{\bf x}_{<d})$, where $u_d\sim U(0,1)$. So $F_d(\hat{x}_d|{\bf x}_{<d}) = u_d$, and $u_d$ was sampled independently of $\theta$ (it's a uniform sample). The derivative of $F_d(\hat{x}_d|{\bf x}_{<d})$ (i.e. of $u_d$) with respect to $\theta$ is therefore 0. Hope this helps!
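
The point above can be checked numerically. The sketch below is only an illustration and is not taken from the paper: it assumes a one-dimensional exponential distribution with rate `theta` as a stand-in for $f_d$, and all function names (`cdf`, `pdf`, `inverse_cdf`, `central_diff`) are hypothetical helpers. It compares the derivative of $F(\hat{x}; \theta)$ when $\hat{x} = F^{-1}(u; \theta)$ moves with $\theta$ (which should vanish, since the value is always $u$) against the partial derivative with $\hat{x}$ held fixed (which is generally non-zero, as the question suspected).

```python
import math

# Assumption: an exponential distribution with rate theta stands in for f_d.
def cdf(x, theta):
    """F(x; theta) = 1 - exp(-theta * x)."""
    return 1.0 - math.exp(-theta * x)

def pdf(x, theta):
    """f(x; theta) = theta * exp(-theta * x)."""
    return theta * math.exp(-theta * x)

def inverse_cdf(u, theta):
    """F^{-1}(u; theta), used to turn a uniform sample into x_hat."""
    return -math.log(1.0 - u) / theta

def central_diff(fn, theta, eps=1e-6):
    """Central finite-difference estimate of d fn / d theta."""
    return (fn(theta + eps) - fn(theta - eps)) / (2.0 * eps)

u = 0.3        # a fixed uniform sample, independent of theta
theta = 2.0
x_hat = inverse_cdf(u, theta)

# Derivative with x_hat recomputed from u at each theta: F(x_hat; theta) == u.
total = central_diff(lambda t: cdf(inverse_cdf(u, t), t), theta)

# Partial derivative with x_hat frozen at its current value.
partial = central_diff(lambda t: cdf(x_hat, t), theta)

# Setting the total derivative to zero and applying the chain rule gives
# dx_hat/dtheta = -(dF/dtheta at fixed x_hat) / f(x_hat).
dx_numeric = central_diff(lambda t: inverse_cdf(u, t), theta)
dx_implied = -partial / pdf(x_hat, theta)

print("total derivative:  ", total)
print("partial derivative:", partial)
print("dx_hat/dtheta:     ", dx_numeric, "vs implied", dx_implied)
```

Under these assumptions the total derivative is zero up to floating-point error while the fixed-$\hat{x}$ partial derivative is not, and the two estimates of $\partial\hat{x}/\partial\theta$ agree.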

Perfect, thank you!
