The prediction gradient is just $\frac{\partial \mathbf{y}}{\partial w}$ where $\mathbf{y}$ is the output before the loss function.