Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Duchi, John C. and Hazan, Elad and Singer, Yoram
Conference on Learning Theory - 2010 via Local Bibsonomy
Keywords: dblp
This is Adagrad, an adaptive learning rate method: each parameter's effective step size is the global learning rate divided by the square root of the sum of that parameter's past squared gradients, so parameters with consistently large gradients take smaller steps while rarely-updated parameters keep larger ones. Some sample code from [[Stanford CS231n]](https://cs231n.github.io/neural-networks-3/#ada) is:
```python
# Assume we have computed the gradient dx for the parameter vector x.
# cache (initialized to zeros, same shape as x) accumulates squared gradients;
# eps (a small constant, e.g. 1e-8) prevents division by zero.
cache += dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```
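To see the update in action, here is a minimal self-contained sketch; the `adagrad` helper, `grad_fn`, and the toy quadratic objective are illustrative assumptions for this note, not the paper's algorithm, which also covers composite objectives, regret bounds, and a full-matrix variant:

```python
import numpy as np

def adagrad(grad_fn, x0, learning_rate=0.1, eps=1e-8, num_steps=500):
    """Diagonal Adagrad sketch: each coordinate's step shrinks with the
    squared gradients it has accumulated so far."""
    x = x0.astype(float)
    cache = np.zeros_like(x)   # per-coordinate sum of squared gradients
    for _ in range(num_steps):
        dx = grad_fn(x)
        cache += dx ** 2
        x -= learning_rate * dx / (np.sqrt(cache) + eps)
    return x

# Toy problem: f(x) = 0.5 * x^T A x with badly scaled curvature.
A = np.diag([100.0, 1.0])
x_final = adagrad(lambda x: A @ x, x0=np.array([1.0, 1.0]))
print(x_final)  # both coordinates are driven close to 0
```

Because each coordinate is normalized by its own accumulated gradient magnitude, the steep coordinate (curvature 100) and the flat coordinate (curvature 1) make comparable progress, which a single global learning rate cannot achieve on this problem.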