Summary by Joseph Paul Cohen 8 years ago
This paper introduces Adagrad, an adaptive learning rate method. It adapts a per-parameter learning rate by dividing the global step size by the square root of the sum of all past squared gradients for that parameter, so parameters with large accumulated gradients receive smaller updates. Some sample code from [[Stanford CS231n]](https://cs231n.github.io/neural-networks-3/#ada) is:
```python
import numpy as np

# Assume the gradient dx and parameter vector x.
# cache starts at zero and accumulates the squared gradient per parameter;
# eps (a small constant, e.g. 1e-8) prevents division by zero.
cache += dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```
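To make the update rule concrete, here is a minimal self-contained sketch (not from the summary or CS231n) that applies the same Adagrad update to a toy quadratic objective; the objective, learning rate, eps, and iteration count are all assumptions chosen for illustration:

```python
import numpy as np

# Illustrative settings (assumptions, not from the paper)
learning_rate = 1.0
eps = 1e-8

x = np.array([5.0, -3.0])    # parameters to optimize
cache = np.zeros_like(x)     # running sum of squared gradients, one entry per parameter

for step in range(100):
    dx = 2 * x               # gradient of f(x) = x_0**2 + x_1**2
    cache += dx**2           # accumulate squared gradients
    x += -learning_rate * dx / (np.sqrt(cache) + eps)

print(x)  # ends up close to the minimum at [0, 0]
```

Note how the effective step size for each coordinate shrinks as its accumulated squared gradient grows, which is Adagrad's defining behavior.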