Dash et al. present a reasonably recent survey on radial basis function (RBF) networks. RBF networks can be understood as two-layer perceptrons, consisting of an input layer, a hidden layer and an output layer. Instead of using a linear operation for computing the hidden layer, RBF kernels are used; as a simple example, the hidden units are computed as
$h_i = \phi_i(x) = \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma_i^2}\right)$
where $\mu_i$ and $\sigma_i^2$ are the parameters of the $i$-th kernel. In a clustering interpretation, the $\mu_i$ correspond to the kernels' centers and the $\sigma_i^2$ correspond to the kernels' bandwidths. The hidden units are then summed with weights $w_i$; for a single output $y \in \mathbb{R}$ this can be written as
$y = \sum_i w_i h_i$.
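To make the two formulas above concrete, here is a minimal NumPy sketch of the forward pass for a single input and a single scalar output. The function names `rbf_hidden` and `rbf_output` and the array layout are my own choices for illustration and are not taken from the survey.

```python
import numpy as np

def rbf_hidden(x, centers, sigmas):
    """Gaussian RBF activations h_i = exp(-||x - mu_i||^2 / (2 sigma_i^2)).

    x: input of shape (d,), centers: (k, d), sigmas: (k,) standard deviations.
    """
    # Squared Euclidean distance from x to every center mu_i.
    sq_dist = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigmas ** 2))

def rbf_output(x, centers, sigmas, weights):
    """Scalar output y = sum_i w_i h_i."""
    return float(weights @ rbf_hidden(x, centers, sigmas))

# Toy example with two hidden units in two dimensions.
centers = np.array([[0.0, 0.0], [1.0, 0.0]])
sigmas = np.array([1.0, 1.0])
weights = np.array([2.0, 0.0])
y = rbf_output(np.array([0.0, 0.0]), centers, sigmas, weights)
```

In the toy example the input coincides with the first center, so the first hidden unit is exactly 1; since the second weight is zero, the output equals the first weight.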
Originally, RBF networks were trained in a clustering fashion in order to find the centers $\mu_i$, with the bandwidths often treated as hyper-parameters. Dash et al. show several alternative approaches based on clustering or orthogonal least squares; I refer to the paper for details.
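As a rough illustration of the clustering-style training described above, the following sketch finds the centers with a plain k-means loop, keeps one shared bandwidth `sigma` as a hyper-parameter, and fits the weights $w_i$ by ordinary linear least squares. This is one common recipe and an assumption on my part, not necessarily any of the variants discussed by Dash et al.; in particular it is not orthogonal least squares, and all function names are hypothetical.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=50, seed=0):
    """Plain Lloyd-style k-means; returns centers of shape (k, d)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct training points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def design_matrix(X, centers, sigma):
    """Hidden activations for all inputs, shape (n, k)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(X, y, k, sigma):
    """Centers via k-means, weights via linear least squares."""
    centers = kmeans_centers(X, k)
    H = design_matrix(X, centers, sigma)
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, weights

def predict(X, centers, sigma, weights):
    return design_matrix(X, centers, sigma) @ weights

# Toy regression problem: one period of a sine on [0, 1].
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
centers, weights = fit_rbf(X, y, k=5, sigma=0.2)
pred = predict(X, centers, 0.2, weights)
```

Once the centers and the bandwidth are fixed, the output is linear in the weights, which is why a single least-squares solve is enough for this step.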
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).