This paper discusses the universal approximation theorem, which states that for any measurable function and any desired degree of accuracy, there is a single-hidden-layer feedforward network, with sufficiently many hidden units, that approximates the function to that accuracy.
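Concretely, for any target function $f(x)$ and any $\epsilon > 0$, there exists a single-hidden-layer feedforward network $F(x)$ with some finite number of hidden units $N$ such that $| F(x) - f(x) | < \epsilon$. When $f$ is continuous, this bound holds for every $x$ in a compact input domain; for a general measurable $f$, it holds outside a set of arbitrarily small probability.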
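$F(x)$ takes the following form, where $N$ is the number of hidden units and $h$ is a nonlinear activation function (e.g. relu, tanh, sigmoid). Each $w_i$ is a weight vector with the same dimension as $x$, so $w_i x$ is an inner product, and $b_i$ and $v_i$ are scalars.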
$$ F( x ) =
\sum_{i=1}^{N} v_i h( w_i x + b_i)$$
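The formula for $F(x)$ can be written out in a few lines of NumPy. The sketch below is illustrative only and is not taken from the paper: `single_hidden_layer` evaluates the sum directly, while `fit_output_weights` and `max_error` are hypothetical helpers that draw $w_i$ and $b_i$ at random and solve for $v_i$ by least squares on $\sin(x)$. That fitting procedure is an assumption chosen for brevity; the theorem only guarantees that suitable weights exist and says nothing about how to find them.

```python
import numpy as np


def single_hidden_layer(x, W, b, v, h=np.tanh):
    """Evaluate F(x) = sum_i v_i * h(w_i . x + b_i).

    x: (n_samples, d) inputs; W: (N, d) with one weight vector w_i per row;
    b: (N,) biases; v: (N,) output weights.
    """
    hidden = h(x @ W.T + b)  # (n_samples, N): entry [s, i] is h(w_i . x_s + b_i)
    return hidden @ v        # (n_samples,): weighted sum over the N hidden units


def fit_output_weights(x, y, N, seed=0, h=np.tanh):
    """Hypothetical fit: random w_i and b_i, least-squares solve for v.

    This is an assumption made for illustration, not the paper's construction.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=2.0, size=(N, x.shape[1]))
    b = rng.normal(scale=2.0, size=N)
    hidden = h(x @ W.T + b)
    v, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    return W, b, v


def max_error(N):
    """Largest |F(x) - f(x)| on a grid over [-pi, pi] for f = sin."""
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x[:, 0])
    W, b, v = fit_output_weights(x, y, N)
    return np.max(np.abs(single_hidden_layer(x, W, b, v) - y))
```

Comparing `max_error(2)` with `max_error(50)` gives a rough feel for how the achievable error can shrink as the number of hidden units $N$ grows, although a single experiment like this is not evidence for the theorem itself.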
Resources:
http://deeplearning.cs.cmu.edu/notes/Sonia_Hornik.pdf
http://neuralnetworksanddeeplearning.com/chap4.html