This paper is great in that it is packed with content.
They replace W with its truncated SVD (TSVD). When the reduced rank t is small, this saves computation: instead of one multiplication by a big n x m matrix, you do two multiplications by smaller matrices, costing roughly n*t + t*m multiplies rather than n*m.
In terms of units in hidden layers, they turn n->m into n->t->m (sketched in the code below).
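A minimal sketch of the idea, assuming a fully connected layer y = W x with W of shape (m, n); the sizes and the rank t here are made up for illustration, not taken from the paper.

```python
import numpy as np

n, m, t = 1024, 1024, 64          # hypothetical layer sizes and reduced rank
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Truncated SVD: keep only the top-t singular triplets of W.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :t] * s[:t]              # shape (m, t): the t -> m half
B = Vt[:t, :]                     # shape (t, n): the n -> t half

# One big multiply (n*m multiplies) vs. two small ones (n*t + t*m multiplies).
y_full = W @ x
y_tsvd = A @ (B @ x)
print("mults: full =", n * m, " factorized =", n * t + t * m)
print("max abs error of rank-t approximation:", np.max(np.abs(y_full - y_tsvd)))
```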
This only helps the forward pass, though. If you were to train the factorized layers, you would only ever learn a rank-t matrix (the product of the two factors has rank at most t), in which case there would be no reason to keep the separate t->m layer. Unless you want more nonlinearities but less rank, by putting an activation between the two halves; I haven't seen that done before (a quick sketch of that variant is below).
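A hedged sketch of that speculated variant, not anything from the paper: keep the n->t->m factorization at training time and insert a nonlinearity between the two halves, so the layer still has m output units but its linear part is constrained to rank t. The module name and the choice of ReLU are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LowRankBottleneck(nn.Module):
    """Hypothetical n -> t -> m layer with a nonlinearity between the factors."""

    def __init__(self, n: int, m: int, t: int):
        super().__init__()
        self.down = nn.Linear(n, t, bias=False)   # the n -> t half
        self.up = nn.Linear(t, m)                 # the t -> m half
        self.act = nn.ReLU()                      # extra nonlinearity in between

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Without self.act this collapses to a single linear map of rank <= t,
        # since rank(up.weight @ down.weight) <= t; the activation is what
        # makes training the two-layer form different from training one layer.
        return self.up(self.act(self.down(x)))
```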