The goal is to compress a neural network by identifying its most significant neurons. The authors sample from a Determinantal Point Process (DPP) to find a subset of neurons whose activations are maximally dissimilar, and then project the remaining neurons onto that subset to reduce the total number of neurons.
A DPP assigns to each subset $Y$ of neurons a probability proportional to the volume spanned by that subset under a similarity kernel $L$:
$$P(\text{subset } Y) = \frac{\det(L_Y)}{\det(L + I)}$$
where $L_Y$ is the submatrix of $L$ indexed by the neurons in $Y$, and $\det(L + I)$ normalizes over all possible subsets.
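As a concrete illustration, here is a minimal sketch of this probability computation, assuming `L` is an $n \times n$ positive semidefinite kernel indexed by neurons and `Y` is a list of neuron indices (the function name `dpp_subset_log_prob` is illustrative, not from the source):

```python
import numpy as np

def dpp_subset_log_prob(L, Y):
    """Log-probability of selecting exactly the neurons in Y under the
    L-ensemble DPP: log det(L_Y) - log det(L + I)."""
    n = L.shape[0]
    L_Y = L[np.ix_(Y, Y)]                       # principal submatrix for the subset
    _, logdet_Y = np.linalg.slogdet(L_Y)        # log-determinant, numerically stable
    _, logdet_norm = np.linalg.slogdet(L + np.eye(n))
    return logdet_Y - logdet_norm
```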
The more dissimilar the neurons in a subset, the larger the determinant and hence the higher the subset's probability. The kernel itself is built from the neurons' output activations, collected by running a simple sample of the training set through the network.
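To make the whole pipeline concrete, below is a hedged sketch under stated assumptions: the kernel is built as an RBF (Gaussian) kernel over the sampled activation vectors (one column per neuron), subset selection is approximated with a simple greedy MAP-style heuristic rather than exact DPP sampling, and the projection of pruned neurons onto the kept ones is realized as ordinary least squares on the same activation samples. Function and parameter names (`build_activation_kernel`, `greedy_diverse_subset`, `project_pruned_neurons`, `sigma`) are illustrative, not taken from the paper.

```python
import numpy as np

def build_activation_kernel(A, sigma=1.0):
    """A: (num_samples, num_neurons) activations sampled over the training set.
    Returns an RBF similarity kernel between neurons (an assumed kernel choice)."""
    sq = np.sum(A**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A.T @ A   # pairwise squared distances between activation vectors
    d2 = np.maximum(d2, 0.0)                          # clip tiny negative values from round-off
    return np.exp(-d2 / (2.0 * sigma**2))

def greedy_diverse_subset(L, k):
    """Greedy approximation: repeatedly add the neuron that yields the largest
    log det(L_Y). A stand-in for exact DPP sampling."""
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for j in range(L.shape[0]):
            if j in selected:
                continue
            Y = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(Y, Y)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = j, logdet
        if best is None:
            break
        selected.append(best)
    return selected

def project_pruned_neurons(A, keep):
    """Least-squares weights expressing each pruned neuron's activations as a
    linear combination of the kept neurons' activations (an assumed realization
    of the projection step)."""
    pruned = [j for j in range(A.shape[1]) if j not in keep]
    W, *_ = np.linalg.lstsq(A[:, keep], A[:, pruned], rcond=None)
    return W   # shape: (len(keep), len(pruned))

# Example: shrink a layer of 64 neurons to 16 using activations from 500 inputs
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 64))        # placeholder activations
L = build_activation_kernel(A, sigma=2.0)
keep = greedy_diverse_subset(L, k=16)
W = project_pruned_neurons(A, keep)
```

The projection weights `W` would then be folded into the next layer's weights so that the pruned neurons' contributions are approximately preserved.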