This paper presents a theoretical analysis and empirical validation of a novel view of feature extraction systems based on Nyström sampling for kernel methods. The core idea is to analyze the kernel matrix induced by the feature space of an off-the-shelf feature extraction system. The authors first derive a bound on the error incurred when the "full" dictionary, composed of all data points, is replaced by a Nyström-approximated version built from a random subsample of the data points. The bound is then extended to show that the approximate kernel matrix obtained with the Nyström-sampled dictionary is close to the true kernel matrix, and the authors argue that the quality of this approximation is a reasonable proxy for the classification error expected after training. Experiments show that the model qualitatively predicts both the monotonic rise in accuracy of feature extraction as the dictionary grows and the eventual saturation of performance.
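To make the construction concrete, the following is a minimal sketch (in Python/NumPy) of forming a Nyström approximation of a kernel matrix from a random subsample of the data points and measuring its distance to the full matrix. The RBF kernel choice, the sample sizes, and all function names are illustrative assumptions, not details taken from the paper:

```python
# A minimal sketch of Nyström kernel approximation, assuming an RBF kernel;
# the kernel choice, sample size m, and helper names are illustrative, not
# taken from the paper.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_approximation(X, m, gamma=1.0, seed=None):
    """Approximate the full kernel matrix K(X, X) using m random landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # subsampled "dictionary"
    C = rbf_kernel(X, X[idx], gamma)                 # n x m cross-kernel block
    W = rbf_kernel(X[idx], X[idx], gamma)            # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T               # rank-m approximation of K

# Compare the approximation to the true kernel matrix as m grows:
X = np.random.default_rng(0).standard_normal((500, 20))
K = rbf_kernel(X, X)
for m in (10, 50, 200):
    K_hat = nystrom_approximation(X, m, seed=0)
    err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
    print(f"m={m:4d}  relative Frobenius error={err:.4f}")
```

In a sketch like this, the approximation error typically drops quickly and then flattens as m increases, mirroring the monotonic-rise-then-saturation behavior the paper reports for classification accuracy as the dictionary grows.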