The [R-CNN](http://arxiv.org/abs/1311.2524) paper presents a method for object detection based on convolutional neural networks (CNNs). It works on region proposals (hence the "R"). The key insight is to train a CNN on a classification task and reuse the learned features to classify the proposed regions. The authors do *not* use a sliding-window approach such as OverFeat. Instead, they generate around 2000 category-independent region proposals per image. For each proposal, they crop the corresponding part of the image, resize the crop to the fixed input size of the CNN, and classify it.
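The crop, resize, and classify steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `crop_and_warp`, `classify_proposals`, and `score_fn` are hypothetical, the boxes are assumed to come from some external proposal method, the 227-pixel default is an illustrative choice, and nearest-neighbour resizing is used only to keep the example free of dependencies beyond NumPy.

```python
import numpy as np


def crop_and_warp(image, box, size=227):
    """Crop box (x1, y1, x2, y2) from an H x W x C image and warp it to size x size.

    The aspect ratio is ignored, so the crop is stretched to a square.
    Nearest-neighbour sampling is an assumption made for simplicity.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    # Map every output row/column back to a source row/column of the crop.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return crop[rows[:, None], cols[None, :]]


def classify_proposals(image, boxes, score_fn, size=227):
    """Score every proposal independently with score_fn (a stand-in for the CNN).

    score_fn is a hypothetical callable that maps a warped patch to a
    1-D array of per-class scores.
    """
    results = []
    for box in boxes:
        patch = crop_and_warp(image, box, size)
        scores = np.asarray(score_fn(patch))
        label = int(np.argmax(scores))
        results.append((box, label, float(scores[label])))
    return results
```

Each proposal is processed on its own, so the cost of this loop grows linearly with the number of proposals.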
Notable follow-ups are:
* [Fast R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/iccv/Girshick15)
* [Faster R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/nips/RenHGS15)