In proteomics, mass spectrometry is a popular method for identifying peptides. An experimental tandem mass spectrum consists of mass peaks produced by fragmenting a peptide into fragment ions.
To identify the peptide that produced the experimental spectrum, all peptides that could produce the observed fragment ions must be considered. Unfortunately, the space of possible peptides is large. The paper simplifies this problem by restricting the search to a peptide database.
The approach is a four-step method:
- Data reduction: only the 200 most abundant peaks of the experimental spectrum are kept.
- Search: peptides with a mass similar to the experimental spectrum's precursor are selected.
- Scoring: the score for each selected peptide is based on the number of predicted fragment ions seen in the experimental spectrum.
- Cross-correlation: for the top 500 scoring peptides, theoretical spectra are constructed. These spectra are evaluated by cross-correlation with the experimental spectrum. The top-scoring peptide is considered the identified peptide.
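
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the mass and fragment tolerances, the unit-width binning, the restriction to singly charged b- and y-ions, and the ±75-bin shift window used as the cross-correlation background are all assumptions made for the example. Only the limits of 200 peaks and 500 candidates come from the description above.

```python
from collections import defaultdict

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mzs(seq):
    """m/z values of singly charged b- and y-ions (assumed ion types)."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def reduce_spectrum(peaks, keep=200):
    """Step 1: keep the `keep` most intense (m/z, intensity) peaks."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(top)


def select_candidates(database, precursor_mass, tol=1.0):
    """Step 2: peptides whose mass is within `tol` Da of the precursor."""
    return [p for p in database if abs(peptide_mass(p) - precursor_mass) <= tol]


def preliminary_score(seq, peaks, tol=0.5):
    """Step 3: number of predicted fragment ions found in the spectrum."""
    observed = [mz for mz, _ in peaks]
    return sum(
        any(abs(mz - ion) <= tol for mz in observed) for ion in fragment_mzs(seq)
    )


def binned(peaks):
    """Map (m/z, intensity) pairs onto unit-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        key = round(mz)
        bins[key] = max(bins[key], intensity)
    return bins


def xcorr(experimental, theoretical, max_shift=75):
    """Step 4: correlation at zero shift minus the mean over nonzero shifts."""
    def dot(shift):
        return sum(v * theoretical.get(k + shift, 0.0)
                   for k, v in experimental.items())
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    background = sum(dot(s) for s in shifts) / len(shifts)
    return dot(0) - background


def identify(peaks, precursor_mass, database, top_n=500):
    """Run all four steps and return the best-scoring peptide, or None."""
    peaks = reduce_spectrum(peaks)
    candidates = select_candidates(database, precursor_mass)
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda s: preliminary_score(s, peaks),
                    reverse=True)[:top_n]
    exp_bins = binned(peaks)
    return max(
        ranked,
        key=lambda s: xcorr(exp_bins, binned((mz, 1.0) for mz in fragment_mzs(s))),
    )
```

As a usage example, a synthetic spectrum built from the fragment ions of `PEPTIDE` can be searched against a toy database that also contains `EDITPEP`, the same residues in reverse order. Both peptides have the same mass and therefore pass the search step, so only the fragment-based scoring steps can separate them.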
Since this approach requires a database of known peptides, it cannot be used for de novo peptide identification, that is, identifying peptides from species for which no sequenced genome or proteome is available.