siegfried gessulat's profile - ShortScience.org

An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database
Jimmy K. Eng and Ashley L. McCormack and John R. Yates
Journal of the American Society for Mass Spectrometry - 1994 via Local CrossRef

Summary by siegfried gessulat 8 years ago

In proteomics, mass spectrometry is a popular method for identifying peptides. An experimental tandem mass spectrum consists of mass peaks that stem from fragmenting a peptide into fragment ions.
To identify the peptide that produced the experimental spectrum, all peptides that could produce the observed fragment ions must be considered. Unfortunately, the space of possible peptides is large. The paper narrows this space by searching only against the peptides of a sequence database.

The paper describes a four-step method:

- Data reduction: only the 200 most abundant peaks of the experimental spectrum are kept.
- Search: peptides with a mass similar to the experimental spectrum's precursor mass are selected.
- Scoring: the score for each selected peptide is based on the number of predicted fragment ions seen in the experimental spectrum.
- Cross-correlation: for the top 500 scoring peptides, theoretical spectra are constructed. These spectra are evaluated by cross-correlation with the experimental spectrum. The top-scoring peptide is considered the identified peptide.
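
The four steps can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the function names, the mass and fragment tolerances (`tol`), and the representation of spectra as binned NumPy vectors are all assumptions. The cross-correlation score shown (zero-shift correlation minus the mean correlation over shifts up to 75 bins) is one common formulation and is likewise assumed here, since the summary does not spell it out.

```python
import numpy as np

def reduce_spectrum(peaks, keep=200):
    """Step 1: keep the `keep` most intense (mz, intensity) peaks, sorted by m/z."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(strongest)

def select_candidates(peptide_masses, precursor_mass, tol=1.0):
    """Step 2: database peptides whose mass is within `tol` of the precursor."""
    return [pep for pep, mass in peptide_masses.items()
            if abs(mass - precursor_mass) <= tol]

def shared_peak_count(observed_mz, predicted_mz, tol=0.5):
    """Step 3: number of predicted fragment m/z values with an observed peak nearby."""
    observed = np.asarray(observed_mz, dtype=float)
    return sum(bool(np.any(np.abs(observed - mz) <= tol)) for mz in predicted_mz)

def shifted_dot(a, b, shift):
    """Dot product of two equal-length vectors with `a` displaced by `shift` bins."""
    if shift > 0:
        return float(np.dot(a[shift:], b[:-shift]))
    if shift < 0:
        return float(np.dot(a[:shift], b[-shift:]))
    return float(np.dot(a, b))

def xcorr(experimental, theoretical, max_shift=75):
    """Step 4: zero-shift correlation minus the mean correlation at nonzero shifts."""
    background = np.mean([shifted_dot(experimental, theoretical, s)
                          for s in range(-max_shift, max_shift + 1) if s != 0])
    return shifted_dot(experimental, theoretical, 0) - background
```

Subtracting the background term penalises theoretical spectra that overlap the experimental one equally well at any offset, which is what a random match would tend to do.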

Since the approach needs a database of known peptides, it cannot be used for de novo peptide identification, that is, for identifying peptides from species for which no sequenced genome or proteome is available.

PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search
Yunhu Wan and Austin Yang and Ting Chen
Analytical Chemistry - 2006 via Local CrossRef

Summary by siegfried gessulat 9 years ago

To identify a peptide, an experimental mass spectrum of the peptide is matched against a *protein sequence database*. More specifically, the spectrum is compared against artificially generated *hypothetical spectra* that are based on the database's collection of known possible peptides. PepHMM is a scoring function for comparing an experimental spectrum with a hypothetical spectrum. It estimates the likelihood that the experimental spectrum comes from the same peptide that was used to generate the hypothetical spectrum. Only $b$, $y$, $b-H_2O$, $y-H_2O$, $a$, $b^{2+}$, and $y^{2+}$ ions are considered for the comparison.
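
As an illustration of what a hypothetical spectrum contains, the m/z values of the seven ion types can be computed from standard monoisotopic residue masses. This is a sketch under simple assumptions (unmodified residues, no intensities); the function name `hypothetical_ions` and the ion labels are made up for this example and are not taken from the paper.

```python
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)
CO = 27.994915      # monoisotopic mass of CO (Da); a ion = b ion minus CO

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def hypothetical_ions(peptide):
    """Return {ion label: m/z} for every backbone fragmentation of `peptide`."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = {}
    prefix = 0.0
    for i in range(1, len(masses)):          # cleavage after residue i
        prefix += masses[i - 1]
        j = len(masses) - i                  # length of the complementary y ion
        b = prefix + PROTON                  # singly charged b ion
        y = total - prefix + WATER + PROTON  # singly charged y ion
        ions[f"b{i}"] = b
        ions[f"y{j}"] = y
        ions[f"b{i}-H2O"] = b - WATER
        ions[f"y{j}-H2O"] = y - WATER
        ions[f"a{i}"] = b - CO
        ions[f"b{i}(2+)"] = (b + PROTON) / 2  # add a second proton, halve m/z
        ions[f"y{j}(2+)"] = (y + PROTON) / 2
    return ions
```

For a peptide of $n$ residues this yields $7(n-1)$ hypothetical ions, seven per fragmentation site.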

PepHMM distinguishes:
- **matches**: experimental peaks that correspond to hypothetical ions
- **missing**: hypothetical ions that have no corresponding experimental peak
- **noise**: experimental peaks that do not correspond to any hypothetical ion
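
The three categories can be illustrated with a simple nearest-peak assignment. This is only a sketch: the fixed tolerance `tol` and the rule of taking the closest observed peak are assumptions made for this example, whereas PepHMM itself treats the match probabilistically.

```python
def classify_peaks(observed_mz, hypothetical, tol=0.5):
    """Split peaks and ions into matches, missing ions, and noise peaks.

    observed_mz  -- list of experimental peak m/z values
    hypothetical -- dict mapping ion label to hypothetical m/z
    """
    matches, missing, used = {}, [], set()
    for label, mz in hypothetical.items():
        # Index of the observed peak closest to this hypothetical ion, if any.
        best = min(range(len(observed_mz)),
                   key=lambda i: abs(observed_mz[i] - mz), default=None)
        if best is not None and abs(observed_mz[best] - mz) <= tol:
            matches[label] = observed_mz[best]
            used.add(best)
        else:
            missing.append(label)
    # Every observed peak not claimed by a hypothetical ion counts as noise.
    noise = [mz for i, mz in enumerate(observed_mz) if i not in used]
    return matches, missing, noise
```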

Matches and missing ions are modelled by a Hidden Markov Model (HMM). For one specific fragmentation, a hidden state represents which of the ions have been observed and which are missing. For example, for the first possible fragmentation, the $b_1$ ion is not observed, but the corresponding $y_n$ ion is. PepHMM only considers exactly five of all possible fragmentations: the first two, the middle one, and the last two. The parameters of the HMM include the parameters of the assumed distributions (exponential for peak intensities and normal for match tolerance), estimated separately for each ion type.
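
To make the two distributional assumptions concrete, the log-likelihood of a single matched peak could be written as the sum of a normal log-density for the mass deviation and an exponential log-density for the intensity. This is a hedged sketch, not PepHMM's actual emission model: treating the two terms as independent, the parameter names `mean`, `sd`, and `rate`, and the reading of "match tolerance" as the deviation between observed and hypothetical m/z are all assumptions.

```python
import math

def log_normal_pdf(x, mean, sd):
    """Log-density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def log_exponential_pdf(x, rate):
    """Log-density of an exponential distribution at x >= 0."""
    return math.log(rate) - rate * x

def match_log_likelihood(mass_error, intensity, params):
    """Log-likelihood of one matched peak for one ion type.

    `params` holds that ion type's parameters: 'mean' and 'sd' for the
    mass error, 'rate' for the intensity (all hypothetical names).
    """
    return (log_normal_pdf(mass_error, params["mean"], params["sd"])
            + log_exponential_pdf(intensity, params["rate"]))
```

With one such parameter set per ion type, a $b$ ion and a $y^{2+}$ ion with the same mass deviation can contribute differently to the overall score.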

The complete database matching has the following steps.
1. The search space is limited by the experimental spectrum's precursor mass. This limits the number of potential peptides.
2. For each potential peptide a hypothetical spectrum is generated.
3. For each hypothetical spectrum, a **probabilistic score** is calculated with PepHMM. In addition, a **Z-score** is computed (by simulating 500 peptides with a similar precursor mass), as well as an **E-score**, which ranks the peptides by their Z-score.
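
The Z-score in step 3 can be read as a standardisation of a candidate's score against the scores of the simulated peptides. A minimal sketch, assuming the simulated scores are already available as a list; the function names and the use of the sample standard deviation are assumptions for illustration.

```python
import statistics

def z_score(score, simulated_scores):
    """Standardise `score` against the scores of simulated (random) peptides."""
    mean = statistics.mean(simulated_scores)
    sd = statistics.stdev(simulated_scores)  # sample standard deviation
    return (score - mean) / sd

def rank_by_z(z_by_peptide):
    """Return peptide identifiers ordered from highest to lowest Z-score."""
    return sorted(z_by_peptide, key=z_by_peptide.get, reverse=True)
```

A high Z-score means the candidate scores far above what random peptides of similar precursor mass achieve against the same spectrum.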

PepHMM's parameters are trained by Expectation Maximization (EM). The matching results are compared to MASCOT: PepHMM outperforms MASCOT in accuracy in two different tests. Furthermore, the number of predicted peptide sequences is compared between PepHMM, MASCOT, and SEQUEST. PepHMM makes the most predictions, but the predictions of the three tools also overlap substantially.

siegfried gessulat

sciscore: 1.222