Summary by siegfried gessulat 8 years ago
To identify a peptide, its experimental mass spectrum is matched against a *protein sequence database*. More specifically, the spectrum is compared against *hypothetical spectra* that are generated in silico from the candidate peptides derived from the database's protein sequences. PepHMM is a scoring function that compares an experimental spectrum with a hypothetical spectrum: it estimates the likelihood that the experimental spectrum comes from the same peptide that was used to generate the hypothetical spectrum. Only $b$, $y$, $b-H_2O$, $y-H_2O$, $a$, $b^{2+}$, and $y^{2+}$ ions are considered in the comparison.
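As a minimal sketch of how a hypothetical spectrum restricted to these seven ion types might be generated, the Python below computes singly charged $b$ and $y$ ions from standard monoisotopic residue masses and derives the other ion types from them. The function name `hypothetical_spectrum`, the label format and the use of unmodified residue masses are assumptions for illustration, not PepHMM's own code; modifications and isotope peaks are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)
CO = 27.994915      # monoisotopic mass of CO (Da); a ion = b ion - CO


def hypothetical_spectrum(peptide):
    """Return (label, m/z) pairs for b, y, b-H2O, y-H2O, a, b2+ and y2+ ions.

    Hypothetical helper: cleavage site i splits the peptide between
    residue i and i+1, giving the complementary pair b_i / y_(n-i).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    total = sum(masses)
    ions = []
    for i in range(1, n):
        prefix = sum(masses[:i])
        b = prefix + PROTON                   # singly charged b ion
        y = (total - prefix) + WATER + PROTON  # singly charged y ion
        ions += [
            (f"b{i}", b),
            (f"y{n - i}", y),
            (f"b{i}-H2O", b - WATER),
            (f"y{n - i}-H2O", y - WATER),
            (f"a{i}", b - CO),
            (f"b{i}(2+)", (b + PROTON) / 2),  # doubly charged: (M + 2H) / 2
            (f"y{n - i}(2+)", (y + PROTON) / 2),
        ]
    return ions
```

For a peptide of length $n$ this yields $7(n-1)$ ions, e.g. 42 for `"PEPTIDE"`.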
PepHMM distinguishes:
- **matches**: experimental peaks that correspond to hypothetical ions
- **missing**: hypothetical ions that have no corresponding experimental peak
- **noisy peaks**: experimental peaks that do not correspond to any hypothetical ion.
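The three categories can be illustrated with a small sketch that assigns each hypothetical ion to its nearest experimental peak within a mass tolerance. The function `classify_peaks`, the greedy nearest-peak rule and the 0.5 Da default tolerance are assumptions for illustration; PepHMM itself treats the match tolerance probabilistically rather than with a hard cutoff.

```python
def classify_peaks(peaks, ions, tol=0.5):
    """Split a spectrum into matches, missing ions and noisy peaks.

    peaks: list of (mz, intensity) experimental peaks.
    ions:  list of (label, mz) hypothetical ions.
    tol:   hypothetical hard match tolerance in Da.
    """
    matches, missing, used = [], [], set()
    for label, ion_mz in ions:
        # Index of the experimental peak closest in m/z to this ion (None if no peaks).
        best = min(range(len(peaks)), key=lambda k: abs(peaks[k][0] - ion_mz), default=None)
        if best is not None and abs(peaks[best][0] - ion_mz) <= tol:
            matches.append((label, peaks[best]))
            used.add(best)
        else:
            missing.append(label)
    # Peaks not explained by any hypothetical ion are treated as noise.
    noise = [p for k, p in enumerate(peaks) if k not in used]
    return matches, missing, noise
```

With ions `[("b1", 100.0), ("y1", 200.0)]` and peaks `[(100.2, 5.0), (300.0, 1.0)]`, `b1` is a match, `y1` is missing and the peak at 300.0 is noise.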
Matches and missing ions are modelled by a hidden Markov model (HMM). For one specific fragmentation site, a hidden state represents which of the ions have been observed and which are missing. For example, at the first fragmentation site of a peptide of length $n$, the $b_1$ ion may be missing while the complementary $y_{n-1}$ ion is observed. PepHMM considers exactly five of all possible fragmentation sites: the first two, the middle one and the last two. The HMM's parameters include those of the assumed distributions (exponential for peak intensities, normal for the mass-match tolerance), estimated separately for each ion type.
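The idea of scoring the five fragmentation sites with an HMM can be sketched with the forward algorithm. This is a deliberately simplified, hypothetical model rather than PepHMM's exact formulation: only the $b$ and $y$ ion types are used, a hidden state is the tuple of observed/missing flags for those types, an observed ion emits an exponential intensity and a normally distributed mass error, and a peak at a missing ion's position is scored with an assumed constant noise log-density `log_noise`. All names and parameter values are assumptions.

```python
import math
from itertools import product

ION_TYPES = ("b", "y")  # simplified; PepHMM uses seven ion types
# Hidden state per site: observed (True) / missing (False) flag for each ion type.
STATES = list(product([False, True], repeat=len(ION_TYPES)))


def five_sites(n):
    """First two, middle and last two of the n-1 cleavage sites (assumes n >= 6)."""
    sites = list(range(1, n))
    return [sites[0], sites[1], sites[len(sites) // 2], sites[-2], sites[-1]]


def log_exp_pdf(x, rate):
    return math.log(rate) - rate * x


def log_norm_pdf(x, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - x ** 2 / (2 * sigma ** 2)


def log_emission(state, obs, params):
    """obs maps ion type -> None (no peak) or (intensity, mass_error)."""
    total = 0.0
    for ion, observed in zip(ION_TYPES, state):
        peak = obs[ion]
        if observed:
            if peak is None:
                return -math.inf  # an observed ion requires a peak
            intensity, err = peak
            total += log_exp_pdf(intensity, params["rate"][ion])
            total += log_norm_pdf(err, params["sigma"][ion])
        elif peak is not None:
            total += params["log_noise"]  # peak explained as noise (assumed constant)
    return total


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_loglik(observations, log_start, log_trans, params):
    """log P(observations), summed over all hidden state paths."""
    alpha = [log_start[i] + log_emission(s, observations[0], params)
             for i, s in enumerate(STATES)]
    for obs in observations[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(len(STATES))])
            + log_emission(STATES[j], obs, params)
            for j in range(len(STATES))
        ]
    return logsumexp(alpha)
```

Here `observations` would hold one entry per site returned by `five_sites`; in PepHMM the start, transition and emission parameters are learned rather than fixed by hand.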
The complete database search proceeds in the following steps:
1. The search space is restricted by the experimental spectrum's precursor mass, which limits the number of candidate peptides.
2. For each candidate peptide, a hypothetical spectrum is generated.
3. For each hypothetical spectrum, PepHMM computes a **probabilistic score**, together with a **Z-score** (obtained by comparing against the scores of 500 simulated peptides with a similar precursor mass) and an **E-score**, which ranks the candidate peptides by their Z-score.
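The precursor filter and the Z-score/ranking steps above might look roughly like the sketch below. The function names, the 2.0 Da precursor tolerance and the precomputed `peptide_masses` dictionary are assumptions; the null scores are assumed to come from scoring the 500 simulated peptides, whose generation is not shown.

```python
import statistics


def candidates_by_precursor(precursor_mass, peptide_masses, tol=2.0):
    """Step 1: keep peptides whose mass lies within tol Da (assumed) of the precursor."""
    return [pep for pep, mass in peptide_masses.items() if abs(mass - precursor_mass) <= tol]


def z_score(score, null_scores):
    """Standardise a score against the scores of simulated peptides."""
    mu = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)  # sample standard deviation
    return (score - mu) / sd


def rank_by_z(z_scores):
    """Order candidate peptides by decreasing Z-score, best first."""
    return sorted(z_scores, key=z_scores.get, reverse=True)
```

For example, `rank_by_z({"A": 0.5, "B": 2.0, "C": -1.0})` returns `["B", "A", "C"]`.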
PepHMM's parameters are trained by Expectation Maximization (EM). The matching results are compared with those of MASCOT, and PepHMM outperforms MASCOT in accuracy in two different tests. Furthermore, the number of predicted peptide sequences is compared among PepHMM, MASCOT, and SEQUEST. PepHMM makes the most predictions, but the predictions of the three tools overlap considerably.