Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification

Cited by: 358
Authors
Allen, Felicity [1]
Greiner, Russ [1]
Wishart, David [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Tandem mass spectrometry; MS/MS; Metabolite identification; Machine learning; COMPUTATIONAL MASS-SPECTROMETRY; DISSOCIATION; METABOLOMICS; SOFTWARE; OPTIMIZATION; PREDICTION; PATHWAYS; DATABASE; METLIN; IONS;
DOI
10.1007/s11306-014-0676-4
Chinese Library Classification number
R5 [Internal Medicine];
Subject classification code
100201 [Internal Medicine];
Abstract
Electrospray tandem mass spectrometry (ESI-MS/MS) is commonly used in high-throughput metabolomics. One of the key obstacles to the effective use of this technology is the difficulty in interpreting measured spectra to accurately and efficiently identify metabolites. Traditional methods for automated metabolite identification compare the target MS or MS/MS spectrum to the spectra in a reference database, ranking candidates based on the closeness of the match. However, the limited coverage of available databases has led to an interest in computational methods for predicting reference MS/MS spectra from chemical structures. This work proposes a probabilistic generative model for the MS/MS fragmentation process, which we call competitive fragmentation modeling (CFM), and a machine learning approach for learning parameters for this model from MS/MS data. We show that CFM can be used both in an MS/MS spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and in a putative metabolite identification task (ranking possible structures for a target MS/MS spectrum). In the MS/MS spectrum prediction task, CFM shows significantly improved performance when compared to a full enumeration of all peaks corresponding to substructures of the molecule. In the metabolite identification task, CFM obtains substantially better rankings for the correct candidate than existing methods (MetFrag and FingerID) on tripeptide and metabolite data, when querying PubChem or KEGG for candidate structures of similar mass.
Pages: 98-110
Page count: 13
Related papers
43 in total
[1]
[Anonymous], ANN REPORTS COMPUTAT
[2]
Towards de novo identification of metabolites by analyzing tandem mass spectra [J].
Boecker, Sebastian ;
Rasche, Florian .
BIOINFORMATICS, 2008, 24 (16) :I49-I55
[3]
CAPPE O, 2005, SPR S STAT, P1
[4]
de Hoffman E., 2007, Mass spectrometry: Principles and applications, 3rd ed.
[5]
On a least squares adjustment of a sampled frequency table when the expected marginal totals are known [J].
Deming, WE ;
Stephan, FF .
ANNALS OF MATHEMATICAL STATISTICS, 1940, 11 :427-444
[6]
MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]
Metabolomics: Current analytical platforms and methodologies [J].
Dunn, WB ;
Ellis, DI .
TRAC-TRENDS IN ANALYTICAL CHEMISTRY, 2005, 24 (04) :285-294
[8]
AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[9]
Metabolomics - the link between genotypes and phenotypes [J].
Fiehn, O .
PLANT MOLECULAR BIOLOGY, 2002, 48 (1-2) :155-171
[10]
A predictive science approach to aid understanding of electrospray ionisation tandem mass spectrometric fragmentation pathways of small molecules using density functional calculations [J].
Galezowska, Angelika ;
Harrison, Mark W. ;
Herniman, Julie M. ;
Skylaris, Chris-Kriton ;
Langley, G. John .
RAPID COMMUNICATIONS IN MASS SPECTROMETRY, 2013, 27 (09) :964-970