Towards de novo identification of metabolites by analyzing tandem mass spectra

被引:92
作者
Boecker, Sebastian [1 ,2 ]
Rasche, Florian [1 ]
机构
[1] Univ Jena, Chair Bioinformat, D-07743 Jena, Germany
[2] Jena Ctr Bioinformat, Jena, Germany
关键词
D O I
10.1093/bioinformatics/btn270
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Mass spectrometry is among the most widely used technologies in proteomics and metabolomics. Being a high-throughput method, it produces large amounts of data that necessitates an automated analysis of the spectra. Clearly, database search methods for protein analysis can easily be adopted to analyze metabolite mass spectra. But for metabolites, de novo interpretation of spectra is even more important than for protein data, because metabolite spectra databases cover only a small fraction of naturally occurring metabolites: even the model plant Arabidopsis thaliana has a large number of enzymes whose substrates and products remain unknown. The field of bio-prospection searches biologically diverse areas for metabolites which might serve as pharmaceuticals. De novo identification of metabolite mass spectra requires new concepts and methods since, unlike proteins, metabolites possess a non-linear molecular structure. Results: In this work, we introduce a method for fully automated de novo identification of metabolites from tandem mass spectra. Mass spectrometry data is usually assumed to be insufficient for identification of molecular structures, so we want to estimate the molecular formula of the unknown metabolite, a crucial step for its identification. The method first calculates all molecular formulas that explain the parent peak mass. Then, a graph is build where vertices correspond to molecular formulas of all peaks in the fragmentation mass spectra, whereas edges correspond to hypothetical fragmentation steps. Our algorithm afterwards calculates the maximum scoring subtree of this graph: each peak in the spectra must be scored at most once, so the subtree shall contain only one explanation per peak. Unfortunately, finding this subtree is NP-hard. We suggest three exact algorithms (including one fixed-parameter tractable algorithm) as well as two heuristics to solve the problem. Tests on real mass spectra show that the FPT algorithm and the heuristics solve the problem suitably fast and provide excellent results: for all 32 test compounds the correct solution was among the top five suggestions, for 26 compounds the first suggestion of the exact algorithm was correct.
引用
收藏
页码:I49 / I55
页数:7
相关论文
共 21 条
[1]   A fast and simple algorithm for the money changing problem [J].
Boecker, Sebastian ;
Liptak, Zsuzsanna .
ALGORITHMICA, 2007, 48 (04) :413-432
[2]  
Böcker S, 2006, LECT NOTES COMPUT SC, V4175, P12
[3]   A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry [J].
Chen, T ;
Kao, MY ;
Tepel, M ;
Rush, J ;
Church, GM .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2001, 8 (03) :325-337
[4]   The secondary metabolism of Arabidopsis thaliana:: growing like a weed [J].
D'Auria, JC ;
Gershenzon, J .
CURRENT OPINION IN PLANT BIOLOGY, 2005, 8 (03) :308-316
[5]  
Fellows MR, 2007, LECT NOTES COMPUT SC, V4596, P340
[6]  
Garey MR, 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[7]  
HEINONEN M, 2006, LECT NOTES INFORM P, V83, P40
[8]   From genomics to chemical genomics: new developments in KEGG [J].
Kanehisa, Minoru ;
Goto, Susumu ;
Hattori, Masahiro ;
Aoki-Kinoshita, Kiyoko F. ;
Itoh, Masumi ;
Kawashima, Shuichi ;
Katayama, Toshiaki ;
Araki, Michihiro ;
Hirakawa, Mika .
NUCLEIC ACIDS RESEARCH, 2006, 34 :D354-D357
[9]   Analysis of the genome sequence of the flowering plant Arabidopsis thaliana [J].
Kaul, S ;
Koo, HL ;
Jenkins, J ;
Rizzo, M ;
Rooney, T ;
Tallon, LJ ;
Feldblyum, T ;
Nierman, W ;
Benito, MI ;
Lin, XY ;
Town, CD ;
Venter, JC ;
Fraser, CM ;
Tabata, S ;
Nakamura, Y ;
Kaneko, T ;
Sato, S ;
Asamizu, E ;
Kato, T ;
Kotani, H ;
Sasamoto, S ;
Ecker, JR ;
Theologis, A ;
Federspiel, NA ;
Palm, CJ ;
Osborne, BI ;
Shinn, P ;
Conway, AB ;
Vysotskaia, VS ;
Dewar, K ;
Conn, L ;
Lenz, CA ;
Kim, CJ ;
Hansen, NF ;
Liu, SX ;
Buehler, E ;
Altafi, H ;
Sakano, H ;
Dunn, P ;
Lam, B ;
Pham, PK ;
Chao, Q ;
Nguyen, M ;
Yu, GX ;
Chen, HM ;
Southwick, A ;
Lee, JM ;
Miranda, M ;
Toriumi, MJ ;
Davis, RW .
NATURE, 2000, 408 (6814) :796-815
[10]   Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry [J].
Kind, Tobias ;
Fiehn, Oliver .
BMC BIOINFORMATICS, 2007, 8 (1)