COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets

Cited by: 50
Authors
Bose, Tungadri [1]
Haque, Mohammed Monzoorul [1]
Reddy, C. V. S. K. [1]
Mande, Sharmila S. [1]
Affiliation
[1] Tata Consultancy Serv Ltd, Biosci R&D Div, TCS Innovat Labs, Pune 411013, Maharashtra, India
Source
PLOS ONE | 2015 / Volume 10 / Issue 11
Keywords
METABOLISM;
DOI
10.1371/journal.pone.0142102
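Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes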
07 ; 0710 ; 09 ;
Abstract
Background: Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes being sequenced worldwide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools and techniques. Despite their high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against multiple databases prior to functional analysis. Although web-based functional annotation servers address, to some extent, the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps reduce the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive or infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations.
Results: Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps reduce the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER.
Conclusion: The COGNIZER framework is capable of comprehensively annotating, in functional terms, any metagenomic or metatranscriptomic dataset from varied sequencing platforms. Multiple search options in COGNIZER give end-users the flexibility of choosing a homology search protocol based on available compute resources. The cross-mapping database in COGNIZER is of high utility since it enables end-users to directly infer or derive KEGG, Pfam, GO, and SEED subsystem annotations from COG categorizations. Furthermore, the availability of COGNIZER as a stand-alone scalable implementation is expected to make it a valuable annotation tool in the field of metagenomic research.
Pages: 16