Finding novel genes in bacterial communities isolated from the environment

Cited by: 40
Authors
Krause, Lutz [1]
Diaz, Naryttza N.
Bartels, Daniela
Edwards, Robert A.
Puehler, Alfred
Rohwer, Forest
Meyer, Folker
Stoye, Jens
Affiliations
[1] Univ Bielefeld, Ctr Biotechnol, D-33594 Bielefeld, Germany
[2] Fellowship Interpretat Genom, Burr Ridge, IL USA
[3] San Diego State Univ, Dept Biol, San Diego, CA 92182 USA
[4] Ctr Microbial Sci, San Diego, CA USA
[5] Univ Bielefeld, Tech Fak, D-33594 Bielefeld, Germany
[6] Univ Bielefeld, Lehrstuhl Genet, Fak Biol, D-33594 Bielefeld, Germany
DOI
10.1093/bioinformatics/btl247
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Novel sequencing techniques can give access to organisms that are difficult to cultivate using conventional methods. When applied to environmental samples, the data generated have some drawbacks, e.g. short length of assembled contigs, in-frame stop codons and frame shifts. Unfortunately, current gene finders cannot circumvent these difficulties. At the same time, given the increasing amount of genomic sequence data, automated gene prediction is a prerequisite for progress in metagenomics.
Results: We introduce a novel gene finding algorithm that incorporates features to overcome the short length of the assembled contigs from environmental data, as well as the in-frame stop codons and frame shifts contained in bacterial sequences. The results show that, by searching for sequence similarities in an environmental sample, our algorithm is capable of detecting a high fraction of its gene content, depending on the species composition and the overall size of the sample. The method is valuable for hunting novel unknown genes that may be specific to the habitat from which the sample is taken. Finally, we show that our algorithm can even exploit the limited information contained in the short reads generated by 454 technology for the prediction of protein-coding genes.
Pages: E281-E289
Number of pages: 9