Tracembler - software for in-silico chromosome walking in unassembled genomes

Cited by: 11
Authors
Dong, Qunfeng
Wilkerson, Matthew D.
Brendel, Volker [1]
Affiliations
[1] Iowa State Univ, Dept Genet Dev & Cell Biol, Ames, IA 50011 USA
[2] Iowa State Univ, Dept Stat, Ames, IA 50011 USA
[3] Indiana Univ, Ctr Genom & Bioinformat, Bloomington, IN USA
DOI
10.1186/1471-2105-8-151
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive, starting with reads that significantly match sequence seeds supplied by the user.
Results: Tracembler takes one or more DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively, such that matching sequences identified in previous rounds are used as new queries in subsequent rounds. The recursive BLAST search stops when no new matching sequences are found, a given maximum number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length Chrm2 gene as well as its upstream and downstream genomic regions from Rattus norvegicus reads. In a second example, a query with two adjacent Medicago truncatula genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus.
Conclusion: Tracembler streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting Tracembler is provided at http://www.plantgdb.org/tool/tracembler/, and the software is also freely available from the authors for local installations.
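The recursive search loop described in the Results, with its three stopping conditions, can be illustrated with a minimal Python sketch. This is not Tracembler's code: the names `recursive_search` and `make_kmer_search`, the default limits, and the toy shared-k-mer matcher that stands in for the NCBI Trace Archive BLAST engine are all assumptions made for illustration. The final CAP3 assembly step is not shown.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical search interface: takes one query sequence and returns
# (read_id, read_sequence) pairs for the reads that match it.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]


def recursive_search(
    seeds: List[str],
    search: SearchFn,
    max_rounds: int = 5,
    max_queries: int = 100,
) -> Tuple[Dict[str, str], str]:
    """Collect reads reachable from the seeds by repeated searching.

    Returns the collected reads (id -> sequence) and the reason the
    search stopped.
    """
    collected: Dict[str, str] = {}
    queries = list(seeds)
    queries_used = 0

    for _ in range(max_rounds):
        new_reads: Dict[str, str] = {}
        for query in queries:
            if queries_used >= max_queries:
                collected.update(new_reads)
                return collected, "query limit reached"
            queries_used += 1
            for read_id, read_seq in search(query):
                if read_id not in collected and read_id not in new_reads:
                    new_reads[read_id] = read_seq
        if not new_reads:
            return collected, "no new matches"
        collected.update(new_reads)
        # Reads found in this round become the queries of the next round.
        queries = list(new_reads.values())

    return collected, "round limit reached"


def make_kmer_search(reads: Dict[str, str], k: int = 5) -> SearchFn:
    """Toy stand-in for BLAST: a read matches if it shares any k-mer."""

    def kmers(seq: str) -> set:
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def search(query: str) -> List[Tuple[str, str]]:
        query_kmers = kmers(query)
        return [(rid, seq) for rid, seq in reads.items()
                if query_kmers & kmers(seq)]

    return search


# Made-up reads: r1, r2 and r3 overlap in a chain; r4 is unrelated.
toy_reads = {
    "r1": "ACGTACGTTT",
    "r2": "CGTTTGGGAA",
    "r3": "GGGAACCCTT",
    "r4": "TTTTTTTTTT",
}
found, reason = recursive_search(["ACGTACG"], make_kmer_search(toy_reads))
print(sorted(found), reason)
```

In this toy run the seed matches only r1, r1 then recruits r2, and r2 recruits r3, so the walk extends one read per round until a round yields nothing new. Lowering `max_rounds` or `max_queries` ends the walk earlier, which mirrors the other two stopping conditions.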
Pages: 6