Genome assembly forensics: finding the elusive mis-assembly

Cited by: 178
Authors
Phillippy, Adam M. [1]
Schatz, Michael C. [1]
Pop, Mihai [1]
Affiliations
[1] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
DOI
10.1186/gb-2008-9-3-r55
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
We present the first collection of tools aimed at automated genome assembly validation. This paper formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http://amos.sourceforge.net.
Pages: 25