Genome assembly forensics: finding the elusive mis-assembly

Cited by: 178
Authors
Phillippy, Adam M. [1]
Schatz, Michael C. [1]
Pop, Mihai [1]
Affiliations
[1] University of Maryland, Center for Bioinformatics and Computational Biology, College Park, MD 20742, USA
DOI
10.1186/gb-2008-9-3-r55
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
We present the first collection of tools aimed at automated genome assembly validation. This paper formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http://amos.sourceforge.net.
Pages: 25