Assemblathon 1: A competitive assessment of de novo short read assembly methods

Cited by: 321
Authors
Earl, Dent [1, 2]
Bradnam, Keith [3]
St John, John [1, 2]
Darling, Aaron [3]
Lin, Dawei [3, 4]
Fass, Joseph [3, 4]
Hung On Ken Yu [3]
Buffalo, Vince [3, 4]
Zerbino, Daniel R. [2]
Diekhans, Mark [1, 2]
Ngan Nguyen [1, 2]
Ariyaratne, Pramila Nuwantha [5]
Sung, Wing-Kin [5, 6]
Ning, Zemin [7]
Haimel, Matthias [8]
Simpson, Jared T. [7]
Fonseca, Nuno A. [9]
Birol, Inanc [10]
Docking, T. Roderick [10]
Ho, Isaac Y. [11]
Rokhsar, Daniel S. [11, 12]
Chikhi, Rayan [13, 14]
Lavenier, Dominique [13, 14, 15]
Chapuis, Guillaume [13, 14]
Naquin, Delphine [14, 15]
Maillet, Nicolas [14, 15]
Schatz, Michael C. [16]
Kelley, David R. [17]
Phillippy, Adam M. [17, 18]
Koren, Sergey [17, 18]
Yang, Shiaw-Pyng [19]
Wu, Wei [19]
Chou, Wen-Chi [20]
Srivastava, Anuj [20]
Shaw, Timothy I. [20]
Ruby, J. Graham [21, 23]
Skewes-Cox, Peter [21, 22, 23]
Betegon, Miguel [21, 23]
Dimon, Michelle T. [21, 23]
Solovyev, Victor [24]
Seledtsov, Igor [25]
Kosarev, Petr [25]
Vorobyev, Denis [25]
Ramirez-Gonzalez, Ricardo [26]
Leggett, Richard [27]
MacLean, Dan [27]
Xia, Fangfang [28]
Luo, Ruibang [29]
Li, Zhenyu [29]
Xie, Yinlong [29]
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[3] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[4] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[5] Genome Inst Singapore, Computat & Math Biol Grp, Singapore 119077, Singapore
[6] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
[7] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[8] EMBL EBI, Cambridge CB10 1SA, England
[9] Univ Porto, CRACS INESC Porto LA, P-4169007 Oporto, Portugal
[10] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
[11] US DOE, Joint Genome Inst, Walnut Creek, CA 94598 USA
[12] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[13] ENS Cachan IRISA, Dept Comp Sci, F-35042 Rennes, France
[14] IRISA, CNRS Symbiose, F-35042 Rennes, France
[15] INRIA, F-35042 Rennes, France
[16] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
[17] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[18] Natl Biodef Anal & Countermeasures Ctr, Frederick, MD 20702 USA
[19] Monsanto Co, Chesterfield, MO 63017 USA
[20] Univ Georgia, Inst Bioinformat, Athens, GA 30602 USA
[21] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[22] Univ Calif San Francisco, Biol & Med Informat Program, San Francisco, CA 94143 USA
[23] Howard Hughes Med Inst, Bethesda, MD 20814 USA
[24] Univ London, Dept Comp Sci, London WC1E 7HU, England
[25] Softberry Inc, Mt Kisco, NY 10549 USA
[26] Norwich Res Pk, Genome Anal Ctr, Norwich NR4 7UH, Norfolk, England
[27] Norwich Res Pk, Sainsbury Lab, Norwich NR4 7UH, Norfolk, England
[28] Univ Chicago, Computat Inst, Chicago, IL 60637 USA
[29] BGI Shenzhen, Shenzhen 518083, Peoples R China
[30] Broad Inst, Cambridge, MA 02142 USA
[31] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[32] Univ Calif Davis, Genome Ctr, Santa Cruz, CA 95064 USA
Funding
National Natural Science Foundation of China;
Keywords
SHORT DNA-SEQUENCES; STRING GRAPH; GENOME; ALIGNMENT; ALGORITHMS; ACCURACY; MILLIONS; BASE;
DOI
10.1101/gr.126599.111
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
Pages: 2224-2241
Number of pages: 18