A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs

Cited by: 144
Authors
Swain, Martin T. [1,2]
Tsai, Isheng J. [1]
Assefa, Samual A. [1]
Newbold, Chris [1,3]
Berriman, Matthew [1]
Otto, Thomas D. [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Wellcome Trust Genome Campus, Cambridge, England
[2] Aberystwyth Univ, Inst Biol Environm & Rural Sci, Aberystwyth, Dyfed, Wales
[3] Univ Oxford, John Radcliffe Hosp, Weatherall Inst Mol Med, Oxford OX3 9DU, England
Funding
Wellcome Trust (UK)
Keywords
SEQUENCE ASSEMBLIES; LEISHMANIA; ALGORITHMS; EVOLUTION; STRAINS; SYSTEM; READS;
DOI
10.1038/nprot.2012.068
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ~24 h: it doubles the average contig size and annotates over 4,300 gene models.
Pages: 1260-1284
Page count: 25