Improving pan-genome annotation using whole genome multiple alignment

Cited by: 28
Authors
Angiuoli, Samuel V. [1, 2]
Hotopp, Julie C. Dunning [2]
Salzberg, Steven L. [1]
Tettelin, Herve [2]
Affiliations
[1] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[2] Univ Maryland, IGS, Baltimore, MD 21201 USA
Keywords
TRANSLATION INITIATION SITE; GENE PREDICTION; BACTERIAL; ERRORS
DOI
10.1186/1471-2105-12-272
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Background: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes.
Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review.
Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
Pages: 11
Related papers
30 in total
[1]
Toward an online repository of Standard Operating Procedures (SOPs) for (Meta) genomic annotation [J].
Angiuoli, Samuel V. ;
Gussman, Aaron ;
Klimke, William ;
Cochrane, Guy ;
Field, Dawn ;
Garrity, George ;
Kodira, Chinnappa D. ;
Kyrpides, Nikos ;
Madupu, Ramana ;
Markowitz, Victor ;
Tatusova, Tatiana ;
Thomson, Nick ;
White, Owen .
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY, 2008, 12 (02) :137-141
[2]
Mugsy: fast multiple alignment of closely related whole genomes [J].
Angiuoli, Samuel V. ;
Salzberg, Steven L. .
BIOINFORMATICS, 2011, 27 (03) :334-342
[3]
[Anonymous], P NATL ACAD SCI US
[4]
Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis [J].
Bakke, Peter ;
Carney, Nick ;
DeLoache, Will ;
Gearing, Mary ;
Ingvorsen, Kjeld ;
Lotz, Matt ;
McNair, Jay ;
Penumetcha, Pallavi ;
Simpson, Samantha ;
Voss, Laura ;
Win, Max ;
Heyer, Laurie J. ;
Campbell, A. Malcolm .
PLOS ONE, 2009, 4 (07)
[5]
Benson DA, 2013, NUCLEIC ACIDS RES, V41, pD36, DOI [10.1093/nar/gkn723, 10.1093/nar/gkp1024, 10.1093/nar/gkw1070, 10.1093/nar/gkr1202, 10.1093/nar/gkx1094, 10.1093/nar/gkl986, 10.1093/nar/gkq1079, 10.1093/nar/gks1195, 10.1093/nar/gkg057]
[6]
Errors in genome annotation [J].
Brenner, SE .
TRENDS IN GENETICS, 1999, 15 (04) :132-133
[7]
progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement [J].
Darling, Aaron E. ;
Mau, Bob ;
Perna, Nicole T. .
PLOS ONE, 2010, 5 (06)
[8]
Identifying bacterial genes and endosymbiont DNA with Glimmer [J].
Delcher, Arthur L. ;
Bratke, Kirsten A. ;
Powers, Edwin C. ;
Salzberg, Steven L. .
BIOINFORMATICS, 2007, 23 (06) :673-679
[9]
A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera [J].
Deloger, Marc ;
El Karoui, Meriem ;
Petit, Marie-Agnes .
JOURNAL OF BACTERIOLOGY, 2009, 191 (01) :91-99
[10]
Intrinsic errors in genome annotation [J].
Devos, D ;
Valencia, A .
TRENDS IN GENETICS, 2001, 17 (08) :429-431