MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes

Cited by: 1328
Authors
Cantarel, Brandi L. [1]
Korf, Ian [2, 3]
Robb, Sofia M. C. [4]
Parra, Genis [2, 3]
Ross, Eric [5]
Moore, Barry [1]
Holt, Carson [1]
Alvarado, Alejandro Sanchez [4, 5]
Yandell, Mark [1]
Affiliations
[1] Univ Utah, Eccles Inst Human Genet, Salt Lake City, UT 84112 USA
[2] Univ Calif Davis, Dept Mol & Cellular Biol, Davis, CA 95616 USA
[3] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[4] Univ Utah, Sch Med, Dept Neurobiol & Anat, Salt Lake City, UT 84132 USA
[5] Univ Utah, Sch Med, Howard Hughes Med Inst, Salt Lake City, UT 84132 USA
DOI
10.1101/gr.6743907
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We have developed a portable and easily configurable genome annotation pipeline called MAKER. Its purpose is to allow investigators to independently annotate eukaryotic genomes and create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations having evidence-based quality indices. MAKER is also easily trainable: outputs of preliminary runs are used to automatically retrain its gene-prediction algorithm, producing higher-quality gene models on subsequent runs. MAKER's inputs are minimal, and its outputs can be used to create a GMOD database. Its outputs can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view, and edit individual contigs and BACs without the overhead of a database. As proof of principle, we have used MAKER to annotate the genome of the planarian Schmidtea mediterranea and to create a new genome database, SmedGD. We have also compared MAKER's performance to other published annotation pipelines. Our results demonstrate that MAKER provides a simple and effective means to convert a genome sequence into a community-accessible genome database. MAKER should prove especially useful for emerging model organism genome projects for which extensive bioinformatics resources may not be readily available.
Pages: 188-196
Page count: 9