Creating a honey bee consensus gene set

Cited by: 245
Authors
Elsik, Christine G. [1]
Mackey, Aaron J.
Reese, Justin T.
Milshina, Natalia V.
Roos, David S.
Weinstock, George M.
Affiliations
[1] Texas A&M Univ, Dept Anim Sci, College Stn, TX 77843 USA
[2] Univ Penn, Penn Genom Inst, Philadelphia, PA 19104 USA
[3] GlaxoSmithKline, Collegeville, PA 19426 USA
[4] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
DOI
10.1186/gb-2007-8-1-r13
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: We wished to produce a single reference gene set for honey bee (Apis mellifera). Our motivation was twofold. First, we wished to obtain an improved set of gene models with increased coverage of known genes, while maintaining gene model quality. Second, we wished to provide a single official gene list that the research community could further utilize for consistent and comparable analyses and functional annotation. Results: We created a consensus gene set for honey bee (Apis mellifera) using GLEAN, a new algorithm that uses latent class analysis to automatically combine disparate gene prediction evidence in the absence of known genes. The consensus gene models had increased representation of honey bee genes without sacrificing quality compared with any one of the input gene predictions. When compared with manually annotated gold standards, the consensus set of gene models was similar or superior in quality to each of the input sets. Conclusion: Most eukaryotic genome projects produce multiple gene sets because of the variety of gene prediction programs. Each of the gene prediction programs has strengths and weaknesses, and so the multiplicity of gene sets offers users a more comprehensive collection of genes to use than is available from a single program. On the other hand, the availability of multiple gene sets is also a cause for uncertainty among users as regards which set they should use. GLEAN proved to be an effective method to combine gene lists into a single reference set.
Pages: 8