CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes
Cited by: 1649
Authors:

Parra, Genis
Affiliation: Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA

Bradnam, Keith
Affiliation: Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA

Korf, Ian
Affiliation: Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
Institutions:
[1] Univ Calif Davis, Genome Ctr, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Mol & Cellular Biol, Davis, CA 95616 USA
DOI: 10.1093/bioinformatics/btm071
Chinese Library Classification: Q5 [Biochemistry]
Subject classification codes: 071010; 081704
Abstract:
Motivation: The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes.
Results: In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence. CEGMA includes the use of profile hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages.
Availability: Software and data sets are available online at http://korflab.ucdavis.edu/Datasets.
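The abstract says that CEGMA uses profile hidden Markov models to check the reliability of the gene structures it maps, but gives no further detail here. As a rough illustration only, the Python sketch below scores a predicted protein against a heavily simplified, ungapped profile (a profile HMM with its insert and delete states removed, which reduces to a position-specific scoring matrix) and accepts the prediction if its best log-odds score reaches a cutoff. The function names, the uniform background frequency, the pseudocount, the toy alignment and the 5-bit cutoff are all hypothetical choices, not CEGMA's actual models or thresholds; full profile-HMM tools such as HMMER also model insertions and deletions.

```python
import math

# Hypothetical sketch, not CEGMA's actual models or cutoffs.
# A profile HMM with its insert and delete states removed reduces to one
# match state per alignment column, i.e. a position-specific scoring matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency


def build_profile(aligned_seqs, pseudocount=1.0):
    """Estimate per-column emission probabilities from an ungapped alignment."""
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_seqs:
            if seq[col] in counts:  # ignore non-standard residues
                counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def best_window_score(profile, protein):
    """Best log-odds score (in bits) over all ungapped windows of `protein`.

    Returns -inf if the protein is shorter than the profile."""
    width = len(profile)
    best = float("-inf")
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Unknown residues fall back to the background probability (score 0).
        score = sum(
            math.log2(column.get(aa, BACKGROUND) / BACKGROUND)
            for column, aa in zip(profile, window)
        )
        best = max(best, score)
    return best


def passes_filter(profile, protein, cutoff_bits):
    """Accept a predicted protein only if it matches the family profile well enough."""
    return best_window_score(profile, protein) >= cutoff_bits


if __name__ == "__main__":
    family = ["MKVLA", "MKILA", "MRVLA"]  # toy alignment, not real data
    profile = build_profile(family)
    print(passes_filter(profile, "GGMKVLAGG", cutoff_bits=5.0))  # True
    print(passes_filter(profile, "GGGGGGG", cutoff_bits=5.0))    # False
```

Dropping the insert and delete states keeps the example short; the trade-off is that any predicted protein whose match to the family contains a gap will score poorly, which is exactly what a real profile HMM is designed to tolerate.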
Pages: 1061-1067
Number of pages: 7
References (24 in total):
[1] Translational selection and molecular evolution [J]. Akashi, H; Eyre-Walker, A. CURRENT OPINION IN GENETICS & DEVELOPMENT, 1998, 8(6): 688-693.
Affiliation (listed for all authors): Univ Kansas, Dept Ecol & Evolutionary Biol, Lawrence, KS 66045 USA
[2] Basic local alignment search tool [J]. Altschul, SF; Gish, W; Miller, W; Myers, EW; Lipman, DJ. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215(3): 403-410.
Affiliation (listed for all authors): Penn State Univ, Dept Comp Sci, University Pk, PA 16802
[3] GeneWise and genomewise [J]. Birney, E; Clamp, M; Durbin, R. GENOME RESEARCH, 2004, 14(5): 988-995.
Affiliation (listed for all authors): European Bioinformat Inst, Cambridge CB10 1SA, England
[4] Recognition of genes in DNA sequence with ambiguities [J]. Borodovsky, M; McIninch, J. BIOSYSTEMS, 1993, 30(1-3): 161-171.
Affiliation (listed for all authors): Moscow Molec Genet Inst, Moscow, Russia
[5] Genome annotation past, present, and future: How to define an ORF at each locus [J]. Brent, MR. GENOME RESEARCH, 2005, 15(12): 1777-1786.
Affiliation: Washington Univ, Lab Computat Gen, St Louis, MO 63130 USA
[6] Prediction of complete gene structures in human genomic DNA [J]. Burge, C; Karlin, S. JOURNAL OF MOLECULAR BIOLOGY, 1997, 268(1): 78-94.
Affiliation (listed for all authors): Department of Mathematics, Stanford University, Stanford
[7] Evaluation of gene structure prediction programs [J]. Burset, M; Guigo, R. GENOMICS, 1996, 34(3): 353-367.
Affiliation (listed for all authors): Inst Municipal Invest Med, Dept Med Informat, E-08003 Barcelona, Spain
[8] The Ensembl automatic gene annotation system [J]. Curwen, V; Eyras, E; Andrews, TD; Clarke, L; Mongin, E; Searle, SMJ; Clamp, M. GENOME RESEARCH, 2004, 14(5): 942-950.
Affiliation (listed for all authors): Wellcome Trust Sanger Inst, Cambridge, England
[9] The draft genome of Ciona intestinalis: Insights into chordate and vertebrate origins [J]. Dehal, P; Satou, Y; Campbell, RK; Chapman, J; Degnan, B; De Tomaso, A; Davidson, B; Di Gregorio, A; Gelpke, M; Goodstein, DM; Harafuji, N; Hastings, KEM; Ho, I; Hotta, K; Huang, W; Kawashima, T; Lemaire, P; Martinez, D; Meinertzhagen, IA; Necula, S; Nonaka, M; Putnam, N; Rash, S; Saiga, H; Satake, M; Terry, A; Yamada, L; Wang, HG; Awazu, S; Azumi, K; Boore, J; Branno, M; Chin-bow, S; DeSantis, R; Doyle, S; Francino, P; Keys, DN; Haga, S; Hayashi, H; Hino, K; Imai, KS; Inaba, K; Kano, S; Kobayashi, K; Kobayashi, M; Lee, BI; Makabe, KW; Manohar, C; Matassi, G; Medina, M. SCIENCE, 2002, 298(5601): 2157-2167.
Affiliation (listed for all authors): US DOE, Joint Genome Inst, Walnut Creek, CA 94598 USA
[10] Profile hidden Markov models [J]. Eddy, SR. BIOINFORMATICS, 1998, 14(9): 755-763.
Affiliation: Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA