The Ensembl automatic gene annotation system

Cited by: 269
Authors
Curwen, V
Eyras, E
Andrews, TD
Clarke, L
Mongin, E
Searle, SMJ
Clamp, M
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge, England
[2] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[3] Broad Inst, Cambridge, MA 02141 USA
DOI
10.1101/gr.1858004
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
As more genomes are sequenced, there is an increasing need for automated first-pass annotation which allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analysis. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.
Pages: 942-950
Page count: 9