Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome

Cited by: 47
Authors
Gopal, S
Schroeder, M
Pieper, U
Sczyrba, A
Aytekin-Kurban, G
Bekiranov, S
Fajardo, JE
Eswar, N
Sanchez, R
Sali, A
Gaasterland, T
Affiliations
[1] Rockefeller Univ, Lab Computat Genom, New York, NY 10021 USA
[2] Rockefeller Univ, Dept Biophys, New York, NY 10021 USA
Keywords
DOI
10.1038/85922
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First, we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes [1]. This annotation strategy is applicable to genomes of all organisms, including human.
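The stage 2 filtering step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Candidate` record, the `stage2_filter` function, and the `max_evalue` cutoff are all hypothetical names and values that do not appear in the record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    """A plausible gene from stage 1 (hypothetical record layout)."""
    gene_id: str
    matches_annotated: bool  # overlaps a gene in the original annotation
    # E-values of protein-level hits against sequences or structures
    # from other organisms (assumed representation of homology evidence)
    homology_evalues: List[float] = field(default_factory=list)


def stage2_filter(candidates, max_evalue=1e-5):
    """Keep novel candidates supported by at least one cross-species hit.

    max_evalue is an assumed cutoff chosen for illustration only.
    """
    # Candidates that match already-annotated genes are not novel.
    novel = [c for c in candidates if not c.matches_annotated]
    # A novel candidate is retained only if some hit meets the cutoff.
    return [
        c for c in novel
        if any(e <= max_evalue for e in c.homology_evalues)
    ]
```

In this sketch, a candidate with no homology hits at all is dropped, which mirrors the abstract's idea that homology to sequences or structures from other organisms is what supports a plausible gene as genuine.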
Pages: 337-340
Number of pages: 4
Related papers
26 in total
  • [1] The genome sequence of Drosophila melanogaster
    Adams, MD
    Celniker, SE
    Holt, RA
    Evans, CA
    Gocayne, JD
    Amanatides, PG
    Scherer, SE
    Li, PW
    Hoskins, RA
    Galle, RF
    George, RA
    Lewis, SE
    Richards, S
    Ashburner, M
    Henderson, SN
    Sutton, GG
    Wortman, JR
    Yandell, MD
    Zhang, Q
    Chen, LX
    Brandon, RC
    Rogers, YHC
    Blazej, RG
    Champe, M
    Pfeiffer, BD
    Wan, KH
    Doyle, C
    Baxter, EG
    Helt, G
    Nelson, CR
    Miklos, GLG
    Abril, JF
    Agbayani, A
    An, HJ
    Andrews-Pfannkoch, C
    Baldwin, D
    Ballew, RM
    Basu, A
    Baxendale, J
    Bayraktaroglu, L
    Beasley, EM
    Beeson, KY
    Benos, PV
    Berman, BP
    Bhandari, D
    Bolshakov, S
    Borkova, D
    Botchan, MR
    Bouck, J
    Brokstein, P
    [J]. SCIENCE, 2000, 287 (5461) : 2185 - 2195
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases
    Altschul, SF
    Koonin, EV
    [J]. TRENDS IN BIOCHEMICAL SCIENCES, 1998, 23 (11) : 444 - 447
  • [4] Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins
    Bateman, A
    Birney, E
    Durbin, R
    Eddy, SR
    Finn, RD
    Sonnhammer, ELL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 260 - 262
  • [5] GenBank
    Benson, DA
    Boguski, MS
    Lipman, DJ
    Ostell, J
    Ouellette, BFF
    Rapp, BA
    Wheeler, DL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (01) : 12 - 17
  • [6] The PDB data uniformity project
    Bhat, TN
    Bourne, P
    Feng, ZK
    Gilliland, G
    Jain, S
    Ravichandran, V
    Schneider, B
    Schneider, K
    Thanki, N
    Weissig, H
    Westbrook, J
    Berman, HM
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (01) : 214 - 218
  • [7] Gene discovery in dbEST
    Boguski, MS
    Tolstoshev, CM
    Bassett, DE
    [J]. SCIENCE, 1994, 265 (5181) : 1993 - 1994
  • [8] Prediction of complete gene structures in human genomic DNA
    Burge, C
    Karlin, S
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) : 78 - 94
  • [9] Finding the genes in genomic DNA
    Burge, CB
    Karlin, S
    [J]. CURRENT OPINION IN STRUCTURAL BIOLOGY, 1998, 8 (03) : 346 - 354
  • [10] Structural genomics: beyond the Human Genome Project
    Burley, SK
    Almo, SC
    Bonanno, JB
    Capel, M
    Chance, MR
    Gaasterland, T
    Lin, DW
    Sali, A
    Studier, FW
    Swaminathan, S
    [J]. NATURE GENETICS, 1999, 23 (02) : 151 - 157