Gene annotation: Prediction and testing

被引:45
作者
Ashurst, JL [1 ]
Collins, JE [1 ]
机构
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
关键词
human genome; manual annotation; ab initio prediction;
D O I
10.1146/annurev.genom.4.070802.110300
中图分类号
Q3 [遗传学];
学科分类号
071007 ; 090102 ;
摘要
Fifty years after the publication of DNA structure, the whole human genome sequence will be officially finished. This achievement marks the beginning of the task to catalogue every human gene and identify each of their function expression patterns. Currently, researchers estimate that there are about 30,000 human genes and approximately 70% of these can be automatically predicted using a combination of ab initio and similarity-based programs. However, to experimentally investigate every gene's function, the research community requires a high-quality annotation of alternative splicing, pseudogenes, and promoter regions that can only be provided by manual intervention. Manual curation of the human genome will be a long-term project as experimental data are continually produced to confirm or refine the predictions, and new features such as noncoding RNAs and enhancers have not been fully identified. Such a highly curated human gene-set made publicly available will be a great asset for the experimental community and for future comparative genome projects.
引用
收藏
页码:69 / 88
页数:22
相关论文
共 102 条
[1]   The genome sequence of Drosophila melanogaster [J].
Adams, MD ;
Celniker, SE ;
Holt, RA ;
Evans, CA ;
Gocayne, JD ;
Amanatides, PG ;
Scherer, SE ;
Li, PW ;
Hoskins, RA ;
Galle, RF ;
George, RA ;
Lewis, SE ;
Richards, S ;
Ashburner, M ;
Henderson, SN ;
Sutton, GG ;
Wortman, JR ;
Yandell, MD ;
Zhang, Q ;
Chen, LX ;
Brandon, RC ;
Rogers, YHC ;
Blazej, RG ;
Champe, M ;
Pfeiffer, BD ;
Wan, KH ;
Doyle, C ;
Baxter, EG ;
Helt, G ;
Nelson, CR ;
Miklos, GLG ;
Abril, JF ;
Agbayani, A ;
An, HJ ;
Andrews-Pfannkoch, C ;
Baldwin, D ;
Ballew, RM ;
Basu, A ;
Baxendale, J ;
Bayraktaroglu, L ;
Beasley, EM ;
Beeson, KY ;
Benos, PV ;
Berman, BP ;
Bhandari, D ;
Bolshakov, S ;
Borkova, D ;
Botchan, MR ;
Bouck, J ;
Brokstein, P .
SCIENCE, 2000, 287 (5461) :2185-2195
[2]   SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model [J].
Alexandersson, M ;
Cawley, S ;
Pachter, L .
GENOME RESEARCH, 2003, 13 (03) :496-502
[3]  
Ashburner M, 2001, GENOME RES, V11, P1425
[4]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[5]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[6]   Human and mouse gene structure: Comparative analysis and application to exon prediction [J].
Batzoglou, S ;
Pachter, L ;
Mesirov, JP ;
Berger, B ;
Lander, ES .
GENOME RESEARCH, 2000, 10 (07) :950-958
[7]   Patterns of variant polyadenylation signal usage in human genes [J].
Beaudoing, E ;
Freier, S ;
Wyatt, JR ;
Claverie, JM ;
Gautheret, D .
GENOME RESEARCH, 2000, 10 (07) :1001-1010
[8]   Using GeneWise in the Drosophila annotation experiment [J].
Birney, E ;
Durbin, R .
GENOME RESEARCH, 2000, 10 (04) :547-548
[9]   Databases and tools for browsing genomes [J].
Birney, E ;
Clamp, M ;
Hubbard, T .
ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, 2002, 3 :293-310
[10]  
Boumil RM, 2001, HUM MOL GENET, V10, P2225