Vertebrate gene predictions and the problem of large genes

Times cited: 47
Authors
Wang, J
Li, ST
Zhang, Y
Zheng, HK
Xu, Z
Ye, J
Yu, J
Wong, GKS [1]
Affiliations
[1] Chinese Acad Sci, Beijing Inst Genom, Beijing 101300, Peoples R China
[2] Zhejiang Univ, Watson Inst, Hangzhou Genom Inst, Key Lab Bioinformat Zhejiang Province, Hangzhou 310007, Peoples R China
[3] Peking Univ, Coll Life Sci, Beijing 100871, Peoples R China
[4] Univ Washington, Dept Med, Genome Ctr, Seattle, WA 98195 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
DOI
10.1038/nrg1160
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
To find unknown protein-coding genes, annotation pipelines use a combination of ab initio gene prediction and similarity to experimentally confirmed genes or proteins. Here, we show that although the ab initio predictions have an intrinsically high false-positive rate, they also have a consistently low false-negative rate. The incorporation of similarity information is meant to reduce the false-positive rate, but in doing so it increases the false-negative rate. The crucial variable is gene size (including introns): genes of the most extreme sizes, especially very large genes, are the most likely to be incorrectly predicted.
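The error rates discussed in the abstract can be made concrete with a minimal Python sketch. It is not the authors' evaluation method. The `Gene` class and the `overlaps`, `error_rates` and `fn_rate_by_size` helpers are hypothetical, and the sketch assumes that a prediction counts as correct if it merely overlaps a reference gene on the same chromosome. Under that assumption, the false-positive rate is the fraction of predictions that overlap no reference gene, and the false-negative rate is the fraction of reference genes that no prediction overlaps. Gene size is taken as the genomic span with introns included, and the size bins are arbitrary.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model reduced to its genomic span (introns included)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def span(self) -> int:
        # Gene size in bases, counting exons and introns alike.
        return self.end - self.start + 1


def overlaps(a: Gene, b: Gene) -> bool:
    # Assumed match criterion: any shared base on the same chromosome.
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def _hit(gene: Gene, others: Sequence[Gene]) -> bool:
    # True if any gene in `others` overlaps `gene`.
    return any(overlaps(gene, o) for o in others)


def error_rates(predicted: Sequence[Gene],
                reference: Sequence[Gene]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    FP rate: fraction of predictions that overlap no reference gene.
    FN rate: fraction of reference genes that no prediction overlaps.
    """
    fp = sum(1 for p in predicted if not _hit(p, reference))
    fn = sum(1 for r in reference if not _hit(r, predicted))
    fp_rate = fp / len(predicted) if predicted else 0.0
    fn_rate = fn / len(reference) if reference else 0.0
    return fp_rate, fn_rate


def fn_rate_by_size(predicted: Sequence[Gene],
                    reference: Sequence[Gene],
                    bin_edges: List[int]) -> Dict[Tuple[int, int], float]:
    """False-negative rate of reference genes, grouped into [lo, hi) span bins."""
    rates = {}
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        in_bin = [r for r in reference if lo <= r.span < hi]
        if in_bin:  # skip empty bins rather than divide by zero
            missed = sum(1 for r in in_bin if not _hit(r, predicted))
            rates[(lo, hi)] = missed / len(in_bin)
    return rates


# Toy data, invented for illustration only (not from the paper).
reference = [Gene("chr1", 1_000, 5_000), Gene("chr1", 100_000, 600_000)]
predicted = [Gene("chr1", 1_200, 4_800), Gene("chr1", 50_000, 52_000)]
print(error_rates(predicted, reference))  # (0.5, 0.5)
print(fn_rate_by_size(predicted, reference, [0, 10_000, 1_000_000]))
# {(0, 10000): 0.0, (10000, 1000000): 1.0}
```

In this scheme, a similarity filter amounts to removing unsupported entries from `predicted` before calling `error_rates`. Such filtering can never lower the false-negative rate, because the reference set is unchanged while predictions only disappear. This mirrors the trade-off the abstract describes. The pairwise scan is quadratic, which is acceptable for a sketch. A real evaluation would use an interval index and a stricter, exon-level match criterion.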
Pages: 741-749
Page count: 9