The KA/KS ratio test for assessing the protein-coding potential of genomic regions: An empirical and simulation study

Cited by: 241
Authors
Nekrutenko, A [1 ]
Makova, KD [1 ]
Li, WH [1 ]
Affiliations
[1] Univ Chicago, Dept Ecol & Evolut, Chicago, IL 60637 USA
DOI
10.1101/gr.200901
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Comparative genomics is a simple, powerful way to increase the accuracy of gene prediction. In this study, we show the utility of a simple test for the identification of protein-coding exons using human/mouse sequence comparisons. The test takes advantage of the fact that in the vast majority of coding regions, synonymous substitutions (KS) occur much more frequently than nonsynonymous ones (KA), and it uses the KA/KS ratio as the criterion. We show the following: (1) most human and mouse exons are sufficiently long and have a suitable degree of sequence divergence for the test to perform reliably; (2) the test is suited for the identification of long exons and single-exon genes, which are difficult to predict by current methods; (3) the test has a false-negative rate lower than most current gene-prediction methods and a false-positive rate lower than all current methods; (4) the test has been automated and can be used in combination with other existing gene-prediction methods.
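The criterion described in the abstract can be illustrated with a minimal sketch that estimates KA and KS for a pair of aligned, in-frame sequences. This is not the authors' implementation: it swaps in the widely used Nei-Gojobori counting scheme with a Jukes-Cantor correction, and every function name (`syn_sites`, `path_differences`, `jukes_cantor`, `ka_ks`) is hypothetical. It returns point estimates only; deciding whether KA/KS is significantly below 1 would need an additional statistical test that is not shown here.

```python
from itertools import permutations, product
from math import log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def syn_sites(codon):
    """Synonymous sites in one codon: each single-base change counts 1/3 if silent."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1 / 3
    return s


def path_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences over all mutation orders."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    paths = []
    for order in permutations(diff_pos):
        cur, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[nxt] == "*":  # assumption: ignore paths through stop codons
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if valid:
            paths.append((sd, nd))
    if not paths:  # assumption: if every path hits a stop, treat all changes as nonsynonymous
        return 0.0, float(len(diff_pos))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor correction for a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS) for two aligned, in-frame sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons containing gaps, ambiguous bases, or stop codons.
        if any(c not in CODON_TABLE or CODON_TABLE[c] == "*" for c in (c1, c2)):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = path_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no usable codons")
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


# One silent change (CTG -> CTA, both Leu): KA is zero, KS is positive.
ka, ks = ka_ks("CTGCTGCTGCTGAAA", "CTACTGCTGCTGAAA")
print(ka < ks)
```

A region whose estimated KA is well below its KS behaves like a sequence under purifying selection at the protein level, which is the signal the abstract's test exploits. The ratio itself is left to the caller because KS can be zero for very similar sequences.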
Pages: 198-202
Page count: 5