A third approach to gene prediction suggests thousands of additional human transcribed regions

Cited by: 33
Authors
Glusman, Gustavo [1]
Qin, Shizhen
El-Gewely, Raafat
Siegel, Andrew F.
Roach, Jared C.
Hood, Leroy
Smit, Arian F. A.
Affiliations
[1] Inst Syst Biol, Seattle, WA USA
[2] Univ Tromso, Med Biol Inst, Tromso, Norway
[3] Univ Washington, Dept Management Sci, Seattle, WA 98195 USA
[4] Univ Washington, Dept Finance & Stat, Seattle, WA 98195 USA
DOI
10.1371/journal.pcbi.0020018
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: GREENS and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes that have long introns and lack sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."
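As a rough illustration of the idea behind strand-specific selection against polyadenylation signals, the toy sketch below counts the canonical signal AATAAA on each strand of a sequence window and reports the asymmetry. It is not the ROAST, PASTA, or FEAST algorithm: the function names, the use of a single hexamer, and the scoring formula are all assumptions made for this example.

```python
# Toy sketch only: NOT the published ROAST/PASTA method.
# Assumption: one canonical poly(A) signal hexamer, simple normalized count difference.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (unknown bases map to N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def polya_strand_bias(seq: str, motif: str = "AATAAA") -> float:
    """Hypothetical asymmetry score in [-1, 1].

    Positive values mean the signal is depleted on the forward strand
    relative to the reverse strand, which would be consistent with
    forward-strand transcription under the assumption above.
    """
    seq = seq.upper()
    forward = count_overlapping(seq, motif)
    # A motif on the reverse strand appears as its reverse complement here.
    reverse = count_overlapping(seq, reverse_complement(motif))
    total = forward + reverse
    if total == 0:
        return 0.0
    return (reverse - forward) / total


if __name__ == "__main__":
    window = "TTTATT" * 3 + "AATAAA"
    print(polya_strand_bias(window))
```

A real analysis would score many windows along a chromosome and compare them against a background model; this sketch only shows how a per-window strand asymmetry might be computed.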
Pages: 160-173
Number of pages: 14