Interpolated Markov chains for eukaryotic promoter recognition

Cited by: 74
Authors
Ohler, U
Harbeck, S
Niemann, H
Nöth, M
Reese, MG
Affiliations
[1] Univ Erlangen Nurnberg, Chair Pattern Recognit Comp Sci 5, D-91058 Erlangen, Germany
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
DOI
10.1093/bioinformatics/15.5.362
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein-encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order, which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. Results: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
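The idea of scoring a sequence with competing interpolated Markov chains can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `InterpolatedMarkovChain`, the helper `classify`, the single fixed weight `lam`, the simple linear recursion that blends each context length with the next shorter one, and the toy training strings are all assumptions made for illustration. The paper estimates the interpolation parameters automatically, by methods this record does not describe.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


class InterpolatedMarkovChain:
    """Toy interpolated Markov chain with one fixed weight for every order."""

    def __init__(self, order, lam=0.7):
        self.order = order
        self.lam = lam  # hypothetical fixed weight, 0 < lam < 1
        # counts[k][context][symbol] for every context length k = 0..order
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(order + 1)]

    def train(self, sequences):
        for seq in sequences:
            for i, sym in enumerate(seq):
                for k in range(min(i, self.order) + 1):
                    self.counts[k][seq[i - k:i]][sym] += 1

    def prob(self, sym, context):
        # Start from the uniform distribution, then blend in the relative
        # frequencies of successively longer contexts while they were observed.
        p = 1.0 / len(ALPHABET)
        for k in range(min(len(context), self.order) + 1):
            seen = self.counts[k].get(context[len(context) - k:])
            if not seen:
                break  # unseen context: keep the shorter-context estimate
            total = sum(seen.values())
            p = self.lam * seen.get(sym, 0) / total + (1.0 - self.lam) * p
        return p

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(s, seq[max(0, i - self.order):i]))
                   for i, s in enumerate(seq))


def classify(seq, models):
    """Return the name of the model that gives seq the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


# Toy data, invented for this example only.
models = {"promoter": InterpolatedMarkovChain(order=2),
          "coding": InterpolatedMarkovChain(order=2)}
models["promoter"].train(["TATAAATATATAAAGG", "TTATATAAGTATAAAA"])
models["coding"].train(["ATGGCCGCGGGCCTGGCC", "GCCGGCATGCTGGCGGCC"])
print(classify("TATATAAA", models))
```

Because every estimate is mixed with a shorter-context estimate and ultimately with the uniform distribution, no symbol ever receives probability zero, which is the kind of parameter stability the abstract attributes to interpolation. A third model trained on non-coding sequences would be added to `models` in the same way.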
Pages: 362-369
Number of pages: 8
Related papers
20 in total
[1]   Detection of eukaryotic promoters using Markov transition matrices [J].
Audic, S ;
Claverie, JM .
COMPUTERS & CHEMISTRY, 1997, 21 (04) :223-227
[2]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[3]   Finding the genes in genomic DNA [J].
Burge, CB ;
Karlin, S .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 1998, 8 (03) :346-354
[4]   Maximum likelihood from incomplete data via EM algorithm [J].
Dempster, AP ;
Laird, NM ;
Rubin, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]   Eukaryotic promoter recognition [J].
Fickett, JW ;
Hatzigeorgiou, AC .
GENOME RESEARCH, 1997, 7 (09) :861-878
[6]   Assessment of protein coding measures [J].
Fickett, JW ;
Tung, CS .
NUCLEIC ACIDS RESEARCH, 1992, 20 (24) :6441-6450
[7]   Frech K, 1998, SILICO BIOL, V1
[8]   Hutchinson GB, 1996, COMPUT APPL BIOSCI, V12, P391
[9]   RNA polymerase II transcription control [J].
Kornberg, RD .
TRENDS IN BIOCHEMICAL SCIENCES, 1996, 21 (09) :325-326
[10]   Krogh A, 1997, P 5 INT C INT SYST M, P179