ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles

Cited by: 63
Authors
Abeel, Thomas [1, 2]
Saeys, Yvan [1, 2]
Rouze, Pierre [1, 3]
Van de Peer, Yves [1, 2]
Affiliations
[1] Univ Ghent, VIB, Dept Plant Syst Biol, B-9052 Ghent, Belgium
[2] Univ Ghent, Dept Mol Genet, B-9052 Ghent, Belgium
[3] Univ Ghent, Lab Associe INRA, B-9052 Ghent, Belgium
DOI
10.1093/bioinformatics/btn172
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter. Because the identification of the transcription initiation region is such a challenging problem, it is not yet common practice to integrate transcription start site prediction in genome annotation projects. Nevertheless, better core promoter prediction can improve genome annotation and can be used to guide experimental work.
Results: Comparing the average structural profile based on base stacking energy of transcribed, promoter and intergenic sequences demonstrates that the core promoter has unique features that cannot be found in other sequences. We show that unsupervised clustering using self-organizing maps can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences. An implementation of this promoter prediction program, called ProSOM, is available and has been compared with the state of the art. We propose an objective, accurate and biologically sound validation scheme for core promoter predictors. ProSOM performs at least as well as the software currently available, but our technique is more balanced in terms of the number of predicted sites and the number of false predictions, resulting in a better all-round performance. Additional tests on the ENCODE regions of the human genome show that 98% of all predictions made by ProSOM can be associated with transcriptionally active regions, which demonstrates the high precision.
Pages: I24-I31
Page count: 8