PhyloScan: identification of transcription factor binding sites using cross-species evidence

Cited by: 19
Authors
Carmack, C. Steven [1]
McCue, Lee Ann [1,2]
Newberg, Lee A. [1,3]
Lawrence, Charles E. [1,4]
Affiliations
[1] New York State Dept Hlth, Wadsworth Ctr, Albany, NY 12201 USA
[2] Pacific NW Natl Lab, Richland, WA 99352 USA
[3] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[4] Brown Univ, Div Appl Math, Providence, RI 02912 USA
Keywords
DNA-SEQUENCES; STATISTICAL SIGNIFICANCE; UTILIZATION SYSTEMS; PATTERNS; PREDICTION; BACTERIA; MATRICES; DATABASE; SIGNAL
DOI
10.1186/1748-7188-2-1
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database.

Methods: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets.

Results: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli.

Conclusion: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
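The two ideas in the abstract, scanning with a motif model and pooling evidence from species treated as independent, can be illustrated with a minimal sketch. This is not PhyloScan's actual procedure: the log-odds matrix with a uniform background, the pseudocount, and the use of Fisher's method to combine p-values are all assumptions chosen for illustration, and the function names (`log_odds_matrix`, `best_hit`, `fisher_combined_p`) are hypothetical. Input sequences are assumed to contain only A, C, G, and T.

```python
import math

BASES = "ACGT"

def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds motif matrix from equal-length known sites.

    Assumes a uniform background; this is an illustrative choice only.
    """
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return matrix

def best_hit(matrix, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(matrix, window))
        if score > best[0]:
            best = (score, start)
    return best

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom, whose upper tail has a closed form for even degrees of
    freedom. Stand-in technique, not the statistic used by PhyloScan.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    tail_terms = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_terms
```

For example, `best_hit(log_odds_matrix(["ACGT", "ACGT", "ACGA"]), "TTACGTTT")` locates the strongest match in one sequence, and `fisher_combined_p` would then pool per-species p-values for orthologous regions. Converting a window score into a p-value is left out here; the abstract states only that significance is calculated directly, without training sets.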
Pages: 17