Ab initio identification of putative human transcription factor binding sites by comparative genomics - art. no. 110

Cited by: 25
Authors
Corà, D
Herrmann, C
Dieterich, C
Di Cunto, F
Provero, P
Caselle, M [1]
Affiliations
[1] Univ Turin, Dipartimento Fis Teor, Via P Giuria 1, I-10125 Turin, Italy
[2] INFN, I-10125 Turin, Italy
[3] Univ Mediterranee CNRS, LGPD IBDM, F-13288 Marseille 9, France
[4] Max Planck Inst Mol Genet, D-14195 Berlin, Germany
[5] Univ Turin, Dipartimento Genet Biol & Biochim, I-10126 Turin, Italy
DOI
10.1186/1471-2105-6-110
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Understanding the transcriptional regulation of gene expression is one of the greatest challenges of modern molecular biology. A central role in this mechanism is played by transcription factors (TFs), which typically bind to specific, short DNA sequence motifs usually located in the upstream region of the regulated genes. We discuss here a simple and powerful approach for the ab initio identification of these cis-regulatory motifs. The method we present integrates several elements: human-mouse comparison, statistical analysis of genomic sequences and the concept of coregulation. We apply it to a complete scan of the human genome.
Results: By using the catalogue of conserved upstream sequences collected in the CORG database, we construct sets of genes sharing the same overrepresented motif (a short DNA sequence) in their upstream regions both in human and in mouse. We perform this construction for all possible motifs from 5 to 8 nucleotides in length and then filter the resulting sets, looking for two types of evidence of coregulation. First, we analyze the Gene Ontology annotation of the genes in the set, searching for statistically significant common annotations. Second, we analyze the expression profiles of the genes in the set as measured by microarray experiments, searching for evidence of coexpression. The sets which pass one or both filters are conjectured to contain a significant fraction of coregulated genes, and the upstream motifs characterizing the sets are thus good candidates to be the binding sites of the TFs involved in such regulation. In this way we find various known motifs and also some new candidate binding sites.
Conclusion: We have discussed a new integrated algorithm for the ab initio identification of transcription factor binding sites in the human genome. The method is based on three ingredients: comparative genomics, overrepresentation, and different types of coregulation. The method is applied to a full scan of the human genome, giving satisfactory results.
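The set-construction step described under Results can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not state which overrepresentation statistic, background model, or cutoff the authors use, so a binomial tail probability against the pooled motif frequency stands in here, and the function names, the `orthologs` mapping, and the `p_cut` value are hypothetical.

```python
from collections import Counter
from math import comb


def count_kmers(seq, k):
    """Count overlapping occurrences of every k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def binomial_tail(n, x, p):
    """Return P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x, n + 1))


def overrepresented_genes(upstream, motif, p_cut):
    """Genes whose upstream sequence holds `motif` more often than expected.

    `upstream` maps gene name -> upstream sequence. The expected frequency is
    the pooled motif frequency over all sequences (an assumed background
    model, not necessarily the one used in the paper).
    """
    k = len(motif)
    per_gene = {}
    total_hits = 0
    total_positions = 0
    for gene, seq in upstream.items():
        positions = max(len(seq) - k + 1, 0)
        hits = count_kmers(seq, k)[motif]
        per_gene[gene] = (hits, positions)
        total_hits += hits
        total_positions += positions
    if total_hits == 0 or total_positions == 0:
        return set()
    background = total_hits / total_positions
    return {
        gene
        for gene, (hits, positions) in per_gene.items()
        if hits > 0 and binomial_tail(positions, hits, background) < p_cut
    }


def conserved_motif_set(human, mouse, orthologs, motif, p_cut=0.01):
    """Human genes where `motif` is overrepresented in both species.

    `orthologs` maps each human gene to its mouse ortholog (hypothetical
    input; the paper takes conserved upstream regions from CORG).
    """
    human_set = overrepresented_genes(human, motif, p_cut)
    mouse_set = overrepresented_genes(mouse, motif, p_cut)
    return {g for g in human_set if orthologs.get(g) in mouse_set}
```

Calling `conserved_motif_set` once per candidate motif of length 5 to 8 would give one gene set per motif; the Gene Ontology and coexpression filters mentioned in the abstract would then be applied to each set and are not sketched here.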
Pages: 12