From co-expression to co-regulation: how many microarray experiments do we need?

Cited by: 72
Authors
Yeung, KY [1]
Medvedovic, M
Bumgarner, RE
Affiliations
[1] Univ Washington, Dept Microbiol, Seattle, WA 98195 USA
[2] Univ Cincinnati, Med Ctr, Dept Environm Hlth, Ctr Genome Informat, Cincinnati, OH 45267 USA
DOI
10.1186/gb-2004-5-7-r48
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Cluster analysis is often used to infer regulatory modules or biological function by associating unknown genes with other genes that have similar expression patterns and known regulatory elements or functions. However, clustering results may not have any biological relevance. Results: We applied various clustering algorithms to microarray datasets with different sizes, and we evaluated the clustering results by determining the fraction of gene pairs from the same clusters that share at least one known common transcription factor. We used both yeast transcription factor databases (SCPD, YPD) and chromatin immunoprecipitation (ChIP) data to evaluate our clustering results. We showed that the ability to identify co-regulated genes from clustering results is strongly dependent on the number of microarray experiments used in cluster analysis and the accuracy of these associations plateaus at between 50 and 100 experiments on yeast data. Moreover, the model-based clustering algorithm MCLUST consistently outperforms more traditional methods in accurately assigning co-regulated genes to the same clusters on standardized data. Conclusions: Our results are consistent with respect to independent evaluation criteria that strengthen our confidence in our results. However, when one compares ChIP data to YPD, the false-negative rate is approximately 80% using the recommended p-value of 0.001. In addition, we showed that even with large numbers of experiments, the false-positive rate may exceed the true-positive rate. In particular, even when all experiments are included, the best results produce clusters with only a 28% true-positive rate using known gene transcription factor interactions.
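The evaluation criterion described in the abstract (the fraction of same-cluster gene pairs that share at least one known transcription factor) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `coregulated_pair_fraction`, the gene and factor labels, and the choice to skip genes with no annotation are all assumptions made for illustration.

```python
from itertools import combinations


def coregulated_pair_fraction(clusters, tf_map):
    """Fraction of same-cluster gene pairs sharing >= 1 known transcription factor.

    clusters: dict mapping gene name -> cluster label
    tf_map:   dict mapping gene name -> set of transcription factors known
              to regulate that gene (hypothetical annotation source)
    Genes missing from tf_map are skipped (an assumption of this sketch).
    """
    # Group annotated genes by their cluster label.
    by_cluster = {}
    for gene, label in clusters.items():
        if gene in tf_map:
            by_cluster.setdefault(label, []).append(gene)

    shared = 0  # pairs with at least one transcription factor in common
    total = 0   # all same-cluster pairs of annotated genes
    for genes in by_cluster.values():
        for a, b in combinations(genes, 2):
            total += 1
            if tf_map[a] & tf_map[b]:
                shared += 1

    return shared / total if total else 0.0


# Toy data with made-up gene and factor names.
clusters = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
tf_map = {
    "g1": {"TF_A"},
    "g2": {"TF_A", "TF_B"},
    "g3": {"TF_C"},
    "g4": {"TF_B"},
    "g5": {"TF_B"},
}
fraction = coregulated_pair_fraction(clusters, tf_map)
print(fraction)
```

In the toy data, cluster 0 contributes three pairs (one sharing a factor) and cluster 1 contributes one pair (sharing a factor), so two of four same-cluster pairs count as co-regulated. Comparing this fraction across clusterings built from different numbers of experiments is one way to reproduce the kind of comparison the abstract describes.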
Pages: 11
Related papers
35 in total
  • [1] The Yeast Proteome Database (YPD) and Caenorhabditis elegans Proteome Database (WormPD): comprehensive resources for the organization and comparison of model organism protein information
    Costanzo, MC
    Hogan, JD
    Cusick, ME
    Davis, BP
    Fancher, AM
    Hodges, PE
    Kondu, P
    Lengieza, C
    Lew-Smith, JE
    Lingner, C
    Roberg-Perez, KJ
    Tillberg, M
    Brooks, JE
    Garrels, JI
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 73 - 76
  • [2] Cluster analysis and display of genome-wide expression patterns
    Eisen, MB
    Spellman, PT
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868
  • [3] How many clusters? Which clustering method? Answers via model-based cluster analysis
    Fraley, C
    Raftery, AE
    [J]. COMPUTER JOURNAL, 1998, 41 (08) : 578 - 588
  • [4] Model-based clustering, discriminant analysis, and density estimation
    Fraley, C
    Raftery, AE
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (458) : 611 - 631
  • [5] MCLUST: Software for model-based cluster analysis
    Fraley, C
    Raftery, AE
    [J]. JOURNAL OF CLASSIFICATION, 1999, 16 (02) : 297 - 306
  • [6] Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p
    Gasch, AP
    Huang, MX
    Metzner, S
    Botstein, D
    Elledge, SJ
    Brown, PO
    [J]. MOLECULAR BIOLOGY OF THE CELL, 2001, 12 (10) : 2987 - 3003
  • [7] Genomic expression programs in the response of yeast cells to environmental changes
    Gasch, AP
    Spellman, PT
    Kao, CM
    Carmel-Harel, O
    Eisen, MB
    Storz, G
    Botstein, D
    Brown, PO
    [J]. MOLECULAR BIOLOGY OF THE CELL, 2000, 11 (12) : 4241 - 4257
  • [8] Gene expression profiling of the cellular transcriptional network regulated by alpha/beta interferon and its partial attenuation by the hepatitis C virus nonstructural 5A protein
    Geiss, GK
    Carter, VS
    He, YP
    Kwieciszewski, BK
    Holzman, T
    Korth, MJ
    Lazaro, CA
    Fausto, N
    Bumgarner, RE
    Katze, MG
    [J]. JOURNAL OF VIROLOGY, 2003, 77 (11) : 6367 - 6375
  • [9] Hartigan J. A., 1975, CLUSTERING ALGORITHM
  • [10] Functional discovery via a compendium of expression profiles
    Hughes, TR
    Marton, MJ
    Jones, AR
    Roberts, CJ
    Stoughton, R
    Armour, CD
    Bennett, HA
    Coffey, E
    Dai, HY
    He, YDD
    Kidd, MJ
    King, AM
    Meyer, MR
    Slade, D
    Lum, PY
    Stepaniants, SB
    Shoemaker, DD
    Gachotte, D
    Chakraburtty, K
    Simon, J
    Bard, M
    Friend, SH
    [J]. CELL, 2000, 102 (01) : 109 - 126