On the Choice and Number of Microarrays for Transcriptional Regulatory Network Inference

Cited by: 11
Authors
Cosgrove, Elissa J. [2, 3]
Gardner, Timothy S. [2, 4]
Kolaczyk, Eric D. [1]
Affiliations
[1] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[2] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
[3] Amgen Inc, San Francisco, CA USA
[4] Amyris Biotechnol, Emeryville, CA USA
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
FALSE DISCOVERY RATE; SCALE; ASSOCIATIONS;
DOI
10.1186/1471-2105-11-454
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. However, while such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. Herein, we characterize the nature of dependency in an Escherichia coli microarray compendium and explore its consequences for the problem of determining which and how many arrays to use in correlation-based TRNI.
Results: We present evidence of substantial effective dependency among microarrays in this compendium, and characterize that dependency with respect to experimental condition factors. We then introduce a measure, n_eff, of the effective number of experiments in a compendium, and find that the dependency observed in this particular compendium corresponds to a huge reduction in effective sample size: n_eff = 14.7 versus n = 376. Furthermore, we found that the n_eff of select subsets of experiments actually exceeded the n_eff of the full compendium, suggesting that the adage 'less is more' applies here. Consistent with this latter result, we observed improved performance in TRNI using subsets of the data compared to results using the full compendium. We identified experimental condition factors that trend with changes in TRNI performance and n_eff, including growth phase and media type. Finally, using the set of known E. coli genetic regulatory interactions from RegulonDB, we demonstrated that false discovery rates (FDR) derived from n_eff-adjusted p-values were well matched to FDR based on the RegulonDB truth set.
Conclusions: These results support the use of n_eff as a potent descriptor of microarray compendia. In addition, they highlight a straightforward correlation-based method for TRNI with demonstrated meaningful statistical testing for significant edges, readily applicable to compendia from any species, even when a truth set is not available. This work facilitates a more refined approach to the construction and utilization of mRNA expression compendia in TRNI.
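The abstract does not say how the p-values are adjusted for n_eff, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each TF-gene correlation is tested with the Fisher z-transform, with n_eff substituted for the nominal sample size n, and that FDR is then controlled with the Benjamini-Hochberg step-up procedure. The function names `fisher_z_pvalue` and `benjamini_hochberg` are hypothetical, and the correlation values in the usage lines are made up for illustration.

```python
import math


def fisher_z_pvalue(r, n_eff):
    """Two-sided p-value for H0: rho = 0 using the Fisher z-transform.

    Assumption: the effective sample size n_eff simply replaces n in the
    usual standard error 1 / sqrt(n - 3). Requires -1 < r < 1.
    """
    if n_eff <= 3:
        raise ValueError("n_eff must exceed 3 for the Fisher z approximation")
    z = math.atanh(r) * math.sqrt(n_eff - 3.0)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative correlations only; n and n_eff are the values from the abstract.
correlations = [0.85, 0.60, 0.30, 0.10]
p_nominal = [fisher_z_pvalue(r, 376) for r in correlations]
p_effective = [fisher_z_pvalue(r, 14.7) for r in correlations]
q_effective = benjamini_hochberg(p_effective)
```

Under this assumption, shrinking the sample size from n = 376 to n_eff = 14.7 widens the null distribution of each correlation, so every p-value grows and fewer edges pass a given FDR threshold.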
Pages: 16