Equitability, mutual information, and the maximal information coefficient

Cited by: 473
Authors
Kinney, Justin B. [1 ]
Atwal, Gurinder S. [1 ]
Affiliations
[1] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
Keywords
DISCOVERY; BIAS
DOI
10.1073/pnas.1309933111
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062): 1518-1524], which proposed an alternative definition of equitability and introduced a new statistic, the "maximal information coefficient" (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
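A paraphrased sketch of the formalization referenced in the abstract follows (assumed wording; the paper's exact definitions may differ):

% Data Processing Inequality (DPI): for any Markov chain X -> Y -> Z,
$$ I(X;Z) \le I(X;Y). $$
% Self-consistency ("self-equitability"), paraphrased: a symmetric dependence
% measure D satisfies
$$ D[X;Y] = D[f(X);Y] $$
% whenever f is a deterministic function and X <-> f(X) <-> Y forms a Markov chain.
% Mutual information meets this criterion: applying the DPI along the chain in
% both directions gives
$$ I(X;Y) \le I(f(X);Y) \quad \text{and} \quad I(f(X);Y) \le I(X;Y), $$
% hence I(f(X);Y) = I(X;Y).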
Pages: 3354-3359
Number of pages: 6
References
29 records in total
[1] Albanese D, Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C (2013) minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics 29(3):407-408.
[2] Anonymous (2012) Nat Biotechnol 30:334.
[3] Anonymous, Science 1216.
[4]  
Cover T.M., 2006, ELEMENTS INFORM THEO, V2nd ed
[5] Elemento O, Slonim N, Tavazoie S (2007) A universal framework for regulatory element discovery across all genomes and data types. Mol Cell 28(2):337-350.
[6] Goodarzi H, Najafabadi HS, Oikonomou P, Greco TM, Fish L, Salavati R, Cristea IM, Tavazoie S (2012) Systematic discovery of structural elements governing stability of mammalian messenger RNAs. Nature 485(7397):264-268.
[7] Hoeffding W (1948) A non-parametric test of independence. Ann Math Stat 19(4):546-557.
[8] Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4-5):411-430.
[9] Khan S, Bandyopadhyay S, Ganguly AR, Saigal S, Erickson DJ III, Protopopescu V, Ostrouchov G (2007) Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. Phys Rev E 76(2).
[10] Kinney JB (2013) Neural Comput, DOI 10.1162/NECO_a_00568.