Detecting Novel Associations in Large Data Sets

Times cited: 2476
Authors
Reshef, David N. [1, 2, 3]
Reshef, Yakir A. [2, 4]
Finucane, Hilary K. [5]
Grossman, Sharon R. [2, 6]
McVean, Gilean [3, 7]
Turnbaugh, Peter J. [6]
Lander, Eric S. [2, 8, 9]
Mitzenmacher, Michael [10]
Sabeti, Pardis C. [2, 6]
Affiliations
[1] MIT, Dept Comp Sci, Cambridge, MA 02139 USA
[2] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[3] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
[4] Harvard Univ, Dept Math, Cambridge, MA 02138 USA
[5] Weizmann Inst Sci, Dept Comp Sci & Appl Math, IL-76100 Rehovot, Israel
[6] Harvard Univ, Dept Organism & Evolutionary Biol, Ctr Syst Biol, Cambridge, MA 02138 USA
[7] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[8] MIT, Dept Biol, Cambridge, MA 02139 USA
[9] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
[10] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding
US National Science Foundation; European Research Council
Keywords
PRINCIPAL CURVES; REGRESSION; INFORMATION; MICROBIOME; HEALTH; CYCLE;
DOI
10.1126/science.1205438
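Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)]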
Discipline codes
07; 0710; 09
Abstract
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
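
The abstract describes MIC only in words. The sketch below illustrates the idea under stated assumptions: it takes the definition from the full paper as I understand it (for a grid with nx columns and ny rows, the mutual information of the binned data divided by log min(nx, ny), maximized over all grids with nx * ny < B(n), where B(n) = n^0.6 is assumed as the default bound). It is not the authors' MINE implementation. The paper optimizes where the grid lines fall; this sketch substitutes fixed equal-frequency (rank-based) bins, so it can only return a lower bound on MIC. All function names (`rank_bins`, `mutual_information`, `mic_equipartition`, `pearson`) are hypothetical.

```python
import math
from collections import Counter


def rank_bins(values, k):
    """Assign each value to one of k roughly equal-frequency bins by rank.

    Ties are broken by input order, so tied values may land in different bins;
    this is a simplification, not how MINE handles ties.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n  # bin index in 0..k-1
    return bins


def mutual_information(a, b):
    """Plug-in mutual information (in nats) of two discrete label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    # sum over cells of p(i,j) * log(p(i,j) / (p(i) * p(j))), written with counts
    return sum(
        c / n * math.log(c * n / (pa[i] * pb[j])) for (i, j), c in joint.items()
    )


def mic_equipartition(xs, ys, alpha=0.6):
    """Lower-bound sketch of MIC using only equal-frequency grids.

    Tries every grid with nx, ny >= 2 and nx * ny < n**alpha (assumed bound),
    normalizes its mutual information by log(min(nx, ny)), and returns the max.
    """
    n = len(xs)
    bound = n ** alpha
    best = 0.0
    for nx in range(2, int(bound) + 1):
        for ny in range(2, int(bound) + 1):
            if nx * ny >= bound:
                continue  # grid too fine for this sample size
            mi = mutual_information(rank_bins(xs, nx), rank_bins(ys, ny))
            best = max(best, mi / math.log(min(nx, ny)))
    return best


def pearson(xs, ys):
    """Pearson correlation coefficient, included only for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = list(range(-100, 101))
    ys = [x * x for x in xs]  # noiseless parabola: functional but not linear
    print(f"Pearson r            = {pearson(xs, ys):.3f}")
    print(f"equipartition score  = {mic_equipartition(xs, ys):.3f}")
```

On the noiseless parabola, Pearson's r is essentially zero (the x values are symmetric about zero), while the equipartition score comes out close to 1. For a noiseless function the R² of the data relative to the regression function is 1, so this matches the abstract's description of MIC on functional relationships. Fixed equal-frequency grids were chosen only to keep the sketch short; because they ignore where the structure actually lies, scores on real data can fall well below the true MIC.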
Pages: 1518-1524
Page count: 7