Systematic identification of statistically significant network measures

Times cited: 17
Authors
Ziv, E [1]
Koytcheff, R
Middendorf, M
Wiggins, C
Affiliations
[1] Columbia Univ, Coll Phys & Surg, New York, NY 10027 USA
[2] Columbia Univ, Dept Biomed Engn, New York, NY 10027 USA
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY 10027 USA
[4] Columbia Univ, Dept Phys, New York, NY 10027 USA
[5] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY 10027 USA
Source
PHYSICAL REVIEW E | 2005 / Vol. 71 / Issue 01
DOI
10.1103/PhysRevE.71.016110
Chinese Library Classification (CLC) codes
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
We present a graph embedding space (i.e., a set of measures on graphs) for performing statistical analyses of networks. Key improvements over existing approaches include discovery of "motif hubs" (multiple overlapping significant subgraphs), computational efficiency relative to subgraph census, and flexibility (the method is easily generalizable to weighted and signed graphs). The embedding space is based on scalars, functionals of the adjacency matrix representing the network. Scalars are global, involving all nodes; although they can be related to subgraph enumeration, there is not a one-to-one mapping between scalars and subgraphs. Improvements in network randomization and significance testing (we learn the distribution rather than assuming Gaussianity) are also presented. The resulting algorithm establishes a systematic approach to the identification of the most significant scalars and suggests machine-learning techniques for network classification.
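As a rough illustration of the idea described in the abstract, the sketch below computes a few scalar functionals of a directed adjacency matrix and compares them against an empirical null distribution rather than a Gaussian z-score. It is not the authors' algorithm. The three scalars, the degree-preserving edge-swap randomization, the empirical tail fraction used as a stand-in for a learned distribution, and all function names (`scalars`, `rewire`, `empirical_p_values`) are assumptions chosen for illustration.

```python
import numpy as np

def scalars(A):
    """Three illustrative scalar functionals of a 0/1 adjacency matrix A."""
    A2 = A @ A
    return np.array([
        np.trace(A2),       # tr(A^2): twice the number of mutual edge pairs
        np.trace(A2 @ A),   # tr(A^3): closed directed walks of length 3
        (A2 * A).sum(),     # sum of A^2 times A elementwise: feed-forward triangles
    ], dtype=float)

def rewire(A, n_swaps, rng):
    """Degree-preserving randomization by directed edge swaps (assumed null model).

    Each accepted swap turns (a->b, c->d) into (a->d, c->b), which keeps every
    node's in-degree and out-degree fixed. Swaps that would create a self-loop
    or a duplicate edge are skipped.
    """
    A = A.copy()
    edges = np.argwhere(A > 0)          # one row per edge: (source, target)
    m = len(edges)
    for _ in range(n_swaps):
        i, j = rng.choice(m, size=2, replace=False)
        a, b = edges[i]
        c, d = edges[j]
        if a == d or c == b or A[a, d] or A[c, b]:
            continue
        A[a, b] = A[c, d] = 0
        A[a, d] = A[c, b] = 1
        edges[i] = (a, d)
        edges[j] = (c, b)
    return A

def empirical_p_values(A, n_samples=200, n_swaps=None, seed=0):
    """Upper-tail empirical p-value of each scalar under the rewired ensemble."""
    rng = np.random.default_rng(seed)
    if n_swaps is None:
        n_swaps = 10 * int(A.sum())     # heuristic amount of mixing (assumption)
    observed = scalars(A)
    null = np.array([scalars(rewire(A, n_swaps, rng)) for _ in range(n_samples)])
    # Add-one correction so that a p-value is never exactly zero.
    p_upper = (1 + (null >= observed).sum(axis=0)) / (1 + n_samples)
    return observed, null, p_upper

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = (rng.random((30, 30)) < 0.1).astype(int)   # toy directed graph
    np.fill_diagonal(A, 0)                         # no self-loops
    observed, null, p_upper = empirical_p_values(A, n_samples=100)
    for name, obs, p in zip(["tr(A^2)", "tr(A^3)", "sum(A^2*A)"], observed, p_upper):
        print(f"{name}: observed={obs:.0f}, empirical p={p:.3f}")
```

The scalars here were picked because they change under degree-preserving rewiring; quantities such as the sum of all entries of A^2 depend only on the degree sequence and would be constant across this null ensemble.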
Pages: 8