Multi-way set enumeration in weight tensors

被引:12
作者
Georgii, Elisabeth [1 ,2 ]
Tsuda, Koji [3 ,4 ]
Schoelkopf, Bernhard [1 ]
机构
[1] Max Planck Inst Biol Cybernet, Dept Empir Inference, Tubingen, Germany
[2] Max Planck Gesell, Friedrich Miescher Lab, Tubingen, Germany
[3] Natl Inst Adv Ind Sci & Technol, Computat Biol Res Ctr, Tokyo, Japan
[4] Japan Sci & Technol Agcy, ERATO Minato Project, Tokyo, Japan
基金
日本学术振兴会;
关键词
Tensor; Multi-way set; Dense pattern enumeration; Quasi-hyper-clique; N-ary relation; Graph mining; COMMUNITY STRUCTURE; FUNCTIONAL MODULES; NETWORKS; COMPUTATION; ALGORITHMS; COMPLEXES; SEARCH;
D O I
10.1007/s10994-010-5210-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The analysis of n-ary relations receives attention in many different fields, for instance biology, web mining, and social studies. In the basic setting, there are n sets of instances, and each observation associates n instances, one from each set. A common approach to explore these n-way data is the search for n-set patterns, the n-way equivalent of itemsets. More precisely, an n-set pattern consists of specific subsets of the n instance sets such that all possible associations between the corresponding instances are observed in the data. In contrast, traditional itemset mining approaches consider only two-way data, namely items versus transactions. The n-set patterns provide a higher-level view of the data, revealing associative relationships between groups of instances. Here, we generalize this approach in two respects. First, we tolerate missing observations to a certain degree, that means we are also interested in n-sets where most (although not all) of the possible associations have been recorded in the data. Second, we take association weights into account. In fact, we propose a method to enumerate all n-sets that satisfy a minimum threshold with respect to the average association weight. Technically, we solve the enumeration task using a reverse search strategy, which allows for effective pruning of the search space. In addition, our algorithm provides a ranking of the solutions and can consider further constraints. We show experimental results on artificial and real-world datasets from different domains.
引用
收藏
页码:123 / 155
页数:33
相关论文
共 52 条
[1]   Multiway analysis of epilepsy tensors [J].
Acar, Evrim ;
Aykut-Bingol, Canan ;
Bingol, Haluk ;
Bro, Rasmus ;
Yener, Buelent .
BIOINFORMATICS, 2007, 23 (13) :I10-I18
[2]  
Acar E, 2006, LECT NOTES COMPUT SC, V3975, P213
[3]  
Agrawal R., 1994, VLDB 1994, P487
[4]  
[Anonymous], 2007, SAND20076702 SAND NA
[5]  
[Anonymous], SDM 08
[6]  
[Anonymous], 2006, AAAI
[7]  
[Anonymous], 2005, Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, DOI DOI 10.1145/1081870.1081908
[8]  
[Anonymous], 2006, P 32 INT C VERY LARG
[9]  
[Anonymous], 2006, KDD
[10]   Greedily finding a dense subgraph [J].
Asahiro, Y ;
Iwama, K ;
Tamaki, H ;
Tokuyama, T .
JOURNAL OF ALGORITHMS, 2000, 34 (02) :203-221