Reliable detection of episodes in event sequences

被引:15
作者
Gwadera, R [1 ]
Atallah, M [1 ]
Szpankowski, W [1 ]
机构
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
来源
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2003年
关键词
D O I
10.1109/ICDM.2003.1250904
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Suppose one wants to detect "bad" or "suspicious" subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm, depends on how likely it is to occur fortuitously. A long enough sequence of observed events will almost certainly contain any subsequence, and setting thresholds for alarm is an important is site in a monitoring system that seeks to avoid false alarms. Suppose a long sequence T of observed events contains a suspicious subsequence pattern S within it, where the suspicious subsequence S consists of in events and spans a window of size omega within T. We address the fundamental problem: is a certain number of occurrences of a particular subsequence unlikely to be fortuitous (i.e., indicative of suspicious activity) ? If the probability of fortuitous occurrences is high and an automated monitoring system flags it as suspicious anyway, then such a system will suffer from generating too many false alarms. This paper quantifies the probability of such an S occurring in T within a window of size w, the number of distinct windows containing S as a subsequence, the expected number of such occurrences, its variance, and establishes its limiting distribution that allows to set tip an alarm threshold so that the probability of false alarms is very small. We report on experiments confirming the theory and showing that we can detect bad subsequences with low false alarm rate.
引用
收藏
页码:67 / 74
页数:8
相关论文
共 20 条
[11]  
Kumar Sandeep., 1994, P 17 NATL COMPUTER S, P11
[12]   Levelwise search and borders of theories in knowledge discovery [J].
Mannila, H ;
Toivonen, H .
DATA MINING AND KNOWLEDGE DISCOVERY, 1997, 1 (03) :241-258
[13]  
Nicodème P, 1999, LECT NOTES COMPUT SC, V1643, P194
[14]  
Pevzner P., 2000, COMPUTATIONAL MOL BI
[15]   On pattern frequency occurrences in a Markovian sequence [J].
Regnier, M ;
Szpankowski, W .
ALGORITHMICA, 1998, 22 (04) :631-649
[16]   The Emergence of Pattern Discovery Techniques in Computational Biology [J].
Rigoutsos, Isidore ;
Floratos, Aris ;
Panda, Laxmi ;
Gao, Yuan ;
Platt, Daniel .
METABOLIC ENGINEERING, 2000, 2 (03) :159-177
[17]  
Szpankowski W., 2001, AVERAGE CASE ANAL AL
[18]  
WATERMAN MS, 1995, INTRO COMPUTATIONAL
[19]  
Wespi A., 2000, Journal of Computer Security, V8, P159
[20]   FAST TEXT SEARCHING ALLOWING ERRORS [J].
WU, S ;
MANBER, U .
COMMUNICATIONS OF THE ACM, 1992, 35 (10) :83-91