REGULAR EXPRESSIONS INTO FINITE AUTOMATA

Cited by: 161
Authors
BRUGGEMANNKLEIN, A
Affiliations
[1] Institut für Informatik, Universität Freiburg, 7800 Freiburg
Keywords
DOI
10.1016/0304-3975(93)90287-4
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
It is a well-established fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without epsilon-transitions, and all authors seem to provide their own variant of the construction. Among these constructions, Berry and Sethi (1986) have shown that the epsilon-free NFA due to Glushkov (1961) is a natural representation of the regular expression because it can be described in terms of the Brzozowski derivatives (Brzozowski 1964) of the expression. The Glushkov construction also plays a significant role in document processing: the SGML standard (ISO 8879 1986), now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the expression, and that this is worst-case optimal. For deterministic expressions, our algorithm even runs in linear time. This improves on the cubic-time methods suggested in the literature (Book et al. 1971; Aho et al. 1986; Berry and Sethi 1986). A major step of the algorithm consists of bringing the expression into what we call star normal form. This concept is also useful for characterizing the relationship between two types of unambiguity that have been studied in the literature. Namely, we show that, modulo a technical condition, an expression is strongly unambiguous (Sippu and Soisalon-Soininen 1988) if and only if it is weakly unambiguous (Book et al. 1971) and in star normal form. This leads to our third result, a quadratic-time decision algorithm for weak unambiguity that improves on the biquadratic method introduced by Book et al. (1971).
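To make the notions in the abstract concrete, the following is a minimal Python sketch of the textbook first/last/follow computation that underlies the Glushkov (position) automaton, together with a check for determinism. It is an illustration only: it is the naive inductive method, not the paper's star-normal-form algorithm, and it makes no attempt to reach the quadratic bound. The tuple-based expression encoding and the names `sym`, `cat`, `alt`, `star`, `glushkov` and `is_deterministic` are assumptions introduced here, not taken from the paper.

```python
from itertools import count

# Hypothetical expression encoding (not from the paper): nested tuples.
def sym(a): return ("sym", a)
def cat(left, right): return ("cat", left, right)
def alt(left, right): return ("alt", left, right)
def star(e): return ("star", e)

def glushkov(expr):
    """Naive first/last/follow computation for the position automaton.

    Symbol occurrences are numbered 1, 2, ... from left to right.
    Returns (first, last, follow, label, nullable).
    """
    label = {}    # position -> symbol at that position
    follow = {}   # position -> set of positions that may come next
    counter = count(1)

    def visit(e):
        kind = e[0]
        if kind == "sym":
            p = next(counter)
            label[p] = e[1]
            follow[p] = set()
            return {p}, {p}, False
        if kind == "star":
            first, last, _ = visit(e[1])
            for p in last:              # loop back: last positions reach first ones
                follow[p] |= first
            return first, last, True
        f1, l1, n1 = visit(e[1])
        f2, l2, n2 = visit(e[2])
        if kind == "alt":
            return f1 | f2, l1 | l2, n1 or n2
        # Concatenation: last positions of the left part reach
        # first positions of the right part.
        for p in l1:
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return first, last, n1 and n2

    first, last, nullable = visit(expr)
    return first, last, follow, label, nullable

def is_deterministic(expr):
    """True if no state has two successors carrying the same symbol."""
    first, _, follow, label, _ = glushkov(expr)
    for successors in [first, *follow.values()]:
        symbols = [label[p] for p in successors]
        if len(symbols) != len(set(symbols)):
            return False
    return True

# (a|b)*a has two a-positions reachable from the start, so it is not deterministic.
e1 = cat(star(alt(sym("a"), sym("b"))), sym("a"))
# b*a(b*a)* denotes the same language and is deterministic.
e2 = cat(cat(star(sym("b")), sym("a")), star(cat(star(sym("b")), sym("a"))))
print(is_deterministic(e1), is_deterministic(e2))
```

In this sketch the automaton's states are the positions plus one initial state; `first` gives the transitions out of the initial state, `follow` gives the remaining transitions, and the final states are `last` (plus the initial state when the expression is nullable).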
Pages: 197-213
Number of pages: 17
Related papers
16 in total
[1] Aho Alfred V., 1986, ADDISON WESLEY SERIE
[2] ALBERT J, 1983, AUTOMATEN SPRACHEN M
[3] [Anonymous], AWK PROGRAMMING LANG
[4] BERRY, G; SETHI, R. FROM REGULAR EXPRESSIONS TO DETERMINISTIC AUTOMATA [J]. THEORETICAL COMPUTER SCIENCE, 1986, 48 (01): 117-126
[5] BRUGGEMANNKLEIN A, 1992, LECT NOTES COMPUT SC, V577, P173
[6] BRZOZOWSKI, JA. DERIVATIVES OF REGULAR EXPRESSIONS [J]. JOURNAL OF THE ACM, 1964, 11 (04): 481-&
[7] CHEN CH, 1992, IN PRESS 3RD S COMB
[8] DOUGHETY D, 1987, UNIX TEXT PROCESSING
[9] Glushkov V. M., 1961, RUSS MATH SURV, V16, P1
[10] HENNIE FC, 1968, FINITE STATE MODELS