Mining the Web to create specialized glossaries

Cited by: 24
Authors
Velardi, Paola [1]
Navigli, Roberto [1]
D'Amadio, Pierluigi
Affiliations
[1] Univ Roma La Sapienza, Dept Comp Sci, I-00185 Rome, Italy
DOI
10.1109/MIS.2008.88
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A step in establishing a Web community's knowledge domain involves collecting a glossary of domain-relevant terms that constitute the linguistic surface manifestation of domain concepts. TermExtractor and GlossExtractor are two Web-mining-based applications that support glossary building by exploiting the Web's evolving nature to allow continuous updating of an emerging community's vocabulary. These tools acquire a glossary's two basic components, terms and definitions: the terms are harvested from domain text corpora, and the definitions are extracted from different types of Web pages.
Pages: 18-25
Page count: 8
Related papers
13 in total
[1] Androutsopoulos I., 2004, P 20 INT C COMP LING, P1360
[2] Androutsopoulos I., 2005, P HUM LANG TECHN C C, P323
[3] Bontas E. P., 2005, P 3 BERL XML TAG HUM, P153
[4] Cui, Hang; Kan, Min-Yen; Chua, Tatseng. Soft pattern matching models for definitional question answering [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2007, 25 (02)
[5] Fujii A, 2000, 38TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P488
[6] Hearst MA, 1992, P 14 INT C COMP LING, V2, P539, DOI 10.3115/992133.992154
[7] Klavans JL, 2001, J AM MED INFORM ASSN, P324
[8] Navigli, R; Velardi, P. Learning domain ontologies from document warehouses and dedicated web sites [J]. COMPUTATIONAL LINGUISTICS, 2004, 30 (02): 151-179
[9] Ng HT, 2001, PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, P67
[10] PARK Y, 2002, P 19 INT C COMP LING, P1, DOI 10.3115/1072228.1072351