NEW METHODS IN AUTOMATIC EXTRACTING

Times cited: 668
Author
EDMUNDSON, HP
Affiliation
[1] University of Maryland, Computer Science Center, College Park, Maryland
DOI
10.1145/321510.321519
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper describes new methods of automatically extracting documents for screening purposes, i.e., the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document. While previous work has focused on one component of sentence significance, namely, the presence of high-frequency content words (key words), the methods described here also treat three additional components: pragmatic words (cue words); title and heading words; and structural indicators (sentence location). The research has resulted in an operating system and a research methodology. The extracting system is parameterized to control and vary the influence of the above four components. The research methodology includes procedures for the compilation of the required dictionaries, the setting of the control parameters, and the comparative evaluation of the automatic extracts with manually produced extracts. The results indicate that the three newly proposed components dominate the frequency component in the production of better extracts. © 1969, ACM. All rights reserved.
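The abstract names four sentence-weighting components (cue, key, title, location) whose influence is controlled by parameters. The following is a minimal Python sketch of that idea, assuming the components are combined linearly as W = a1·C + a2·K + a3·T + a4·L. The cue word lists (BONUS, STIGMA), the stop list, the min_freq threshold, and the exact way each component is scored are hypothetical simplifications, not the paper's compiled dictionaries or parameter settings. The coselection function is likewise an assumed, simple way to compare an automatic extract with a manual one.

```python
import re
from collections import Counter

# Hypothetical word lists; the paper compiled its own dictionaries from a corpus.
STOP = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are",
        "for", "on", "with", "this", "that", "we", "it", "be", "by", "as"}
BONUS = {"significant", "important", "conclude", "show"}   # cue words that raise weight
STIGMA = {"hardly", "impossible", "perhaps"}               # cue words that lower weight


def words(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(title, paragraphs, a=(1.0, 1.0, 1.0, 1.0), min_freq=2):
    """Weight every sentence as a1*C + a2*K + a3*T + a4*L.

    paragraphs is a list of paragraphs, each a list of sentence strings.
    Returns (paragraph_index, sentence_index, sentence, weight) tuples.
    """
    a1, a2, a3, a4 = a
    # Key words (assumed rule): non-stop words occurring at least min_freq times.
    freq = Counter(w for para in paragraphs for s in para
                   for w in words(s) if w not in STOP)
    keys = {w for w, n in freq.items() if n >= min_freq}
    title_words = set(words(title)) - STOP

    scored = []
    for p, para in enumerate(paragraphs):
        for i, sent in enumerate(para):
            ws = words(sent)
            # Cue: +1 per bonus word, -1 per stigma word.
            cue = sum(w in BONUS for w in ws) - sum(w in STIGMA for w in ws)
            # Key: sum of document frequencies of the key words in the sentence.
            key = sum(freq[w] for w in ws if w in keys)
            # Title: number of sentence words that also appear in the title.
            ttl = sum(w in title_words for w in ws)
            # Location (assumed rule): first or last sentence of its paragraph.
            loc = 1 if i in (0, len(para) - 1) else 0
            scored.append((p, i, sent, a1 * cue + a2 * key + a3 * ttl + a4 * loc))
    return scored


def extract(title, paragraphs, k, a=(1.0, 1.0, 1.0, 1.0)):
    """Pick the k highest-weighted sentences and return them in document order."""
    scored = score_sentences(title, paragraphs, a)
    # sorted() is stable, so ties keep their original document order.
    top = sorted(scored, key=lambda t: t[3], reverse=True)[:k]
    return [t[2] for t in sorted(top, key=lambda t: (t[0], t[1]))]


def coselection(auto, manual):
    """Fraction of the manually chosen sentences that the automatic extract also chose."""
    return len(set(auto) & set(manual)) / len(manual)


if __name__ == "__main__":
    title = "Automatic extracting"
    doc = [
        ["Automatic extracting selects sentences.",
         "The weather was pleasant.",
         "We conclude that extracting is significant."],
        ["Cats sleep a lot.", "Dogs bark."],
    ]
    auto = extract(title, doc, k=2)
    print(auto)
    print(coselection(auto, ["Automatic extracting selects sentences.",
                             "Cats sleep a lot."]))  # 0.5
```

Setting one of the four weights to zero drops that component, so the same sketch can compare, for example, a key-word-only extract (a=(0, 1, 0, 0)) against one that uses all four components.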
Pages: 264 / +
Page count: 1