AUTOMATIC INTERPRETATION OF THE TEXTS OF CHEMICAL PATENT ABSTRACTS .1. LEXICAL ANALYSIS AND CATEGORIZATION

Cited by: 12
Authors
CHOWDHURY, GG [1]
LYNCH, MF [1]
Affiliation
[1] University of Sheffield, Department of Information Studies, Sheffield S10 2TN, South Yorkshire, England
Source
JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES | 1992 / Vol. 32 / No. 05
DOI
10.1021/ci00009a011
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
This paper and the one that follows report a semiautomatic method for converting to GENSAL those parts of Derwent Publications Ltd. Documentation Abstracts that specify generic structures. Techniques of natural language processing (NLP) applied in a prototype system are discussed. This paper deals with the lexical isolation and categorization of tokens from the textual descriptions of generic structures. Templates have been identified for processing the variable and multiplier expressions, which predominate; they provide the basis for further analysis. Rules for the isolation of tokens are discussed and illustrated. Some categories of tokens are identified by morphological analysis, while others are handled by dictionary lookup. The output is a list of tokens, together with a number of associated semantic features that assist the processing stage discussed in the following paper.
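As a rough illustration of the kind of lexical analysis the abstract describes, the following Python sketch isolates tokens from a short generic-structure description and assigns each a category, using pattern and suffix checks (a simple stand-in for morphological analysis) and a small dictionary lookup. It is not the authors' implementation: the sample text, the category names, the dictionary entries, the suffix list, and the function names `tokenize`, `categorize`, and `analyse` are all assumptions made for this example.

```python
import re

# Hypothetical dictionary of words resolved by lookup; not taken from the paper.
DICTIONARY = {
    "or": "CONJUNCTION",
    "and": "CONJUNCTION",
    "optionally": "QUALIFIER",
    "substituted": "QUALIFIER",
    "by": "PREPOSITION",
    "halo": "SUBSTITUENT",
}

# Hypothetical suffixes treated as markers of substituent names.
SUBSTITUENT_SUFFIXES = ("yl", "oxy", "amino")

# Token isolation: carbon ranges (1-4C), words with optional trailing digits,
# integer ranges, integers, and single punctuation marks.
TOKEN_PATTERN = re.compile(r"\d+-\d+C|[A-Za-z]+\d*|\d+-\d+|\d+|[=;,()]")


def tokenize(text):
    """Split a description into tokens; whitespace is discarded."""
    return TOKEN_PATTERN.findall(text)


def categorize(token):
    """Assign a category by pattern first, then dictionary, then suffix."""
    lower = token.lower()
    if re.fullmatch(r"\d+-\d+C", token):
        return "CARBON_RANGE"
    if re.fullmatch(r"\d+-\d+", token):
        return "INTEGER_RANGE"
    if re.fullmatch(r"\d+", token):
        return "INTEGER"
    if re.fullmatch(r"[=;,()]", token):
        return "PUNCTUATION"
    if re.fullmatch(r"[A-Z][a-z]?\d+", token):
        return "VARIABLE"      # e.g. R1, Ar2
    if re.fullmatch(r"[a-z]", token):
        return "MULTIPLIER"    # e.g. n, m
    if lower in DICTIONARY:
        return DICTIONARY[lower]
    if lower.endswith(SUBSTITUENT_SUFFIXES):
        return "SUBSTITUENT"
    return "UNKNOWN"


def analyse(text):
    """Return (token, category) pairs for a description."""
    return [(tok, categorize(tok)) for tok in tokenize(text)]


if __name__ == "__main__":
    for tok, cat in analyse("R1 = 1-4C alkyl or halo; n = 0-2"):
        print(f"{tok:8} {cat}")
```

In this sketch the pattern checks run before the dictionary so that variables and multipliers are never mistaken for ordinary words; a real system would attach richer semantic features to each token than a single category label.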
Pages: 463-467
Page count: 5