Automated categorization of German-language patent documents

Cited by: 33
Authors
Fall, CJ
Törcsvári, A
Fiévet, P
Karetka, G
Affiliations
[1] ELCA, Edoc Div, CH-1000 Lausanne 12, Switzerland
[2] Arcanum Dev, H-1117 Budapest, Hungary
[3] World Intellectual Property Org, CH-1211 Geneva 20, Switzerland
Keywords
patent; intellectual property; categorization; hierarchical taxonomy; International Patent Classification
DOI
10.1016/S0957-4174(03)00141-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Categorizing patent documents is a difficult task, and we study how to automate it as accurately as possible. We report the results of applying a variety of machine learning algorithms for training expert systems in German-language patent classification tasks. The taxonomy employed is the International Patent Classification, a complex hierarchical classification scheme in which we make use of 115 classes and 367 subclasses. The system is designed to handle natural language input in the form of the full text of a patent application. The effect on the categorization precision of indexing either the patent claims or the patent descriptions is reported. We describe several ways of measuring the categorization success that account for the attribution of multiple classification codes to each patent document. We show how the hierarchical information inherent in the taxonomy can be used to improve automated categorization precision. Our results are compared to an earlier study of automated English-language patent categorization. © 2003 Elsevier Ltd. All rights reserved.
Pages: 269-277
Page count: 9