Understanding and customizing stopword lists for enhanced patent mapping

Cited by: 34
Author
Blanchard, Antoine [1]
Affiliation
[1] Syngenta Crop Protect AG, Intellectual Property Dept, Schwarzwaldallee 215, CH-4002 Basel, Switzerland
Keywords
Text mining; Word distribution; Zipf's law; STN AnaVist; Thomson Aureka; OmniViz; Stopwords; Patent mapping
DOI
10.1016/j.wpi.2007.02.002
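Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501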
Abstract
While the use of patent mapping tools is growing, the 'black-box' systems involved do not generally allow the user to interfere further than the preliminary retrieval of documents. Except, that is, for one thing: the stopword list, i.e. the list of 'noise' words to be ignored, which can be modified to one's liking and dramatically impacts the final output and analysis. This paper invokes information science and computer science to provide clues for a better understanding of the stopword lists' origin and purpose, and how they fit in the mapping algorithm. Further, it stresses the need for stopword lists that depend on the document corpus analyzed. Thus, the analyst is invited to add and remove stopwords, or even, in order to avoid inherent biases, to use algorithms that can automatically create ad hoc stopword lists. (C) 2007 Elsevier Ltd. All rights reserved.
Pages: 308-316
Number of pages: 9