Using genre-specific features for patent summaries

Cited by: 16
Authors
Codina-Filba, Joan [1]
Bouayad-Agha, Nadjet [1]
Burga, Alicia [1]
Casamayor, Gerard [1]
Mille, Simon [1]
Mueller, Andreas [2]
Saggion, Horacio [1]
Wanner, Leo [1, 3]
Affiliations
[1] Pompeu Fabra Univ, Dept Commun & Informat Technol, Nat Language Proc Grp, Barcelona, Spain
[2] Univ Stuttgart, Inst Nat Language Proc, Stuttgart, Germany
[3] Catalan Inst Res & Adv Studies ICREA, Barcelona, Spain
Keywords
Summarization; Patents; Lexical chains; Segmentation; Segment-based summarization; Sentence aggregation;
DOI
10.1016/j.ipm.2016.07.002
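Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
080201 [Mechanical manufacturing and automation];
Abstract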
Patent search is recall-driven, which goes hand in hand with at least a partial sacrifice of precision. As a consequence, patent analysts regularly have to view and examine a large number of patents, which implies a very high workload. Interactive analysis aids that help to minimize this workload are thus in high demand. Still, these aids do not reduce the amount of material to be examined; they only facilitate its examination. A reduction can be achieved by working with patent summaries instead of full patent documents. So far, high-quality patent summaries have been produced mainly manually, and only a few research works address the problem of automatic patent summarization. Most often, these works either replicate the summarization metrics known from general discourse summarization or focus on the claims of a patent. However, neither strategy is adequate: state-of-the-art general discourse summarization techniques are of limited use due to the idiosyncrasies of the patent genre, and techniques that focus only on the claims omit from their summaries important details, provided in the other sections, about the components of the invention introduced in the claims. We propose a patent summarization technique that takes the idiosyncrasies of the patent genre (such as the unbalanced distribution of the content across the different sections of a patent, excessive length of the sentences in the claims, abstract vocabulary, etc.) into account to obtain a comprehensive summary of the invention. In particular, we make use of lexical chains in the claims and in the description of the invention, and of aligned claim-description segments at the sub-sentential level, to assess the relevance of the individual fragments of the document for the summary. The most relevant fragments are selected and merged using full-fledged natural language generation techniques. © 2016 Elsevier Ltd. All rights reserved.
Pages: 151-174
Page count: 24