Mining chemical information from open patents

Cited by: 24
Authors
Jessop, David M. [1]
Adams, Sam E. [1]
Murray-Rust, Peter [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
WORLD-WIDE-WEB; COMPUTATIONAL-LINGUISTICS TECHNIQUES; MARKUP; XML; EXTRACTION;
DOI
10.1186/1758-2946-3-40
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
070301 [Inorganic Chemistry];
Abstract
Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO). Precision was 78% and recall 64% with regard to determining the identity and amount of reactants employed, and accuracy was 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.
Pages: 17