ChemDig: new approaches to chemically significant indexing and searching of distributed web collections

Times cited: 6
Authors
Gkoutos, GV [1]
Leach, C [1]
Rzepa, HS [1]
Affiliation
[1] Univ London Imperial Coll Sci Technol & Med, Dept Chem, London SW7 2AY, England
DOI
10.1039/b110693g
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
We describe an extension of the ht://Dig robot-based internet indexing and search engine so that it also retrieves information contained in a variety of molecular data formats, as defined by chemical MIME types. This is achieved by invoking chemical meta-parsers. These are software agents designed to provide key meta-data about the content of external chemical files. Where the file content allows it, this meta-data can include the derived molecular formula, the molecular mass and the atom connection table (SMILES). It can also include other content, such as author information and supplied keywords. These terms can be added automatically to the searchable terms. The search results can also be linked automatically, via database requests, to other external databases containing chemical information. We report our experience in applying this robot to index five different remote sites. We also discuss several mechanisms for storing and searching the chemical content. These range from simple keyword-based searches qualified by chemically significant Boolean terms, through chemical similarity searches, to our experiments with more highly structured content. In that last approach, the chemical data are expressed in XML-based markup, and XSLT transforms are used to filter, search and render the information.
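To make the meta-parser idea concrete, here is a minimal, hypothetical sketch. It is not the ChemDig code, and it assumes the input is a plain XYZ file (MIME type chemical/x-xyz): an atom-count line, a comment line, then one "element x y z" line per atom. The sketch derives a Hill-order molecular formula and an approximate molecular mass that an indexer could add as searchable terms. The function names and the small atomic-weight table are illustrative assumptions. Deriving SMILES would additionally require bond perception, which is not attempted here.

```python
"""Hypothetical minimal chemical meta-parser for XYZ files (chemical/x-xyz).

Illustrative sketch only (assumed design, not the ChemDig implementation):
derives a Hill-order formula and an approximate molecular mass as search terms.
"""
from collections import Counter

# Approximate standard atomic weights (g/mol) for an assumed subset of elements.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}


def parse_xyz_elements(text: str) -> list[str]:
    """Return element symbols: line 1 is the atom count, line 2 a comment."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    # Normalise capitalisation so "CL" or "cl" becomes "Cl".
    return [line.split()[0].capitalize() for line in lines[2:2 + n_atoms]]


def hill_formula(counts: Counter) -> str:
    """Hill order: C, then H, then the rest alphabetically; all alphabetical if no C."""
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(el if counts[el] == 1 else f"{el}{counts[el]}" for el in order)


def meta_terms(text: str) -> dict:
    """Meta-data terms an indexer could attach to the document (hypothetical keys)."""
    counts = Counter(parse_xyz_elements(text))
    mass = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {"formula": hill_formula(counts), "molecular_mass": round(mass, 3)}


if __name__ == "__main__":
    water = "3\nwater\nO 0.0 0.0 0.117\nH 0.0 0.757 -0.467\nH 0.0 -0.757 -0.467\n"
    print(meta_terms(water))
```

For the water example, the formula term should be H2O and the mass roughly 18.015. Emitting such values as ordinary index words is one simple way to let a keyword engine like ht://Dig match them, before any structured or similarity-based searching is layered on top.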
Pages: 656-666
Page count: 11