Using text to build semantic networks for pharmacogenomics

Times cited: 72
Authors
Coulet, Adrien [2, 3]
Shah, Nigam H. [2]
Garten, Yael [4]
Musen, Mark [2]
Altman, Russ B. [1, 2, 3, 4]
Affiliations
[1] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Med, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] Stanford Univ, Stanford Biomed Informat, MSOB, Stanford, CA 94305 USA
Keywords
Relationship extraction; Pharmacogenomics; Natural Language Processing; Ontology; Knowledge acquisition; Data integration; Biological network; Text mining; Information extraction
DOI
10.1016/j.jbi.2010.08.005
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Most pharmacogenomics knowledge is contained in the text of published studies and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), such rules and ontologies may not be available. Recent progress in syntactic NLP parsing, applied to a large corpus of pharmacogenomics text, provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a precision of 70-87.7% and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and to provide a computable resource for knowledge discovery. © 2010 Elsevier Inc. All rights reserved.
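The abstract describes extracting relationships from the syntactic structure of sentences, keeping both key entities and entities they modify, and normalizing the results into a common schema that forms a network. The following Python sketch is only a toy illustration of that general idea, not the authors' pipeline. It assumes sentences have already been dependency-parsed into Stanford-style labels (`nsubj`, `dobj`, `nn`); the lexicon, the verb-to-relation map `RELATION_SCHEMA`, the derived entity-type names, and all function names are hypothetical. It keeps subject-verb-object patterns whose arguments are key entities or nouns modified by one (e.g. "VKORC1 polymorphism"), then counts edges between entity types.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple, Optional, Tuple

# Hypothetical toy lexicon of key PGx entities mapped to an entity type.
LEXICON = {"vkorc1": "Gene", "cyp2c9": "Gene", "warfarin": "Drug"}

# Hypothetical map from raw verbs to normalized relation types; a stand-in
# for a common relationship schema, NOT the schema published in the paper.
RELATION_SCHEMA = {
    "affects": "influences",
    "alters": "influences",
    "influences": "influences",
    "metabolizes": "metabolizes",
}


class Dep(NamedTuple):
    head: int  # index of the governing token
    rel: str   # assumed Stanford-style label: "nsubj", "dobj", or "nn"
    dep: int   # index of the dependent token


class Relation(NamedTuple):
    subject: str
    subject_type: str
    relation: str
    obj: str
    object_type: str


def entity_phrase(tokens: List[str], deps: List[Dep], idx: int) -> Optional[Tuple[str, str]]:
    """Return (phrase, type) if the noun at idx is a key entity, or is a noun
    modified by one via "nn" (e.g. 'VKORC1 polymorphism' -> 'Gene polymorphism')."""
    word = tokens[idx].lower()
    if word in LEXICON:
        return word, LEXICON[word]
    for d in deps:
        if d.head == idx and d.rel == "nn" and tokens[d.dep].lower() in LEXICON:
            mod = tokens[d.dep].lower()
            return f"{mod} {word}", f"{LEXICON[mod]} {word}"
    return None


def extract_relations(tokens: List[str], deps: List[Dep]) -> Iterator[Relation]:
    """Yield normalized relations from subject-verb-object dependency patterns."""
    subjects = {d.head: d.dep for d in deps if d.rel == "nsubj"}
    objects = {d.head: d.dep for d in deps if d.rel == "dobj"}
    for verb, subj_idx in subjects.items():
        relation = RELATION_SCHEMA.get(tokens[verb].lower())
        if relation is None or verb not in objects:
            continue  # unknown verb or no direct object: skip
        subj = entity_phrase(tokens, deps, subj_idx)
        obj = entity_phrase(tokens, deps, objects[verb])
        if subj and obj:
            yield Relation(subj[0], subj[1], relation, obj[0], obj[1])


def build_type_network(parsed: List[Tuple[List[str], List[Dep]]]) -> Counter:
    """Count (subject_type, relation, object_type) edges across all sentences."""
    return Counter(
        (r.subject_type, r.relation, r.object_type)
        for tokens, deps in parsed
        for r in extract_relations(tokens, deps)
    )


if __name__ == "__main__":
    # Hand-built parses standing in for real parser output.
    s1 = (["VKORC1", "polymorphism", "affects", "warfarin", "response"],
          [Dep(1, "nn", 0), Dep(2, "nsubj", 1), Dep(4, "nn", 3), Dep(2, "dobj", 4)])
    s2 = (["CYP2C9", "metabolizes", "warfarin"],
          [Dep(1, "nsubj", 0), Dep(1, "dobj", 2)])
    for tokens, deps in (s1, s2):
        print(list(extract_relations(tokens, deps)))
    print(build_type_network([s1, s2]))
```

Matching only `nsubj`/`dobj` patterns keeps the sketch short; a realistic extractor would likely also need to handle passive constructions, prepositional attachments, negation, and coordination.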
Pages: 1009-1019
Number of pages: 11