Extracting and characterizing gene-drug relationships from the literature

Times cited: 38
Authors
Chang, JT [1]
Altman, RB [1]
Affiliations
[1] Stanford Biomed Informat, Stanford Med Ctr, Dept Genet, Stanford, CA 94305 USA
Source
PHARMACOGENETICS | 2004 / Vol. 14 / No. 09
Keywords
algorithms; databases; machine learning; natural language processing; pharmacogenetics;
DOI
10.1097/00008571-200409000-00002
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
A fundamental task of pharmacogenetics is to collect and classify relationships between genes and drugs. Currently, this useful information has not been comprehensively aggregated in any database and remains scattered throughout the published literature. Although there are efforts to collect this information manually, they are limited by the size of the published literature on gene-drug relationships. Therefore, we investigated computational methods to extract and characterize pharmacogenetic relationships between genes and drugs from the literature. We first evaluated the effectiveness of the co-occurrence method in identifying related genes and drugs. We then used supervised machine learning algorithms to classify the relationships between genes and drugs from the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) into five categories that have been defined by active pharmacogenetic researchers as relevant to their work. The final co-occurrence algorithm was able to extract 78% of the related genes and drugs that were published in a review article from the literature. Our algorithm subsequently classified the relationships between genes and drugs from the PharmGKB into five categories with 74% accuracy. We have made the data available on a supplementary website at http://bionlp.stanford.edu/genedrug/. Gene-drug relationships can be accurately extracted from text and classified into categories. Although the relationships that we have identified do not capture the details and fine distinctions often made in the literature, these methods will help scientists to track the ever-growing literature and create information resources to support future discoveries. (C) 2004 Lippincott Williams & Wilkins.
Pages: 577-586
Page count: 10
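
The co-occurrence step mentioned in the abstract can be illustrated with a minimal sketch. The record does not say which text unit, name lists, or thresholds the authors used, so everything below is an assumption for illustration: the toy lexicons `GENES` and `DRUGS`, the helper names `split_sentences` and `cooccurrences`, and the choice to count a gene and a drug as co-occurring when both appear in the same sentence. It is not the authors' implementation.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; the name lists used in the paper are not given in this record.
GENES = {"CYP2D6", "TPMT"}
DRUGS = {"codeine", "azathioprine"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(text, genes=GENES, drugs=DRUGS):
    """Count (gene, drug) pairs that are mentioned in the same sentence.

    Assumption: sentence-level co-occurrence. Gene symbols are matched
    case-sensitively, drug names case-insensitively.
    """
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        lowered = {t.lower() for t in tokens}
        found_genes = {g for g in genes if g in tokens}
        found_drugs = {d for d in drugs if d.lower() in lowered}
        # Every gene paired with every drug found in this sentence.
        for pair in product(sorted(found_genes), sorted(found_drugs)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "CYP2D6 converts codeine to morphine. "
        "TPMT status affects azathioprine dosing. "
        "Codeine response varies with CYP2D6 genotype. "
        "Aspirin is unrelated."
    )
    for (gene, drug), n in cooccurrences(sample).most_common():
        print(gene, drug, n)
```

Pairs with higher counts would be treated as more likely to be related. A real system would also need name normalization (synonyms, abbreviations) and a cutoff tuned against a reference set, neither of which is attempted here.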