Gene mention normalization and interaction extraction with context models and sentence motifs

Cited by: 34
Authors
Hakenberg, Joerg [1, 2]
Plake, Conrad [1, 3]
Royer, Loic [1]
Strobelt, Hendrik [1]
Leser, Ulf [2]
Schroeder, Michael [1]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol, Bioinformat Grp, D-01307 Dresden, Germany
[2] Humboldt Univ, Dept Comp Sci, D-10099 Berlin, Germany
[3] Transinsight GmbH, D-01307 Dresden, Germany
Source
GENOME BIOLOGY | 2008 / Vol. 9
DOI
10.1186/gb-2008-9-S2-S14
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization (to identify biomedical objects in text) and extraction of qualified relationships between those objects. Results: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. Conclusion: For gene mention normalization, our approach outperforms strategies that utilize only the matching of gene names against dictionaries, without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. Availability: Our methods for gene, protein, and species identification, and extraction of protein-protein interactions are available as part of the BioCreative Meta Services (BCMS), see http://bcms.bioinfo.cnio.es/.
Pages: 16