Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles

Cited by: 19
Authors
Dai, Hong-Jie [1, 2]
Lai, Po-Ting [3 ]
Tsai, Richard Tzong-Han [3 ]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30043, Taiwan
[2] Acad Sinica, Inst Informat Sci, Intelligent Agent Syst Lab, Taipei, Taiwan
[3] Yuan Ze Univ, Dept Comp Sci & Engn, Zhongli City 320, Taoyuan County, Taiwan
Keywords
Data mining; feature evaluation and selection; mining methods and algorithms; text mining; scientific databases; BIOCREATIVE II; INFORMATION; PATTERNS; IDENTIFICATION; RECOGNITION; ABSTRACTS; TASK
DOI
10.1109/TCBB.2010.45
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), map these genes to unique IDs, and rank them by their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of GN in INT are identifying genes across species and working with full papers rather than abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, both of which exploit information from different parts of a paper. Our system achieved a promising AUC of 0.43471. The multistage GN algorithm improved system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that INT AUC performance was 22.6 percent higher with full text than with abstracts only.
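As a reading aid, the kind of score reported above can be illustrated with a minimal sketch. The abstract does not define AUC, so this assumes it means the area under an interpolated precision/recall curve computed over one ranked list of gene IDs. The function name `interpolated_pr_auc` and the IDs in the usage line are hypothetical, and this is not the authors' evaluation code.

```python
def interpolated_pr_auc(ranked_ids, gold_ids):
    """Area under an interpolated precision/recall curve for one ranked list.

    Assumption: ranked_ids holds unique IDs, best-ranked first.
    """
    gold = set(gold_ids)
    if not gold:
        return 0.0

    # Record (recall, precision) at every rank where a correct ID appears.
    points = []
    hits = 0
    for rank, gene_id in enumerate(ranked_ids, start=1):
        if gene_id in gold:
            hits += 1
            points.append((hits / len(gold), hits / rank))

    # Interpolation: the precision used at recall r is the best precision
    # observed at any recall level >= r.
    area = 0.0
    prev_recall = 0.0
    for i, (recall, _) in enumerate(points):
        best_precision = max(p for _, p in points[i:])
        area += (recall - prev_recall) * best_precision
        prev_recall = recall
    return area


# Hypothetical IDs: two correct interactors, one false positive ranked between them.
score = interpolated_pr_auc(["P1", "X", "P2"], {"P1", "P2"})
```

Under this definition, a correct ID that never appears in the ranked list caps the achievable recall, so missed normalizations lower the score even when the ranking of the found IDs is perfect.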
Pages: 412-420
Page count: 9