Literature extraction of protein functions using sentence pattern mining

Cited by: 15
Authors
Chiang, JH [1]
Yu, HC [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
Keywords
text mining; bioinformatics; knowledge acquisition; linguistic processing
DOI
10.1109/TKDE.2005.132
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth in the number of genomics research articles, it has become a challenge for biomedical researchers to keep up with this ever-increasing quantity of information and to learn of newly discovered functions of the proteins they study. Text mining has therefore become essential for facilitating the functional annotation of proteins: it draws on the large body of biomedical literature and transforms that knowledge into easily accessible database formats. In this paper, we propose a sentence pattern mining method to extract protein functions from biomedical literature. To recognize variants of function terms correctly, we identify their morphological, syntactic, and semantic variation forms. The proposed methods can be used to aid database curators in annotating protein functions and to assist biologists and medical researchers in searching for protein functions in the biomedical literature.
Pages: 1088-1098
Page count: 11