Automatic extraction of reference gene from literature in plants based on texting mining

Cited by: 3
Authors
He Lin [1]
Shen Gengyu [2]
Li Fei [3]
Huang Shuiqing [1]
Affiliations
[1] Nanjing Agr Univ, Dept Informat Management, Nanjing 210095, Jiangsu, Peoples R China
[2] Nanjing Agr Univ, Lib, Nanjing 210095, Jiangsu, Peoples R China
[3] Nanjing Agr Univ, Dept Entomol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
biological knowledge discovery; machine learning; NLP; reference gene; text mining; real-time quantitative polymerase chain reaction; bioinformatics; POLYMERASE-CHAIN-REACTION; BIOMEDICAL LITERATURE; EXPRESSION; SELECTION; NORMALIZATION; SYSTEM; TOOL
DOI
10.1504/IJDMB.2015.070063
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Real-time quantitative polymerase chain reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to the validity of a qRT-PCR experiment. However, choosing an appropriate reference gene usually requires rigorous experimental verification, which makes the selection process costly. The scientific literature has accumulated many results on reference gene selection. Mining the literature for reference genes used under specific experimental environments can therefore provide reliable reference genes for similar qRT-PCR experiments in an economical and efficient way. This paper proposes an auxiliary method for discovering reference genes from the literature that integrates machine learning, natural language processing and text mining approaches. Validity tests showed that the new method achieves better precision and recall in extracting reference genes and their environments.
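To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how precision and recall could be computed for extracted (reference gene, experimental environment) pairs. The function name `precision_recall` and the sample pairs are hypothetical illustrations; they are not code, data, or results from the paper.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (gene, environment) pairs."""
    predicted, gold = set(predicted), set(gold)
    # A prediction counts as correct only if both gene and environment match.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example pairs, for illustration only.
gold_pairs = {
    ("ACT", "drought stress"),
    ("UBQ", "cold stress"),
    ("EF1a", "salt stress"),
}
predicted_pairs = {
    ("ACT", "drought stress"),
    ("UBQ", "salt stress"),  # right gene, wrong environment: counted as an error
}

precision, recall = precision_recall(predicted_pairs, gold_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scoring the gene and its environment as a single pair is one possible design choice; scoring genes and environments separately would give a more lenient evaluation.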
Pages: 400-416
Number of pages: 17