Automatic extraction of reference gene from literature in plants based on texting mining

Cited by: 3
Authors
He Lin [1]
Shen Gengyu [2]
Li Fei [3]
Huang Shuiqing [1]
Affiliations
[1] Nanjing Agr Univ, Dept Informat Management, Nanjing 210095, Jiangsu, Peoples R China
[2] Nanjing Agr Univ, Lib, Nanjing 210095, Jiangsu, Peoples R China
[3] Nanjing Agr Univ, Dept Entomol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
biological knowledge discovery; machine learning; NLP; reference gene; text mining; real-time quantitative polymerase chain reaction; bioinformatics; POLYMERASE-CHAIN-REACTION; BIOMEDICAL LITERATURE; EXPRESSION; SELECTION; NORMALIZATION; SYSTEM; TOOL;
DOI
10.1504/IJDMB.2015.070063
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Real-time quantitative polymerase chain reaction (qRT-PCR) is widely used in biological research. Selecting a stable reference gene is key to a valid qRT-PCR experiment. However, selecting an appropriate reference gene usually requires rigorous and costly experimental verification. The scientific literature has accumulated many findings on reference gene selection. Mining the literature for reference genes used under specific experimental conditions can therefore provide reliable reference genes for similar qRT-PCR experiments, economically and efficiently. This paper proposes an auxiliary method for discovering reference genes from the literature that integrates machine learning, natural language processing and text mining approaches. Validity tests showed that the method achieves better precision and recall in extracting reference genes and their experimental environments.
Pages: 400-416
Number of pages: 17