Disease Name Recognition Based on Syntactic and Semantic Features

Cited by: 19
Authors
何云琪
刘苏文
钱龙华
周国栋
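Affiliations
[1] School of Computer Science and Technology, Soochow University
Funding
Key Program of the National Natural Science Foundation of China
Keywords
Disease name recognition; conditional random fields; syntactic features; semantic features
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification code
120506 [Digital humanities]
Abstract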
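Biomedical entity recognition (for example, of genes/proteins, chemicals, and diseases) is the foundation of biomedical text mining, and it matters for tasks such as extracting relations between biomedical entities and building biomedical knowledge bases. To address problems in current disease name recognition, this paper proposes a set of new syntactic and semantic features to improve recognition performance. The syntactic features include chunking and dependency information; the semantic features include abbreviation information for disease names, dictionary information, and hypernym-hyponym relations between disease concepts. Experiments on the NCBI disease corpus show that a CRF model combining these syntactic and semantic features significantly improves disease entity recognition, reaching an F1 score of 85.3%, the highest reported on this corpus at the time.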
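As a rough illustration of how such features might be fed to a CRF, the sketch below builds one feature dictionary per token from chunk tags, dependency information, a disease dictionary, and a set of known abbreviations. It is not the authors' implementation: the function names, feature names, and toy annotations are all hypothetical, and the hypernym-hyponym feature is left out because the abstract does not say how it is encoded.

```python
from typing import Dict, List, Set, Union

Feature = Union[str, bool]


def token_features(
    i: int,
    tokens: List[str],
    chunk_tags: List[str],
    dep_heads: List[int],
    dep_rels: List[str],
    disease_dict: Set[str],
    abbreviations: Set[str],
) -> Dict[str, Feature]:
    """Build a feature dict for token i (feature names are illustrative)."""
    word = tokens[i]
    head = dep_heads[i]  # index of the syntactic head, -1 for the root
    return {
        "word.lower": word.lower(),
        # Syntactic features: chunk tag and dependency information.
        "chunk": chunk_tags[i],
        "dep.rel": dep_rels[i],
        "dep.head": tokens[head].lower() if head >= 0 else "ROOT",
        # Semantic features: dictionary membership and abbreviation flag.
        "in_dict": word.lower() in disease_dict,
        "is_abbrev": word in abbreviations,
    }


def sentence_features(
    tokens: List[str],
    chunk_tags: List[str],
    dep_heads: List[int],
    dep_rels: List[str],
    disease_dict: Set[str],
    abbreviations: Set[str],
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token, the usual input shape for a CRF."""
    return [
        token_features(i, tokens, chunk_tags, dep_heads, dep_rels,
                       disease_dict, abbreviations)
        for i in range(len(tokens))
    ]


# Toy sentence with hand-written (assumed) annotations.
tokens = ["AT", "causes", "ataxia"]
chunk_tags = ["B-NP", "B-VP", "B-NP"]
dep_heads = [1, -1, 1]
dep_rels = ["nsubj", "root", "obj"]

feats = sentence_features(
    tokens, chunk_tags, dep_heads, dep_rels,
    disease_dict={"ataxia"}, abbreviations={"AT"},
)
```

A list of feature dictionaries like `feats` is the form that many CRF toolkits accept for a single sentence. In practice the chunk tags and dependency arcs would come from a chunker and a parser instead of being written by hand.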
Pages: 1546-1557
Number of pages: 12