Automatic Indexing of Scientific Literature Based on Support Vector Machines and Core Feature Words

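Cited by: 5
Authors
Bai Rujiang (白如江) [1, 2, 3]
Wang Xiaodi (王晓笛) [1]
Wang Xiaoyue (王效岳) [1]
Affiliations
[1] Library of Shandong University of Technology
[2] National Science Library, Chinese Academy of Sciences
[3] University of Chinese Academy of Sciences
Keywords
automatic indexing; support vector machine; feature extraction; scientific literature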
DOI
10.16353/j.cnki.1000-7490.2014.07.023
Chinese Library Classification (CLC) number
G254 [Document indexing and cataloguing]
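Abstract
Scientific papers typically contain information on the research purpose, methods, results, and conclusions. Indexing papers with this information is especially important for helping researchers quickly find content relevant to their needs in a very large body of literature. Based on an analysis of the writing characteristics of scientific literature, this article proposes a core feature word extraction strategy that draws on words, English terms (proper nouns and abbreviations), and numbers. It then recasts the indexing of scientific literature as a sentence classification problem and, using the proposed core feature words, applies a support vector machine classifier to perform sentence-level semantic indexing. Experiments on 1,168 medical papers on diabetes show that the proposed method can effectively learn and index the sentences in scientific literature, and can therefore automatically index the key information points of scientific papers.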
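
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue word list `CUE_WORDS`, the regular expressions, the feature names, and the function `extract_core_features` are all assumptions made for illustration, since the record does not give the paper's actual core feature word list. The sketch only shows how a sentence might be turned into the three kinds of features the abstract names (words, English terms, and numbers).

```python
import re
from collections import Counter

# Hypothetical cue words (purpose, method, result, conclusion).
# The paper's real core feature word list is not given in this record.
CUE_WORDS = ("目的", "方法", "结果", "结论")

# Assumed patterns: an English term starts with an ASCII letter;
# a number is an integer or decimal with an optional percent sign.
ENGLISH_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")
NUMBER_TOKEN = re.compile(r"\d+(?:\.\d+)?%?")


def extract_core_features(sentence):
    """Map one sentence to a sparse feature dictionary."""
    features = Counter()

    # 1. Word features: count occurrences of each cue word.
    for word in CUE_WORDS:
        count = sentence.count(word)
        if count:
            features["word=" + word] = count

    # 2. English features: abbreviations and proper nouns such as "HbA1c".
    for token in ENGLISH_TOKEN.findall(sentence):
        features["en=" + token] += 1

    # 3. Number features: blank out English terms first so that digits
    #    inside them (the "1" in "HbA1c") are not counted as numbers.
    remainder = ENGLISH_TOKEN.sub(" ", sentence)
    numbers = NUMBER_TOKEN.findall(remainder)
    if numbers:
        features["has_number"] = 1
        features["number_count"] = len(numbers)

    return dict(features)


if __name__ == "__main__":
    example = "结果:治疗组HbA1c下降1.2%,共纳入60例患者。"
    print(extract_core_features(example))
```

In a full pipeline, feature dictionaries like these would be converted to numeric vectors and passed, together with sentence labels such as purpose, method, result, or conclusion, to a support vector machine classifier for training and prediction.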
Pages: 129-134
Page count: 6