Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion

Cited by: 43
Authors
Agarwal, Shashank [1 ]
Yu, Hong [1 ]
Affiliations
[1] Univ Wisconsin, Milwaukee, WI 53211 USA
Funding
National Institutes of Health (USA);
DOI
10.1093/bioinformatics/btp548
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Biomedical texts can typically be represented by four rhetorical categories: Introduction, Methods, Results and Discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have applied different approaches to automatically classifying sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We first evaluated whether sentences in full-text biomedical articles could be reliably annotated into the IMRAD format and then explored different approaches for automatically classifying these sentences into the IMRAD categories. Our results show an overall annotation agreement of 82.14% with a Kappa score of 0.756. The best classification system is a multinomial naive Bayes classifier trained on manually annotated data, which achieved 91.95% accuracy and an average F-score of 91.55%, significantly higher than the baseline systems. A web version of this system is available online at http://wood.ims.uwm.edu/full_text_classifier/.
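For readers unfamiliar with the classifier named in the abstract, the following is a minimal sketch of a multinomial naive Bayes sentence classifier. It is not the authors' system: the whitespace tokenization, the add-one (Laplace) smoothing, the function names and the four toy training sentences are all assumptions made for illustration, and the record does not describe the features the authors actually used.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(examples, alpha=1.0):
    """Count words per class from (sentence, label) pairs.

    alpha is the additive smoothing constant (an assumption, not from the paper).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in examples:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}


def classify(model, sentence):
    """Return the label with the highest log prior plus summed log likelihoods."""
    # Words never seen in training are ignored.
    tokens = [t for t in sentence.lower().split() if t in model["vocab"]]
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + model["alpha"] * vocab_size
        score = math.log(n_docs / total_docs)
        for t in tokens:
            score += math.log((counts[t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy sentences, one per IMRAD category (not data from the paper).
TOY_DATA = [
    ("previous studies have shown that little is known", "Introduction"),
    ("cells were incubated and samples were centrifuged", "Methods"),
    ("we observed a significant increase as shown in figure", "Results"),
    ("these findings suggest that further work may explain", "Discussion"),
]

model = train_multinomial_nb(TOY_DATA)
print(classify(model, "samples were incubated overnight"))
```

With so little training data the sketch only demonstrates the mechanics; a real system would need a manually annotated corpus like the one the abstract describes.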
Pages: 3174-3180
Number of pages: 7