Distribution of information in biomedical abstracts and full-text publications

Cited by: 61
Authors
Schuemie, MJ [1]
Weeber, M [1]
Schijvenaars, BJA [1]
van Mulligen, EM [1]
van der Eijk, CC [1]
Jelier, R [1]
Mons, B [1]
Kors, JA [1]
Affiliation
[1] Erasmus Univ, Med Ctr Rotterdam, Dept Med Informat, NL-3000 DR Rotterdam, Netherlands
DOI
10.1093/bioinformatics/bth291
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol-gene name combinations that can resolve gene-symbol ambiguity.
Results: We analyzed a set of 3902 biomedical full-text articles. Different keyword measures indicate that information density is highest in abstracts, but that the information coverage in full texts is much greater than in abstracts. Analysis of five different standard sections of articles shows that the highest information coverage is located in the results section. Still, 30-40% of the information mentioned in each section is unique to that section. Only 30% of the gene symbols in the abstract are accompanied by their corresponding names, and a further 8% of the gene names are found in the full text. In the full text, only 18% of the gene symbols are accompanied by their gene names.
Pages: 2597-2604
Number of pages: 8