Database Citation in Full Text Biomedical Articles

Cited by: 21
Authors
Kafkas, Senay [1]
Kim, Jee-Hyub [1]
McEntyre, Johanna R. [1]
Affiliations
[1] European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge, England
Source
PLOS ONE | 2013 / Volume 8 / Issue 05
Funding
Wellcome Trust (UK)
DOI
10.1371/journal.pone.0063184
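Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract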
Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe current patterns of how database entries are cited in research articles, based on an analysis of the full-text Open Access articles available from Europe PMC. Focusing on citations of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Text mining also finds many thousands of new literature-database relationships, since these relationships are not present in the set of articles cited by database records either. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, entries from which are also widely cited in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
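The abstract does not describe how the pipeline recognizes database record citations. As a rough illustration only, the sketch below pairs a simplified accession pattern with a context cue in the same sentence, which is one common way to mine such citations. The pattern set, the cue words, and the function name `find_citations` are all assumptions made for this example and are not taken from the paper.

```python
import re

# Simplified, assumed accession patterns (illustrative, not the paper's rules).
PATTERNS = {
    # PDB IDs: a non-zero digit followed by three alphanumerics.
    "PDBe": re.compile(r"\b[1-9][A-Za-z0-9]{3}\b"),
    # UniProt accessions: 6 or 10 characters in the documented layout.
    "UniProt": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Nucleotide accessions: 1 letter + 5 digits, or 2 letters + 6 digits.
    "ENA": re.compile(r"\b(?:[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})\b"),
}

# Hypothetical context cues: a match only counts if the sentence also
# mentions the database, which filters out years, sample counts, and so on.
CUES = {
    "PDBe": re.compile(r"\b(?:pdb|pdbe|protein data bank)\b", re.IGNORECASE),
    "UniProt": re.compile(r"\b(?:uniprot|swiss-?prot)\b", re.IGNORECASE),
    "ENA": re.compile(r"\b(?:ena|genbank|embl|accession)\b", re.IGNORECASE),
}


def find_citations(sentence):
    """Return (database, accession) pairs found in one sentence."""
    hits = []
    for db, pattern in PATTERNS.items():
        if not CUES[db].search(sentence):
            continue  # no supporting context for this database
        for match in pattern.finditer(sentence):
            hits.append((db, match.group(0)))
    return hits


print(find_citations(
    "The structure was deposited in the Protein Data Bank under code 2XYZ."
))
print(find_citations("In 2010 we measured 1234 samples."))
```

The context check is what keeps tokens such as "2010" from being reported as PDB identifiers. A real pipeline would also need to resolve ambiguous matches (a string like P12345 fits both the UniProt and the one-letter nucleotide layout) and validate each candidate against the target database.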
Pages: 9