Database Citation in Full Text Biomedical Articles

Cited by: 21

Authors:

Kafkas, Senay [1]

Kim, Jee-Hyub [1]

McEntyre, Johanna R. [1]

Affiliations:

[1] European Bioinformatics Institute, European Molecular Biology Laboratory, Cambridge, England

Source:

PLOS ONE | 2013 / Volume 8 / Issue 5

Funding:

Wellcome Trust (UK);

Keywords:

DOI:

10.1371/journal.pone.0063184

Chinese Library Classification (CLC) number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full-text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Text mining finds many thousands of new literature-database relationships, since these relationships are also absent from the set of articles cited by database records. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
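
As a rough illustration of what mining database record citations from article text can involve, the sketch below matches accession-like strings and keeps them only when the same sentence names the database. It is not the pipeline described in the article: the cue-word lists and the function name `find_database_citations` are hypothetical, the PDB and ENA patterns are deliberately simplified, and the UniProt pattern follows the accession format documented by UniProt.

```python
import re

# Accession-like patterns. These are illustrative assumptions, not the
# rules used by the article's pipeline.
PATTERNS = {
    # UniProt accession format as documented by UniProt.
    "uniprot": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Classic PDB identifier: one digit followed by three alphanumerics.
    "pdb": re.compile(r"\b[0-9][A-Za-z0-9]{3}\b"),
    # Simplified nucleotide accession: 1 letter + 5 digits or 2 letters + 6 digits.
    "ena": re.compile(r"\b(?:[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})\b"),
}

# Hypothetical cue words: a candidate is kept only if the sentence also
# mentions the database, which cuts false positives such as years or counts.
CUES = {
    "uniprot": re.compile(r"\b(?:uniprot|swiss-?prot)\b", re.IGNORECASE),
    "pdb": re.compile(r"\b(?:pdb|pdbe|protein data bank)\b", re.IGNORECASE),
    "ena": re.compile(r"\b(?:ena|embl|genbank|nucleotide archive)\b", re.IGNORECASE),
}


def find_database_citations(sentence):
    """Return (database, accession) pairs found in a single sentence."""
    hits = []
    for database, pattern in PATTERNS.items():
        if not CUES[database].search(sentence):
            continue  # no supporting cue word, so skip this database
        for match in pattern.finditer(sentence):
            hits.append((database, match.group(0)))
    return hits


if __name__ == "__main__":
    text = "The structure was deposited in the Protein Data Bank under code 1ABC."
    print(find_database_citations(text))
```

Requiring a cue word in the same sentence trades recall for precision, which suits a setting where the extracted links are meant to be published as structured annotations.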


Pages: 9