Overview of BioCreative II gene normalization

Times cited: 175
Authors
Morgan, Alexander A. [2]
Lu, Zhiyong [3]
Wang, Xinglong [4]
Cohen, Aaron M. [5]
Fluck, Juliane [6]
Ruch, Patrick [7]
Divoli, Anna [8]
Fundel, Katrin [9]
Leaman, Robert [10]
Hakenberg, Joerg [11]
Sun, Chengjie [12]
Liu, Heng-hui [13]
Torres, Rafael [14]
Krauthammer, Michael [15]
Lau, William W. [16]
Liu, Hongfang [17]
Hsu, Chun-Nan [18]
Schuemie, Martijn [19]
Cohen, K. Bretonnel [1]
Hirschman, Lynette [1]
Affiliations
[1] Mitre Corp, Ctr Informat Technol, Bedford, MA 01730 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Univ Colorado, Sch Med, Ctr Computat Pharmacol, Aurora, CO 80045 USA
[4] Univ Edinburgh, Sch Informat, Edinburgh EH8 9LW, Midlothian, Scotland
[5] Oregon Hlth & Sci Univ, Portland, OR 97239 USA
[6] Fraunhofer Inst Algorithms & Sci Comp, D-53754 Schloss Birlinghoven, Sankt Augustin, Germany
[7] Univ & Hosp Geneva, CH-1201 Geneva, Switzerland
[8] Univ Calif Berkeley, Sch Informat, Berkeley, CA 94720 USA
[9] Univ Munich, Inst Informat, D-80333 Munich, Germany
[10] Arizona State Univ, Dept Comp Sci & Engn, Tempe, AZ 85281 USA
[11] Tech Univ Dresden, Ctr Biotechnol, D-1307 Dresden, Germany
[12] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[13] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
[14] Biolma, Bioalma, E-28760 Madrid, Spain
[15] Yale Univ, Sch Med, Dept Pathol, New Haven, CT 06510 USA
[16] NIH, Div Computat Biosci, Ctr Informat Technol, Bethesda, MD 20892 USA
[17] Georgetown Univ, Med Ctr, Dept Biostat Bioinformat & Biomath, Washington, DC 20057 USA
[18] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
[19] Erasmus MC Univ Med Ctr, Dept Med Informat, Biosemant Grp, NL-3015 GE Rotterdam, Netherlands
Source
GENOME BIOLOGY | 2008 / Vol. 9
Funding
US National Science Foundation;
DOI
10.1186/gb-2008-9-S2-S3
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%.
Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers obtained improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers.
Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
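The figures quoted in the abstract can be related to one another with a short calculation. The Python sketch below is illustrative only: it assumes the standard balanced F-measure (harmonic mean of precision and recall) and shows one plausible form of a simple voting scheme over system outputs. The function names, the vote threshold, and the toy identifiers are hypothetical; the abstract does not specify the voting schemes or classifiers actually used.

```python
from collections import Counter


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def majority_vote(system_outputs, min_votes):
    """Keep identifiers proposed by at least `min_votes` systems.

    `system_outputs` is a list of identifier collections, one per system.
    This is an assumed, simplified voting scheme, not the paper's method.
    """
    votes = Counter()
    for ids in system_outputs:
        votes.update(set(ids))  # each system votes at most once per identifier
    return {gene_id for gene_id, n in votes.items() if n >= min_votes}


# Recall of the pooled 'maximum recall' system, from the counts in the abstract.
pooled_recall = 763 / 785
print(round(pooled_recall, 2))  # 0.97

# F-measure implied by the reported precision (0.23) and recall (0.97).
print(round(f_measure(0.23, 0.97), 2))  # 0.37

# Toy example with made-up identifiers for a single abstract.
outputs = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G2", "G4"}]
print(sorted(majority_vote(outputs, min_votes=2)))  # ['G1', 'G2']
```

Under these assumptions, pooling every response maximizes recall at the cost of precision, which is why the pooled system's implied F-measure is far below the 0.80 to 0.81 of the best individual runs; requiring agreement among several systems trades some recall back for precision.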
Pages: 19