Moara: a Java library for extracting and normalizing gene and protein mentions

Cited by: 16
Authors
Neves, Mariana L. [1]
Carazo, Jose-Maria [1]
Pascual-Montano, Alberto [1, 2]
Affiliations
[1] CSIC, CNB, BioComp Unit, Natl Biotechnol Ctr, Madrid, Spain
[2] CSIC, IMMPA, Madrid, Spain
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Keywords
TEXT; NOMENCLATURE; LISTS
DOI
10.1186/1471-2105-11-157
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available. Results: This study proposes a versatile and trainable Java library that implements gene/protein tagger and normalization steps based on machine learning approaches. The system has been trained for several model organisms and corpora but can be expanded to support new organisms and documents. Conclusions: Moara is a flexible, trainable and open-source system that is not specifically oriented to any organism and therefore does not require specific tuning of the algorithms or dictionaries utilized. Moara can be used as a stand-alone application or can be incorporated in the workflow of a more general text mining system.
Pages: 13