Building a pipeline to solicit expert knowledge from the community to aid gene summary curation

Times cited: 4
Authors
Antonazzo, Giulia [1 ]
Urbano, Jose M. [1 ]
Marygold, Steven J. [1 ]
Millburn, Gillian H. [1 ]
Brown, Nicholas H. [1 ]
Affiliation
[1] Univ Cambridge, Dept Physiol Dev & Neurosci, Downing St, Cambridge CB2 3DY, England
Source
DATABASE: THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2020
Funding
UK Medical Research Council; US National Institutes of Health
Keywords
ONTOLOGY
DOI
10.1093/database/baz152
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Brief summaries describing the function of each gene's product(s) are of great value to the research community, especially when interpreting genome-wide studies that reveal changes to hundreds of genes. However, manually writing such summaries, even for a single species, is a daunting task; for example, the Drosophila melanogaster genome contains almost 14 000 protein-coding genes. One solution is to use computational methods to generate summaries, but this often fails to capture the key functions or express them eloquently. Here, we describe how we solicited help from the research community to generate manually written summaries of D. melanogaster gene function. Based on the data within the FlyBase database, we developed a computational pipeline to identify researchers who have worked extensively on each gene. We e-mailed these researchers to ask them to draft a brief summary of the main function(s) of the gene's product, which we edited for consistency to produce a 'gene snapshot'. This approach yielded 1800 gene snapshot submissions within a 3-month period. We discuss the general utility of this strategy for other databases that capture data from the research literature.
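The researcher-identification step in the abstract can be illustrated with a minimal sketch. This record does not state the selection criteria the authors actually used, so everything here is an assumption: the input is a hypothetical list of (gene, paper, author list) associations, such as might be derived from a curated gene-to-publication table, and authors are ranked simply by how many distinct papers link them to a gene. The function name `rank_experts` and the cut-offs `top_n` and `min_papers` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical record shape: (gene_id, paper_id, author names).
# The IDs and names below are placeholders, not FlyBase data.
Record = tuple[str, str, list[str]]


def rank_experts(
    records: Iterable[Record], top_n: int = 2, min_papers: int = 2
) -> dict[str, list[tuple[str, int]]]:
    """Return, per gene, up to top_n authors linked to at least min_papers papers.

    Ranking by paper count is an assumed heuristic for illustration only.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    seen: set[tuple[str, str]] = set()
    for gene, paper, authors in records:
        if (gene, paper) in seen:  # ignore repeated gene-paper links
            continue
        seen.add((gene, paper))
        for author in set(authors):  # count each author once per paper
            counts[gene][author] += 1

    ranked: dict[str, list[tuple[str, int]]] = {}
    for gene, per_author in counts.items():
        # Highest paper count first; ties broken alphabetically for stable output.
        ordered = sorted(per_author.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[gene] = [(a, n) for a, n in ordered if n >= min_papers][:top_n]
    return ranked


records: list[Record] = [
    ("gene_A", "paper_1", ["Author A", "Author B"]),
    ("gene_A", "paper_2", ["Author A"]),
    ("gene_A", "paper_3", ["Author A", "Author C"]),
    ("gene_A", "paper_3", ["Author A", "Author C"]),  # duplicate link, skipped
    ("gene_B", "paper_4", ["Author C"]),
]

print(rank_experts(records))
# {'gene_A': [('Author A', 3)], 'gene_B': []}
```

Counting distinct papers is only one plausible proxy for having "worked extensively" on a gene; a real pipeline might also weight recency or authorship position, which this sketch does not attempt. Genes with no author above the threshold come back with an empty list, which would correspond to genes for which no one is contacted.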
Pages: 10
Related papers (18 in total)
  • [11] WormBase 2017: molting into a new stage
    Lee, Raymond Y. N.
    Howe, Kevin L.
    Harris, Todd W.
    Arnaboldi, Valerio
    Cain, Scott
    Chan, Juancarlos
    Chen, Wen J.
    Davis, Paul
    Gao, Sibyl
    Grove, Christian
    Kishore, Ranjana
    Muller, Hans-Michael
    Nakamura, Cecilia
    Nuin, Paulo
    Paulini, Michael
    Raciti, Daniela
    Rodgers, Faye
    Russell, Matt
    Schindelman, Gary
    Tuli, Mary Ann
    Van Auken, Kimberly
    Wang, Qinghua
    Williams, Gary
    Wright, Adam
    Yook, Karen
    Berriman, Matthew
    Kersey, Paul
    Schedl, Tim
    Stein, Lincoln
    Sternberg, Paul W.
    [J]. NUCLEIC ACIDS RESEARCH, 2018, 46 (D1) : D869 - D874
  • [12] Generating gene summaries from biomedical literature: A study of semi-structured summarization
    Ling, Xu
    Jiang, Jing
    He, Xin
    Mei, Qiaozhu
    Zhai, Chengxiang
    Schatz, Bruce
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (06) : 1777 - 1791
  • [13] A Chado case study: an ontology-based modular schema for representing genome-associated biological information
    Mungall, Christopher J.
    Emmert, David B.
    [J]. BIOINFORMATICS, 2007, 23 (13) : I337 - I346
  • [14] Rodriguez-Esteban R., 2019, BIORXIV, DOI 10.1101/633255
  • [15] Saccharomyces genome database informs human biology
    Skrzypek, Marek S.
    Nash, Robert S.
    Wong, Edith D.
    MacPherson, Kevin A.
    Hellerstedt, Sage T.
    Engel, Stacia R.
    Karra, Kalpana
    Weng, Shuai
    Sheppard, Travis K.
    Binkley, Gail
    Simison, Matt
    Miyasato, Stuart R.
    Cherry, J. Michael
    [J]. NUCLEIC ACIDS RESEARCH, 2018, 46 (D1) : D736 - D742
  • [16] FlyBase 2.0: the next generation
    Thurmond, Jim
    Goodman, Joshua L.
    Strelets, Victor B.
    Attrill, Helen
    Gramates, L. Sian
    Marygold, Steven J.
    Matthews, Beverley B.
    Millburn, Gillian
    Antonazzo, Giulia
    Trovisco, Vitor
    Kaufman, Thomas C.
    Calvi, Brian R.
    Perrimon, Norbert
    Gelbart, Susan Russo
    Agapite, Julie
    Broll, Kris
    Crosby, Lynn
    dos Santos, Gilberto
    Emmert, David
    Falls, Kathleen
    Jenkins, Victoria
    Sutherland, Carol
    Tabone, Christopher
    Zhou, Pinglei
    Zytkovicz, Mark
    Brown, Nick
    Garapati, Phani
    Holmes, Alex
    Larkin, Aoife
    Marygold, Steven
    Pilgrim, Clare
    Urbano, Pepe
    Czoch, Bryon
    Goodman, Josh
    Thurmond, Jim
    Cripps, Richard
    Baker, Phillip
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D759 - D765
  • [17] Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts
    Tsueng, Ginger
    Nanis, Max
    Fouquier, Jennifer T.
    Mayers, Michael
    Good, Benjamin M.
    Su, Andrew I.
    [J]. BIOINFORMATICS, 2020, 36 (04) : 1226 - 1233
  • [18] OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs
    Zdobnov, Evgeny M.
    Tegenfeldt, Fredrik
    Kuznetsov, Dmitry
    Waterhouse, Robert M.
    Simao, Felipe A.
    Ioannidis, Panagiotis
    Seppey, Mathieu
    Loetscher, Alexis
    Kriventseva, Evgenia V.
    [J]. NUCLEIC ACIDS RESEARCH, 2017, 45 (D1) : D744 - D749