A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework

Cited by: 14
Authors
Bandrowski, A. E. [1]
Cachat, J. [1]
Li, Y. [2]
Mueller, H. M. [2]
Sternberg, P. W. [2]
Ciccarese, P. [3, 4]
Clark, T. [3, 4]
Marenco, L. [5]
Wang, R. [5]
Astakhov, V. [1]
Grethe, J. S. [1]
Martone, M. E. [1]
Affiliations
[1] Univ Calif San Diego, Ctr Res Biol Syst, La Jolla, CA 92093 USA
[2] CALTECH, Div Biol, Pasadena, CA 91125 USA
[3] Harvard Univ, Sch Med, Cambridge, MA 02138 USA
[4] Massachusetts Gen Hosp, Boston, MA 02114 USA
[5] Yale Univ, Sch Med, Ctr Med Informat, New Haven, CT 06520 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2012
Funding
National Institutes of Health (USA)
DOI
10.1093/database/bas005
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is especially pertinent for online databases, whose dynamic content is 'hidden' from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data, and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semi-automated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduce the manual curation effort needed to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (e.g. reagents, software tools).
Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community.
Pages: 11