Standardized Metadata for Human Pathogen/Vector Genomic Sequences

Cited by: 32
Authors
Dugan, Vivien G. [1,2,3]
Emrich, Scott J. [4]
Giraldo-Calderon, Gloria I. [4]
Harb, Omar S. [5]
Newman, Ruchi M. [6]
Pickett, Brett E. [1,2]
Schriml, Lynn M. [7]
Stockwell, Timothy B. [1,2]
Stoeckert, Christian J., Jr. [5]
Sullivan, Dan E. [8]
Singh, Indresh [1,2]
Ward, Doyle V. [6]
Yao, Alison [3]
Zheng, Jie [5]
Barrett, Tanya [9]
Birren, Bruce [6]
Brinkac, Lauren [1,2]
Bruno, Vincent M. [7]
Caler, Elizabeth [1,2]
Chapman, Sinead [6]
Collins, Frank H. [4]
Cuomo, Christina A. [6]
Di Francesco, Valentina [3]
Durkin, Scott [1,2]
Eppinger, Mark [7]
Feldgarden, Michael [6]
Fraser, Claire [7]
Fricke, W. Florian [7]
Giovanni, Maria [3]
Henn, Matthew R. [6]
Hine, Erin [7]
Hotopp, Julie Dunning [7]
Karsch-Mizrachi, Ilene [9]
Kissinger, Jessica C. [10]
Lee, Eun Mi [3]
Mathur, Punam [3]
Mongodin, Emmanuel F. [7]
Murphy, Cheryl I. [6]
Myers, Garry [7]
Neafsey, Daniel E. [6]
Nelson, Karen E. [1,2]
Nierman, William C. [1,2]
Puzak, Julia [11]
Rasko, David [7]
Roos, David S. [5]
Sadzewicz, Lisa [7]
Silva, Joana C. [7]
Sobral, Bruno [8]
Squires, R. Burke [3]
Stevens, Rick L. [12]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
[2] J Craig Venter Inst, La Jolla, CA USA
[3] NIAID, Rockville, MD USA
[4] Univ Notre Dame, Notre Dame, IN 46556 USA
[5] Univ Penn, Philadelphia, PA 19104 USA
[6] Broad Inst, Cambridge, MA USA
[7] Univ Maryland, Sch Med, Inst Genome Sci, Baltimore, MD 21201 USA
[8] Virginia Bioinformat Inst, Cyberinfrastruct Div, Blacksburg, VA USA
[9] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[10] Univ Georgia, Athens, GA 30602 USA
[11] Kelly Govt Solut, Rockville, MD USA
[12] Argonne Natl Lab, Lemont, IL USA
[13] Univ Calif San Diego, Dept Pathol, San Diego, CA 92103 USA
Source
PLOS ONE | 2014 / Vol. 9 / Issue 6
Keywords
BIOINFORMATICS RESOURCE; METAGENOMIC PROJECTS; MINIMUM INFORMATION; DATABASE GOLD; ONTOLOGIES
DOI
10.1371/journal.pone.0099979
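Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences in general]
Discipline classification codes
07; 0710; 09
Abstract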
High-throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies that identify genetic determinants of pathogen virulence and drug/insecticide resistance, as well as phylogenetic studies that track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, metadata about the characteristics of each pathogen/vector isolate must be collected and made available in organized, clear, and consistent formats. Here we report the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), and informed by interactions with numerous collaborating scientists. The standard includes mappings to terms from other data standards initiatives, including the Genomic Standards Consortium's Minimum Information about any (x) Sequence (MIxS) and NCBI's BioSample/BioProject checklists, and the Ontology for Biomedical Investigations (OBI). Its data fields describe the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the isolated pathogen/vector, and project leadership and support. Because the metadata fields are modeled in an ontology-based semantic framework that reuses existing ontologies and minimum information checklists, the application standard can be extended with additional project-specific data fields and integrated with other data represented using comparable standards.
Use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and in other repositories that leverage them, allowing investigators to identify relevant genomic sequences and to perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
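The four field categories and the cross-standard term mapping described in the abstract can be illustrated with a minimal Python sketch. It assumes a flat, one-record-per-isolate layout; the local field names, the `TERM_MAP` table, `SampleMetadata`, and `to_biosample_attributes` are hypothetical and are not the published standard's field list. The target names (`host`, `isolation_source`, `collection_date`, `geo_loc_name`, `lat_lon`) follow NCBI BioSample attribute naming and should be checked against current BioSample documentation. The example values are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict

# Hypothetical local field names -> BioSample/MIxS-style attribute names.
# Left-hand names are illustrative only, not taken from the GSCID/BRC standard.
TERM_MAP = {
    "host_species": "host",
    "isolation_source": "isolation_source",
    "collection_date": "collection_date",
    "country": "geo_loc_name",
    "latitude_longitude": "lat_lon",
}


@dataclass
class SampleMetadata:
    # Organism or environmental source of the specimen
    organism: str
    host_species: str
    isolation_source: str
    # Spatial-temporal information about the isolation event
    collection_date: date
    country: str
    latitude_longitude: str = ""
    # Phenotypic characteristics of the isolate (free-form key/value pairs)
    phenotypes: Dict[str, str] = field(default_factory=dict)
    # Project leadership and support
    principal_investigator: str = ""
    funding_source: str = ""

    def to_biosample_attributes(self) -> Dict[str, str]:
        """Rename mapped fields to BioSample-style names, skipping empty values."""
        attrs = {"organism": self.organism}
        for local_name, target_name in TERM_MAP.items():
            value = getattr(self, local_name)
            if isinstance(value, date):
                value = value.isoformat()  # e.g. "2012-05-14"
            if value:
                attrs[target_name] = value
        # Unmapped, project-specific fields pass through under their own names.
        attrs.update(self.phenotypes)
        return attrs


if __name__ == "__main__":
    sample = SampleMetadata(
        organism="Plasmodium falciparum",
        host_species="Homo sapiens",
        isolation_source="blood",
        collection_date=date(2012, 5, 14),
        country="Mali",
        phenotypes={"drug_resistance": "chloroquine"},
    )
    print(sample.to_biosample_attributes())
```

Keeping the mapping in a separate table rather than hard-coding target names in the record is one way to mirror the extensibility the abstract describes: a project-specific field can be added to the record and either mapped to an existing checklist term or passed through unchanged.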
Pages: 11