Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature

Times cited: 63
Authors
Dahdul, Wasila M. [1 ,2 ]
Balhoff, James P. [2 ,3 ]
Engeman, Jeffrey [1 ]
Grande, Terry [4 ]
Hilton, Eric J. [5 ]
Kothari, Cartik [2 ,3 ]
Lapp, Hilmar [2 ]
Lundberg, John G. [6 ]
Midford, Peter E. [2 ]
Vision, Todd J. [2 ,3 ]
Westerfield, Monte [7 ]
Mabee, Paula M. [1 ]
Affiliations
[1] Univ S Dakota, Dept Biol, Vermillion, SD 57069 USA
[2] Natl Evolutionary Synth Ctr, Durham, NC USA
[3] Univ N Carolina, Dept Biol, Chapel Hill, NC USA
[4] Loyola Univ, Dept Biol, Chicago, IL 60626 USA
[5] Virginia Inst Marine Sci, Coll William & Mary, Dept Fisheries Sci, Gloucester Point, VA 23062 USA
[6] Acad Nat Sci Philadelphia, Philadelphia, PA 19103 USA
[7] Univ Oregon, Inst Neurosci, Eugene, OR 97403 USA
Source
PLOS ONE | 2010 / Volume 5 / Issue 5
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
PHYLOGENETIC-RELATIONSHIPS; OSTARIOPHYSI; SILURIFORMES; TELEOSTEI; CHARACIFORMES; OSTEICHTHYES; LORICARIIDAE; GENOMICS; SUPPORT; PISCES;
DOI
10.1371/journal.pone.0010708
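Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract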
Background: The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies.
Methodology/Principal Findings: We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish, http://zfin.org). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators.
Conclusions/Significance: The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
Pages: 12