The taxonomic name resolution service: an online tool for automated standardization of plant names

Cited by: 375
Authors
Boyle, Brad [1, 2]
Hopkins, Nicole [2, 3]
Lu, Zhenyuan [2, 4]
Garay, Juan Antonio Raygoza [2, 3]
Mozzherin, Dmitry [5]
Rees, Tony [6]
Matasci, Naim [1, 2, 3]
Narro, Martha L. [2, 3]
Piel, William H. [7]
Mckay, Sheldon J. [2, 3, 4]
Lowry, Sonya [2, 3]
Freeland, Chris
Peet, Robert K. [8]
Enquist, Brian J. [1, 9]
Affiliations
[1] Univ Arizona Tucson, Dept Ecol & Evolutionary Biol, Tucson, AZ 85721 USA
[2] iPlant Collaborat, Tucson, AZ 85721 USA
[3] BIO5 Inst, Tucson, AZ 85721 USA
[4] Cold Spring Harbor Lab, Cold Spring Harbor, NY 11724 USA
[5] Marine Biol Lab, Ctr Lib & Informat, Woods Hole, MA 02543 USA
[6] CSIRO Marine & Atmospher Res, Div Data Ctr, Hobart, Tas 7001, Australia
[7] Yale NUS Coll, Singapore 138614, Singapore
[8] Univ N Carolina, Dept Biol, Chapel Hill, NC 27599 USA
[9] Santa Fe Inst, Santa Fe, NM 87501 USA
Source
BMC Bioinformatics | 2013 / Vol. 14
Funding
US National Science Foundation
Keywords
Biodiversity informatics; Database integration; Taxonomy; Plants; Biodiversity databases; System
DOI
10.1186/1471-2105-14-16
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.
Results: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.
Conclusions: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/.
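The abstract describes three steps: correcting misspelled names by fuzzy matching, matching against a reference taxonomy, and converting synonyms to accepted names. A minimal Python sketch of that workflow follows. It is not the TNRS implementation: it substitutes the standard library's `difflib.SequenceMatcher` for the open-source matching tools the TNRS builds on. The `REFERENCE` table, the `resolve_name` function and the 0.85 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical, illustrative reference taxonomy: each known name maps to its
# accepted name (accepted names map to themselves). Not real TNRS or Tropicos data.
REFERENCE = {
    "Quercus alba": "Quercus alba",
    "Pinus ponderosa": "Pinus ponderosa",
    "Acer saccharum": "Acer saccharum",
    "Acer saccharophorum": "Acer saccharum",  # treated here as a synonym
}


def resolve_name(raw, threshold=0.85):
    """Fuzzy-match a submitted name and return (matched, accepted, score).

    Returns None when no reference name reaches the (assumed) threshold.
    """
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    best_name, best_score = None, 0.0
    for candidate in REFERENCE:
        # Similarity ratio in [0, 1]; a stand-in for a taxonomic fuzzy matcher.
        score = SequenceMatcher(None, cleaned.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is None or best_score < threshold:
        return None
    # Convert the matched name (possibly a synonym) to its accepted name.
    return best_name, REFERENCE[best_name], round(best_score, 2)


if __name__ == "__main__":
    for submitted in ["Quercus albba", "Acer saccharophorum", "Zzz yyy"]:
        print(submitted, "->", resolve_name(submitted))
```

A generic string-similarity ratio ignores the structure of scientific names (genus, epithet, authority), so a real resolver would parse the name first and match its parts separately, as the abstract indicates the TNRS does.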
Pages: 14