UniChem: a unified chemical structure cross-referencing and identifier tracking system

Cited by: 115
Authors
Chambers, Jon [1]
Davies, Mark [1]
Gaulton, Anna [1]
Hersey, Anne [1]
Velankar, Sameer [2]
Petryszak, Robert [3]
Hastings, Janna [4]
Bellis, Louisa [1]
McGlinchey, Shaun [1]
Overington, John P. [1]
Affiliations
[1] ChEMBL, Cambridge CB10 1SD, England
[2] Prot Data Bank Europe, Cambridge CB10 1SD, England
[3] Gene Express Atlas, Cambridge CB10 1SD, England
[4] EMBL EBI, ChEBI, Cambridge CB10 1SD, England
Source
JOURNAL OF CHEMINFORMATICS | 2013 / Volume 5
Funding
Wellcome Trust (UK);
Keywords
UniChem; InChI; InChIKey; Chemical databases; Data integration; SERVICE; UPDATE;
DOI
10.1186/1758-2946-5-3
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
UniChem is a freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, has required independent efforts by each of the separate teams. These efforts were complicated by the different data models, release schedules, and differing business rules for compound normalization and identifier nomenclature that exist across the organization. UniChem, a large-scale, non-redundant database of Standard InChIs with pointers between these structures and chemical identifiers from all the separate chemistry resources, was developed as a means of efficiently sharing the maintenance overhead of creating these links. Thus, for each source represented in UniChem, all links to and from all other sources are automatically calculated and immediately available for all to use. Updated mappings are immediately available upon loading of new data releases from the sources. Web services in UniChem provide users with a single simple automatable mechanism for maintaining all links from their resource to all other sources represented in UniChem. In addition, functionality to track changes in identifier usage allows users to monitor which identifiers are current, and which are obsolete. Lastly, UniChem has been deliberately designed to allow additional resources to be included with minimal effort. Indeed, the recent inclusion of data sources external to EMBL-EBI has provided a simple means of providing users with an even wider selection of resources to link to, all at no extra cost, while at the same time providing a simple mechanism for external resources to link to all EMBL-EBI chemistry resources.
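The mapping idea described in the abstract can be illustrated with a small sketch. This is a toy in-memory model only, not UniChem's actual schema or web service interface: the class `ToyCrossRef`, its methods, and the source names `source_a` and `source_b` are all hypothetical. It assumes each source release is a plain mapping from identifier to Standard InChI string, and that identifiers absent from a newer release are flagged obsolete rather than deleted.

```python
from collections import defaultdict


class ToyCrossRef:
    """Hypothetical toy model of structure-keyed cross-referencing."""

    def __init__(self):
        # structure key -> source name -> {identifier: is_current}
        self._by_structure = defaultdict(lambda: defaultdict(dict))
        # (source name, identifier) -> structure key
        self._by_identifier = {}

    def load_release(self, source, assignments):
        """Load a full release from one source as {identifier: structure key}."""
        # Flag every identifier previously seen for this source as obsolete.
        for per_source in self._by_structure.values():
            if source in per_source:
                for ident in list(per_source[source]):
                    per_source[source][ident] = False
        # Identifiers present in the new release become current (again).
        for ident, structure in assignments.items():
            self._by_structure[structure][source][ident] = True
            self._by_identifier[(source, ident)] = structure

    def links(self, source, ident, current_only=True):
        """Return {other source: [identifiers]} sharing the same structure."""
        structure = self._by_identifier.get((source, ident))
        if structure is None:
            return {}
        result = {}
        for other, idents in self._by_structure[structure].items():
            if other == source:
                continue
            kept = sorted(i for i, cur in idents.items()
                          if cur or not current_only)
            if kept:
                result[other] = kept
        return result


# Example Standard InChI strings used as structure keys.
METHANE = "InChI=1S/CH4/h1H4"
WATER = "InChI=1S/H2O/h1H2"

xref = ToyCrossRef()
xref.load_release("source_a", {"A1": METHANE, "A2": WATER})
xref.load_release("source_b", {"B7": METHANE})
first = xref.links("source_a", "A1")

# A later release of source_b replaces identifier B7 with B9.
xref.load_release("source_b", {"B9": METHANE})
current = xref.links("source_a", "A1")
history = xref.links("source_a", "A1", current_only=False)
```

Because every identifier is attached to a shared structure key, adding one new source makes its links to all existing sources available without any pairwise mapping work, which is the maintenance saving the abstract describes.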
Pages: 9