Computational clustering for viral reference proteomes

Cited by: 3
Authors
Chen, Chuming [1]
Huang, Hongzhan [1]
Mazumder, Raja [2]
Natale, Darren A. [3]
McGarvey, Peter B. [3]
Zhang, Jian [3]
Polson, Shawn W. [1]
Wang, Yuqi [1]
Wu, Cathy H. [1, 3]
Affiliations
[1] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE 19711 USA
[2] George Washington Univ, Dept Biochem & Mol Med, Washington, DC 20037 USA
[3] Georgetown Univ, Med Ctr, Prot Informat Resource, Washington, DC 20007 USA
[4] Swiss Inst Bioinformat, Ctr Med Univ, CH-1211 Geneva 4, Switzerland
Funding
US National Institutes of Health (NIH);
Keywords
PHAGE;
DOI
10.1093/bioinformatics/btw110
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies.
Results: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs.
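The co-membership cutoffs mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes, for illustration only, that each proteome is represented as a set of protein-cluster identifiers, that co-membership of proteome A in proteome B is the percentage of A's cluster identifiers also found in B, and that groups are formed greedily around the largest proteome. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def co_membership(a: Set[str], b: Set[str]) -> float:
    """Percentage of cluster IDs in proteome `a` that also occur in proteome `b`.

    Assumed definition, used here only for illustration.
    """
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


def group_by_cutoff(proteomes: Dict[str, Set[str]], cutoff: float) -> Dict[str, List[str]]:
    """Greedy grouping (an illustrative stand-in, not the published method).

    Proteomes are visited from largest to smallest. A proteome joins the first
    existing group whose representative it matches at >= cutoff percent
    co-membership; otherwise it becomes the representative of a new group.
    """
    groups: Dict[str, List[str]] = {}
    ordered = sorted(proteomes, key=lambda name: (-len(proteomes[name]), name))
    for name in ordered:
        for representative in groups:
            if co_membership(proteomes[name], proteomes[representative]) >= cutoff:
                groups[representative].append(name)
                break
        else:
            groups[name] = [name]
    return groups


# Hypothetical toy data: proteome name -> set of protein-cluster IDs.
toy = {
    "phageA": {"c1", "c2", "c3", "c4"},
    "phageB": {"c1", "c2", "c3"},
    "phageC": {"c3", "c9"},
}

for cutoff in (95, 75, 55, 35, 15):
    print(cutoff, group_by_cutoff(toy, cutoff))
```

In this toy setting, lowering the cutoff merges more proteomes under one representative, which mirrors why several cutoff levels are offered: a higher cutoff yields more, tighter groups, and a lower cutoff yields fewer, broader ones.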
Pages: 2041-2043
Page count: 3