WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

Cited by: 268
Authors
Capriotti, Emidio [1]
Calabrese, Remo [2]
Fariselli, Piero [3]
Martelli, Pier Luigi [4]
Altman, Russ B. [5, 6]
Casadio, Rita [4]
Affiliations
[1] Univ Alabama Birmingham, Dept Pathol, Div Informat, Birmingham, AL 35294 USA
[2] S IN Soluz Informat Srl, I-36100 Vicenza, Italy
[3] Univ Bologna, Dept Comp Sci, I-40126 Bologna, Italy
[4] Univ Bologna, Dept Biol, Lab Biocomp, I-40126 Bologna, Italy
[5] Stanford Univ, Dept Bioengn, Stanford, CA 94305 USA
[6] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
Source
BMC GENOMICS | 2013 / Volume 14
Keywords
DISEASE-RELATED MUTATIONS; STABILITY CHANGES; DATABASE SEARCH; GENE ONTOLOGY; BIOINFORMATICS; POLYMORPHISMS; INFORMATION; POTENTIALS; SEQUENCE; IMPROVES
DOI
10.1186/1471-2164-14-S3-S6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: SNPs&GO is a method for predicting deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVMs). For a given protein, its input comprises the sequence and/or the three-dimensional structure (when available), a set of target variations, and the protein's functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability that the variation is associated with human disease.

Results: The server consists of two main components: updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO(3d) programs. Both the sequence-based and the structure-based algorithms were extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a correlation coefficient of 0.61, and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped onto protein structures available in the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a correlation coefficient of 0.68, and an AUC of 0.91. When tested on a new blind set of variations, the server achieves 79% and 83% overall accuracy for sequence-based and structure-based inputs, respectively.

Conclusions: WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go.
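The three figures of merit quoted in the abstract (overall accuracy, correlation coefficient, AUC) can be illustrated with a minimal Python sketch. It rests on assumptions: the abstract does not name the correlation coefficient, so the Matthews correlation coefficient is assumed here; the function names are hypothetical; and the counts and scores are made-up illustrations, not data from the paper or output from the server.

```python
from math import sqrt


def accuracy_and_mcc(tp, tn, fp, fn):
    """Overall accuracy and Matthews correlation coefficient (assumed
    to be the 'correlation coefficient' meant in the abstract) from
    the four cells of a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (disease, neutral) pairs in which the
    disease variant receives the higher score; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical balanced test set of 200 variants (illustrative only).
acc, mcc = accuracy_and_mcc(tp=80, tn=82, fp=18, fn=20)
print(f"accuracy={acc:.2f}  MCC={mcc:.2f}")

# Hypothetical predicted disease probabilities (illustrative only).
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC={auc:.2f}")
```

The pairwise AUC is quadratic in the number of variants; for a set as large as the one described in the abstract, a rank-based computation would be the more practical choice.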
Pages: 7