Predicting the phenotypic effects of non-synonymous single nucleotide polymorphisms based on support vector machines

Cited by: 41
Authors
Tian, Jian [1]
Wu, Ningfeng [1]
Guo, Xuexia [2]
Guo, Jun [1]
Zhang, Juhua [3]
Fan, Yunliu [1]
Affiliations
[1] Chinese Acad Agr Sci, Biotechnol Res Inst, Beijing 100081, Peoples R China
[2] Acad Planning & Designing, Minist Agr, Agr Byprod Proc Res Inst, Beijing 100026, Peoples R China
[3] Beijing Inst Technol, Dept Biomed Engn, Beijing 100081, Peoples R China
Source
BMC BIOINFORMATICS | 2007 / Vol. 8
Keywords
MULTIPLE SEQUENCE ALIGNMENT; PROTEIN STABILITY CHANGES; GENE MUTATION DATABASE; ACID INDEX DATABASE; EVOLUTIONARY INFORMATION; MISSENSE MUTATIONS; EXPRESSION DATA; CLASSIFICATION; SNPS; IDENTIFICATION;
DOI
10.1186/1471-2105-8-450
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Human genetic variations primarily result from single nucleotide polymorphisms (SNPs), which occur approximately every 1000 bases in the overall human population. Non-synonymous SNPs (nsSNPs), which lead to amino acid changes in the protein product, may account for nearly half of the known genetic variations linked to inherited human diseases. One of the key problems of medical genetics today is to identify the nsSNPs that underlie disease-related phenotypes in humans. Computational tools that can identify such nsSNPs would therefore enhance our understanding of genetic diseases and help predict disease.
Results: We propose a method, named Parepro (Predicting the amino acid replacement probability), to identify nsSNPs that have either deleterious or neutral effects on the function of the resulting protein. Two independent datasets, HumVar and NewHumVar, taken from the PhD-SNP server, were used to train the model and to test the robustness of Parepro. In a 20-fold cross-validation test on the HumVar dataset, Parepro achieved a Matthews correlation coefficient (MCC) of 50% and an overall accuracy (Q2) of 76%, both higher than those obtained with other methods such as PolyPhen, SIFT, and HybridMeth. Further analysis with Parepro on an additional dataset (NewHumVar) yielded similar results.
Conclusion: The performance of Parepro indicates that it is a powerful tool for predicting the effect of nsSNPs on protein function and would be useful for large-scale analysis of genomic nsSNP data.
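The abstract reports two summary measures for a binary (deleterious vs. neutral) classifier: the Matthews correlation coefficient (MCC) and the overall accuracy (Q2). As a minimal sketch of how these are conventionally computed from a confusion matrix, the following uses the standard textbook definitions. The function names and the counts are hypothetical illustrations, not values or code from the paper.

```python
import math


def q2(tp: int, tn: int, fp: int, fn: int) -> float:
    """Overall accuracy: fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration
    # (deleterious = positive class, neutral = negative class).
    tp, tn, fp, fn = 80, 70, 30, 20
    print(f"Q2  = {q2(tp, tn, fp, fn):.2f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike Q2, MCC uses all four cells of the confusion matrix, so it stays informative when the deleterious and neutral classes are unbalanced. Multiplying the value by 100 gives the percentage form used in the abstract.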
Pages: 9