Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information

Cited by: 109
Authors
Bao, L [1]
Cui, Y [1]
Affiliations
[1] Univ Tennessee, Ctr Hlth Sci, Dept Mol Sci, Ctr Genom & Bioinformat, Memphis, TN 38163 USA
DOI
10.1093/bioinformatics/bti365
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: There has been great expectation that knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs), which lead to an amino acid change in the protein product, are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs among a large number of neutral nsSNPs, it is important to develop computational tools that predict the phenotypic effects of nsSNPs.
Results: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs that have homologous 3D structures. Structural environment parameters derived from the homologous 3D structure, together with evolutionary information derived from the multiple sequence alignment, were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable with that of the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (fewer than 10 homologous sequences), our method significantly outperforms the SIFT algorithm. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available.
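The stratified comparison described in the abstract can be illustrated with a minimal sketch. It assumes each evaluated nsSNP is recorded as a tuple of homolog count, annotated label, and predicted label, and it uses plain accuracy as the metric. The record layout, the function names, and the toy data are all hypothetical and are not taken from the paper; only the threshold of 10 homologous sequences comes from the abstract.

```python
from collections import defaultdict

# From the abstract: "sufficient" evolutionary information means
# at least 10 homologous sequences.
HOMOLOG_THRESHOLD = 10


def stratum(n_homologs):
    """Assign an nsSNP to a stratum by its number of homologous sequences."""
    return "sufficient" if n_homologs >= HOMOLOG_THRESHOLD else "insufficient"


def accuracy_by_stratum(records):
    """Compute accuracy separately for each stratum.

    records: iterable of (n_homologs, true_label, predicted_label).
    Accuracy is an assumed metric chosen for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for n_homologs, truth, predicted in records:
        key = stratum(n_homologs)
        total[key] += 1
        correct[key] += int(truth == predicted)
    return {key: correct[key] / total[key] for key in total}


# Made-up toy records, not data from the study.
toy = [
    (25, "disease", "disease"),
    (12, "neutral", "neutral"),
    (10, "neutral", "disease"),
    (4, "disease", "disease"),
    (0, "neutral", "disease"),
]
result = accuracy_by_stratum(toy)
```

Running the same function on the predictions of two different methods would give a per-stratum comparison of the kind the abstract reports.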
Pages: 2185-2190
Number of pages: 6