A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function

Cited by: 109
Authors
Krishnan, VG [1]
Westhead, DR [1]
Affiliations
[1] Univ Leeds, Sch Biochem & Mol Biol, Leeds LS2 9JT, W Yorkshire, England
DOI
10.1093/bioinformatics/btg297
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: The large volume of single nucleotide polymorphism data now available motivates the development of methods for distinguishing neutral changes from those which have real biological effects. Here, two different machine-learning methods, decision trees and support vector machines (SVMs), are applied for the first time to this problem. In common with most other methods, only non-synonymous changes in protein coding regions of the genome are considered.

Results: In detailed cross-validation analysis, both learning methods are shown to compete well with existing methods, and to out-perform them in some key tests. SVMs show better generalization performance, but decision trees have the advantage of generating interpretable rules with robust estimates of prediction confidence. It is shown that the inclusion of protein structure information produces more accurate methods, in agreement with other recent studies, and the effect of using predicted rather than actual structure is evaluated.
Pages: 2199-2209
Page count: 11