A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function

Cited by: 109
Authors
Krishnan, VG [1]
Westhead, DR [1]
Affiliations
[1] Univ Leeds, Sch Biochem & Mol Biol, Leeds LS2 9JT, W Yorkshire, England
DOI
10.1093/bioinformatics/btg297
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Motivation: The large volume of single nucleotide polymorphism data now available motivates the development of methods for distinguishing neutral changes from those which have real biological effects. Here, two different machine-learning methods, decision trees and support vector machines (SVMs), are applied for the first time to this problem. In common with most other methods, only non-synonymous changes in protein coding regions of the genome are considered.
Results: In detailed cross-validation analysis, both learning methods are shown to compete well with existing methods, and to out-perform them in some key tests. SVMs show better generalization performance, but decision trees have the advantage of generating interpretable rules with robust estimates of prediction confidence. It is shown that the inclusion of protein structure information produces more accurate methods, in agreement with other recent studies, and the effect of using predicted rather than actual structure is evaluated.
Pages: 2199-2209
Number of pages: 11