Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP)

Cited by: 50
Authors
Ye, Zhi-Qiang
Zhao, Shu-Qi
Gao, Ge
Liu, Xiao-Qiao
Langlois, Robert E.
Lu, Hui
Wei, Liping [1]
Affiliations
[1] Peking Univ, Coll Life Sci, Ctr Bioinformat, Natl Lab Prot Engn & Plant Genet Engn, Beijing 100871, Peoples R China
[2] Univ Illinois, Dept Bioengn, Bioinformat Program, Chicago, IL 60607 USA
Funding
US National Institutes of Health
DOI
10.1093/bioinformatics/btm119
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The rapid accumulation of single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), brings both the opportunity and the need to understand and predict their disease association. Currently published attributes are limited and the detailed mechanisms governing the disease association of a SAP remain unclear; thus, further investigation of new attributes and improvement of the prediction are desired.
Results: A SAP dataset was compiled from the Swiss-Prot variant pages. We extracted and demonstrated the effectiveness of several new biologically informative attributes, including structural neighbor profiles that describe the SAP's microenvironment, nearby functional sites that measure the structure-based and sequence-based distances between the SAP site and its nearby functional sites, aggregation properties that measure the likelihood of protein aggregation, and disordered regions that consider whether the SAP is located in structurally disordered regions. The new attributes provided insights into the mechanisms of the disease association of SAPs. We built a support vector machine (SVM) classifier employing a carefully selected set of new and previously published attributes. Through a strict protein-level 5-fold cross-validation, we attained an overall accuracy of 82.61% and an MCC of 0.60. Moreover, a web server was developed to provide a user-friendly interface for biologists.
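The abstract reports results from a protein-level 5-fold cross-validation and quotes a Matthews correlation coefficient (MCC). The following is a minimal sketch of both ideas, not the authors' implementation: the function names `mcc` and `protein_level_folds`, the greedy fold-balancing strategy, and all sample data are assumptions made for illustration. "Protein-level" is taken here to mean that all SAPs from one protein are kept in the same fold, so that no protein contributes to both training and test sets.

```python
import math
from collections import defaultdict


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def protein_level_folds(protein_ids, k=5):
    """Split sample indices into k folds, keeping each protein in one fold.

    protein_ids[i] is the (hypothetical) protein identifier of sample i.
    """
    groups = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        groups[pid].append(idx)

    folds = [[] for _ in range(k)]
    # Greedy balancing (an assumption): place the proteins with the most
    # SAPs first, each into the currently smallest fold.
    for pid in sorted(groups, key=lambda p: (-len(groups[p]), p)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[pid])
    return folds


if __name__ == "__main__":
    # Made-up protein identifiers, one per SAP.
    ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P6"]
    for n, fold in enumerate(protein_level_folds(ids, k=3)):
        print(n, sorted({ids[i] for i in fold}))
    # Made-up confusion-matrix counts, not taken from the paper.
    print(round(mcc(tp=40, tn=40, fp=10, fn=10), 2))
```

Grouping by protein before splitting matters because SAPs from the same protein share sequence and structural context; a random per-SAP split could let a classifier score well by recognizing the protein rather than the variant.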
Pages: 1444-1450
Page count: 7