Predicting the Functional Effect of Amino Acid Substitutions and Indels

Cited by: 1252
Authors
Choi, Yongwook [1]
Sims, Gregory E. [1]
Murphy, Sean [1]
Miller, Jason R. [1]
Chan, Agnes P. [1]
Affiliations
[1] J. Craig Venter Institute, Rockville, MD, USA
Source
PLOS ONE | 2012 / Vol. 7 / Issue 10
Funding
National Institutes of Health (USA);
Keywords
SINGLE-NUCLEOTIDE POLYMORPHISMS; PROTEIN FUNCTION; CYSTIC-FIBROSIS; MUTATION; SEQUENCE; IDENTIFICATION; PHENOTYPE; DATABASE; CAPTURE; SEARCH;
DOI
10.1371/journal.pone.0046688
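Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code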
07 ; 0710 ; 09 ;
Abstract
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is approximately 0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
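The alignment-based score described in the abstract can be illustrated with a minimal Python sketch. This is not the PROVEAN implementation: the scoring scheme (match = 2, mismatch = -1, linear gap = -2), the toy sequences, and the helper names `global_alignment_score`, `delta_score`, `apply_substitution`, and `apply_deletion` are all assumptions made for illustration. It only shows the general idea of comparing a query's similarity to one homolog before and after a variation is introduced.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not PROVEAN's.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to the homolog after introducing the variation."""
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


def apply_substitution(seq, pos, new_residue):
    """Replace the residue at 1-based position `pos`."""
    return seq[:pos - 1] + new_residue + seq[pos:]


def apply_deletion(seq, start, end):
    """Delete residues from 1-based `start` to `end`, inclusive."""
    return seq[:start - 1] + seq[end:]


# Hypothetical toy sequences; the homolog differs from the query at position 7.
query = "MKTAYIAKQR"
homolog = "MKTAYIGKQR"

sub_away = apply_substitution(query, 4, "W")    # moves away from the homolog
sub_toward = apply_substitution(query, 7, "G")  # now identical to the homolog
deletion = apply_deletion(query, 4, 5)          # in-frame deletion of two residues

print(delta_score(query, sub_away, homolog))    # -3
print(delta_score(query, sub_toward, homolog))  # 3
print(delta_score(query, deletion, homolog))    # -8
```

Because the score is computed from a whole-sequence alignment rather than from conservation at a single column, the same function handles substitutions and in-frame deletions alike. A negative value means the variant is less similar to the homolog than the original query was.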
Pages: 13