Distribution of indel lengths

Cited by: 62
Authors
Qian, B
Goldstein, RA [1]
Affiliations
[1] Univ Michigan, Dept Chem, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Biophys Res Div, Ann Arbor, MI 48109 USA
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 2001 / Vol. 45 / No. 1
Keywords
sequence alignment; insertion and deletion; gaps; protein evolution; dynamic programming;
DOI
10.1002/prot.1129
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Protein sequence alignment has become a widely used method in the study of newly sequenced proteins. Most sequence alignment methods use an affine gap penalty to assign scores to insertions and deletions. Although affine gap penalties represent the relative ease of extending a gap compared with initializing a gap, it is still an obvious oversimplification of the real processes that occur during sequence evolution. To improve the efficiency of sequence alignment methods and to obtain a better understanding of the process of sequence evolution, we wanted to find a more accurate model of insertions and deletions in homologous proteins. In this work, we extract the probability of a gap occurrence and the resulting gap length distribution in distantly related proteins (sequence identity < 25%) using alignments based on their common structures. We observe a distribution of gaps that can be fitted with a multiexponential with four distinct components. The results suggest new approaches to modeling insertions and deletions in sequence alignments. © 2001 Wiley-Liss, Inc.
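The contrast the abstract draws between an affine gap penalty and a multiexponential gap length distribution can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the function names, the gap-open and gap-extend costs, and the mixture weights and decay rates are all hypothetical placeholders and are not the fitted values from the paper. The sketch assumes the multiexponential is written as a normalized mixture of discrete exponential (geometric) components and that a gap of length L is penalized by the negative log of its probability.

```python
import math

# Hypothetical placeholder parameters for illustration only; these are NOT
# the fitted values reported in the paper.
WEIGHTS = [0.50, 0.30, 0.15, 0.05]  # mixture weights, chosen to sum to 1
RATES = [1.50, 0.50, 0.15, 0.04]    # decay rate of each exponential component


def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Affine penalty: one opening cost plus a fixed cost per extra residue."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return gap_open + gap_extend * (length - 1)


def multiexp_gap_probability(length, weights=WEIGHTS, rates=RATES):
    """Probability of a gap of the given length (length >= 1) under a
    normalized mixture of discrete exponential components."""
    if length < 1:
        raise ValueError("gap length must be >= 1")
    return sum(
        w * (1.0 - math.exp(-r)) * math.exp(-r * (length - 1))
        for w, r in zip(weights, rates)
    )


def multiexp_gap_penalty(length, weights=WEIGHTS, rates=RATES):
    """Gap penalty taken as the negative log-probability of the gap length."""
    return -math.log(multiexp_gap_probability(length, weights, rates))


if __name__ == "__main__":
    for length in (1, 2, 5, 10, 20, 50):
        print(
            length,
            round(affine_gap_penalty(length), 3),
            round(multiexp_gap_penalty(length), 3),
        )
```

With a single component, the negative log-probability grows linearly in the gap length, which is the affine form. With several components, each additional residue costs less as the gap grows, so long gaps are penalized less steeply than an affine model would allow.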
Pages: 102-104
Page count: 3