Indel-Seq-Gen: A new protein family simulator incorporating domains, motifs, and indels

Cited by: 23
Authors
Strope, Cory L. [1]
Scott, Stephen D.
Moriyama, Etsuko N.
Affiliations
[1] Univ Nebraska, Dept Comp Sci & Engn, Lincoln, NE USA
[2] Univ Nebraska, Sch Biol Sci, Lincoln, NE USA
Keywords
protein superfamily; sequence simulation; domains; motifs; indels
DOI
10.1093/molbev/msl195
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.
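The idea of tracking every indel so that the "true" alignment can be output alongside the simulated sequences can be illustrated with a minimal Python sketch. This is not iSG's algorithm or interface: the function name `evolve_with_indels`, its parameters (`n_events`, `indel_fraction`, `max_indel_len`), and the uniform choice of residues and positions are all assumptions made for illustration, and only a single ancestor-descendant pair is handled rather than a full tree with domain-specific parameters.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_with_indels(ancestor, n_events, indel_fraction=0.3,
                       max_indel_len=3, rng=None):
    """Toy model (hypothetical, not iSG): mutate `ancestor` and track the alignment.

    Returns (ancestor_row, descendant_row), two equal-length gapped strings
    that form the true pairwise alignment of ancestor and descendant.
    """
    rng = rng or random.Random()
    # One alignment column per site: [ancestor residue or gap, descendant residue or gap].
    columns = [[aa, aa] for aa in ancestor]
    for _ in range(n_events):
        # Columns in which the descendant currently has a residue.
        live = [i for i, col in enumerate(columns) if col[1] != "-"]
        if rng.random() >= indel_fraction:
            # Substitution: redraw one descendant residue uniformly
            # (it may by chance redraw the same residue).
            if live:
                columns[rng.choice(live)][1] = rng.choice(AMINO_ACIDS)
        elif rng.random() < 0.5:
            # Insertion: new columns with a gap in the ancestor row.
            pos = rng.randint(0, len(columns))
            length = rng.randint(1, max_indel_len)
            columns[pos:pos] = [["-", rng.choice(AMINO_ACIDS)]
                                for _ in range(length)]
        elif live:
            # Deletion: gap out a run of consecutive descendant residues.
            start = rng.randrange(len(live))
            length = rng.randint(1, max_indel_len)
            for i in live[start:start + length]:
                columns[i][1] = "-"
    # Columns that were inserted and later deleted carry no information.
    columns = [col for col in columns if col != ["-", "-"]]
    ancestor_row = "".join(col[0] for col in columns)
    descendant_row = "".join(col[1] for col in columns)
    return ancestor_row, descendant_row


if __name__ == "__main__":
    anc_row, desc_row = evolve_with_indels("MKTAYIAKQR", 20,
                                           rng=random.Random(0))
    print(anc_row)
    print(desc_row)
```

Because the alignment columns are updated at the moment each event happens, the gapped rows are known exactly and could serve as a reference when scoring an alignment program's output on the ungapped sequences.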
Pages: 640-649
Page count: 10