Secator: A program for inferring protein subfamilies from phylogenetic trees

Cited by: 62
Authors
Wicker, N
Perrin, GR
Thierry, JC
Poch, O
Affiliations
[1] ULP, INSERM, CNRS,Inst Genet & Biol Mol & Cellulaire, Lab Biol & Genom Struct, F-67404 Illkirch Graffenstaden, France
[2] Univ Strasbourg 1, LSIIT ICPS AXE E, CNRS, UPRESA 70005, Illkirch Graffenstaden, France
Keywords
Secator; subfamily; phylogenetic tree; clustering
DOI
10.1093/oxfordjournals.molbev.a003929
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
With the huge increase in protein data, an important problem is to estimate, within a large protein family, the number of sensible subsets for subsequent in-depth structural, functional, and evolutionary analyses. To tackle this problem, we developed a new program, Secator, which implements the principle of an ascending hierarchical method using a distance matrix based on a multiple alignment of protein sequences. Dissimilarity values assigned to the nodes of a deduced phylogenetic tree are partitioned by a new stopping rule introduced to automatically determine the significant dissimilarity values. The quality of the clusters obtained by Secator is verified by a separate jackknife study. The method is demonstrated on 24 large protein families covering a wide spectrum of structural and sequence conservation, and its usefulness and accuracy with real biological data are illustrated on two well-studied protein families (the Sm proteins and the nuclear receptors).
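The general idea in the abstract (build a hierarchy from a distance matrix, then use the dissimilarity values at the merge nodes to decide where to stop) can be illustrated with a minimal sketch. This is not Secator's algorithm: the average-linkage criterion, the "largest gap" stopping rule, the function names `average_linkage` and `cut_at_largest_gap`, and the toy distance matrix are all assumptions made here for illustration, since the record does not specify the actual stopping rule.

```python
from itertools import combinations


def average_linkage(dist):
    """Ascending hierarchical (agglomerative) clustering on a distance matrix.

    Returns the merges in order as (cluster_a, cluster_b, dissimilarity).
    Average linkage is an assumption, not necessarily what Secator uses.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut_at_largest_gap(n_items, merges):
    """Hypothetical stopping rule: stop before the largest jump in dissimilarity."""
    heights = [d for _, _, d in merges]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    # Accept every merge up to (not including) the one after the largest gap.
    keep = gaps.index(max(gaps)) + 1 if gaps else len(merges)
    clusters = {frozenset([i]) for i in range(n_items)}
    for a, b, _ in merges[:keep]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Toy distances (made up): sequences 0-2 are close, 3-4 are close.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.9],
    [0.1, 0.0, 0.15, 0.9, 0.9],
    [0.2, 0.15, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.9, 0.1, 0.0],
]
subfamilies = cut_at_largest_gap(len(toy), average_linkage(toy))
print(subfamilies)  # [[0, 1, 2], [3, 4]]
```

In practice the distance matrix would be derived from a multiple sequence alignment, and the cut would be chosen by a statistically motivated rule rather than the simple gap heuristic used in this sketch.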
Pages: 1435-1441
Number of pages: 7