With the rapid growth of protein data, an important problem is to estimate, within a large protein family, the number of sensible subsets for subsequent in-depth structural, functional, and evolutionary analyses. To tackle this problem, we developed a new program, Secator, which applies an ascending hierarchical method to a distance matrix derived from a multiple alignment of protein sequences. Dissimilarity values assigned to the nodes of the deduced phylogenetic tree are partitioned by a new stopping rule that automatically determines which dissimilarity values are significant. The quality of the clusters obtained by Secator is verified by a separate jackknife study. The method is demonstrated on 24 large protein families covering a wide spectrum of structural and sequence conservation, and its usefulness and accuracy with real biological data are illustrated on two well-studied protein families (the Sm proteins and the nuclear receptors).
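The general idea of ascending hierarchical clustering followed by an automatic cut can be sketched as below. This is only an illustration under stated assumptions, not Secator's actual procedure: average linkage is assumed as the merging criterion, and a simple "largest gap between merge heights" rule stands in for the stopping rule, whose details the abstract does not give. The function names and the toy matrix are hypothetical.

```python
from itertools import combinations


def average_linkage(dist):
    """Ascending clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def largest_gap_threshold(heights):
    """Stand-in stopping rule (an assumption): cut at the widest gap in merge heights."""
    hs = sorted(heights)
    if len(hs) < 2:
        return float("inf")  # too few merges to define a gap; accept all of them
    _, k = max((hs[k + 1] - hs[k], k) for k in range(len(hs) - 1))
    return (hs[k] + hs[k + 1]) / 2


def cluster(dist):
    """Replay only the merges whose dissimilarity falls below the threshold."""
    merges = average_linkage(dist)
    threshold = largest_gap_threshold([h for _, _, h in merges])
    groups = {i: {i} for i in range(len(dist))}
    for a, b, h in merges:
        if h < threshold:
            merged = groups[a[0]] | groups[b[0]]
            for i in merged:
                groups[i] = merged
    return sorted({tuple(sorted(g)) for g in groups.values()})


# Toy symmetric distance matrix (made-up values): items 0-2 are close, 3-4 are close.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.9],
    [0.1, 0.0, 0.2, 0.9, 0.9],
    [0.2, 0.2, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.9, 0.1, 0.0],
]
print(cluster(toy))  # [(0, 1, 2), (3, 4)]
```

In a real setting the distance matrix would be computed from the multiple alignment; the toy matrix here only shows how low and high dissimilarity values separate into accepted and rejected merges.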
Affiliation:
EMBL, European Bioinformat Inst, Res Programme, Computat Gen Grp, Cambridge Outstn, Cambridge CB10 1SD, England
Enright, AJ; Ouzounis, CA