ProtoMap: automatic classification of protein sequences and hierarchy of protein families

Cited by: 108
Authors
Yona, G
Linial, N
Linial, M
Affiliations
[1] Stanford Univ, Dept Biol Struct, Stanford, CA 94305 USA
[2] Hebrew Univ Jerusalem, Inst Comp Sci, IL-91904 Jerusalem, Israel
[3] Hebrew Univ Jerusalem, Inst Life Sci, Dept Biol Chem, IL-91904 Jerusalem, Israel
DOI
10.1093/nar/28.1.49
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition, it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il.
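The idea of grouping proteins by direct or transitive similarity at several confidence levels can be illustrated with a small sketch. This is not the ProtoMap algorithm: it uses plain single-linkage clustering (unrestricted transitivity) via union-find, whereas the abstract states that ProtoMap applies transitivity restrictively. The protein names, e-values, and thresholds below are hypothetical toy data, and the function names are chosen only for illustration.

```python
from collections import defaultdict


def cluster_at_threshold(proteins, evalues, threshold):
    """Group proteins that are linked, directly or transitively,
    by pairs whose e-value is at most `threshold` (single linkage)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), e in evalues.items():
        if e <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(proteins, evalues, thresholds):
    """Cluster at each threshold, from most to least stringent.
    Each looser level can only merge groups from the stricter one."""
    return {t: cluster_at_threshold(proteins, evalues, t)
            for t in sorted(thresholds)}


# Hypothetical toy data: pairwise similarity e-values (smaller = more similar).
proteins = ["A", "B", "C", "D", "E"]
evalues = {
    ("A", "B"): 1e-50,
    ("B", "C"): 1e-20,
    ("D", "E"): 1e-40,
    ("C", "D"): 1e-3,
}
levels = hierarchy(proteins, evalues, [1e-30, 1e-10, 1e-2])
for t, groups in levels.items():
    print(t, groups)
```

In this toy run, A and C end up in the same group at the middle threshold only through B, which is the transitive relation the abstract refers to. At the loosest threshold the weak C-D link merges everything into one group, which is the kind of spurious merge that a restriction on transitivity is meant to prevent.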
Pages: 49-55
Number of pages: 7