Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations

Cited by: 197
Authors
Henikoff, S [1]
Henikoff, JG [1]
Pietrokovski, S [1]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Howard Hughes Med Inst, Seattle, WA 98109 USA
DOI
10.1093/bioinformatics/15.6.471
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: As databanks grow, sequence classification and prediction of function by searching protein family databases become increasingly valuable. The original Blocks Database, which contains ungapped multiple alignments for families documented in PROSITE, can be searched to classify new sequences. However, PROSITE is incomplete, and families from other databases are now available to expand coverage of the Blocks Database.
Results: To take advantage of protein family information present in several existing compilations, we have used five databases to construct Blocks+, a unified database that is built on the PROTOMAT/BLOSUM scoring model and that can be searched using a single algorithm for consistent sequence classification. The LAMA blocks-versus-blocks searching program identifies overlapping protein families, making possible a non-redundant hierarchical compilation. Blocks+ consists of all blocks derived from PROSITE, blocks from Prints not present in PROSITE, blocks from Pfam-A not present in PROSITE or Prints, and so on for ProDom and Domo, for a total of 1995 protein families represented by 8909 blocks, doubling the coverage of the original Blocks Database. A challenge for any procedure aimed at non-redundancy is to retain related but distinct families while discarding those that are duplicates. We illustrate how using multiple compilations can minimize this potential problem by examining the SNF2 family of ATPases, which is detectably similar to distinct families of helicases and ATPases.
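The hierarchical, non-redundant compilation described in the abstract can be sketched roughly as follows. This is only an illustration of the priority-ordered merge, not the authors' pipeline: the function name `merge_non_redundant`, the representation of a family as a set of sequence identifiers, and the `shares_member` overlap test are all assumptions made for the example. In Blocks+ itself, overlapping families are identified with the LAMA blocks-versus-blocks program, which the toy predicate below merely stands in for.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Priority order of the five source compilations, as listed in the abstract.
SOURCE_ORDER = ["PROSITE", "Prints", "Pfam-A", "ProDom", "Domo"]

# Hypothetical representation: a family is the set of its member sequence IDs.
Family = FrozenSet[str]


def shares_member(a: Family, b: Family) -> bool:
    """Toy overlap test (an assumption): two families overlap if they share
    at least one member. The real system uses LAMA comparisons instead."""
    return not a.isdisjoint(b)


def merge_non_redundant(
    families_by_source: Dict[str, List[Family]],
    overlaps: Callable[[Family, Family], bool] = shares_member,
    order: List[str] = SOURCE_ORDER,
) -> List[Tuple[str, Family]]:
    """Keep every family from the first source, then add families from each
    later source only if they overlap nothing kept from earlier sources."""
    kept: List[Tuple[str, Family]] = []
    for source in order:
        # Snapshot, so families within one source are not compared to each other.
        earlier = list(kept)
        for family in families_by_source.get(source, []):
            if not any(overlaps(family, prior) for _, prior in earlier):
                kept.append((source, family))
    return kept


# Made-up toy input; the sequence IDs are placeholders, not real entries.
toy = {
    "PROSITE": [frozenset({"P1", "P2"}), frozenset({"P3"})],
    "Prints": [frozenset({"P2", "P9"}), frozenset({"P4", "P5"})],
    "Pfam-A": [frozenset({"P5"}), frozenset({"P6"})],
}
result = merge_non_redundant(toy)
kept_sources = [source for source, _ in result]
```

With this toy input, the Prints family sharing `P2` with a PROSITE family and the Pfam-A family sharing `P5` with a kept Prints family are discarded. A stricter or looser `overlaps` predicate changes how many related but distinct families survive, which is the trade-off the abstract raises for the SNF2 family.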
Pages: 471-479
Number of pages: 9