Automated protein sequence database classification. II. Delineation of domain boundaries from sequence similarities

Cited by: 43

Authors:

Gracy, J [1]

Argos, P [1]

Affiliation:

[1] European Mol Biol Lab, D-69012 Heidelberg, Germany

Source:

BIOINFORMATICS | 1998 / Vol. 14 / Issue 02

Keywords:

DOI:

10.1093/bioinformatics/14.2.174

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Decomposing each protein into modular domains is a basic prerequisite for accurately classifying structural units in biological molecules. Boundaries between domains are indicated by two similar amino acid sequence segments located within the same protein (repeats) or within homologous proteins at notably different distances from their respective N- or C-termini. Results: We have developed an automated method that combines such positional constraints derived from various detected pairwise sequence similarities to delineate the modular organization of proteins. The procedure has been applied to a non-redundant data set of 26 990 proteins whose sequences were taken from the PIR and SWISS-PROT databanks and shared <60% sequence identity amongst pairs. The resultant clustering, delineation and multiple alignment of 24 380 sequence fragments yielded a new database of 4364 domain families. Comparison of the domain collection with that of PRODOM indicates a clear improvement in the number and size of domain families, domain boundaries and multiple sequence alignments. The accuracy and sensitivity of the method are illustrated by results obtained for ankyrin-like repeats and EGF-like modules.
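The positional constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the `LocalMatch` record, the `boundary_hints` function, and the `min_shift` threshold of 30 residues are all hypothetical names and values chosen for illustration. The sketch takes one pairwise local similarity between two proteins and, when the matched segments sit at notably different distances from the N- or C-termini, reports the segment edge in the protein with the longer overhang as a candidate domain boundary.

```python
from dataclasses import dataclass


@dataclass
class LocalMatch:
    """One pairwise local similarity (hypothetical record, 1-based inclusive coordinates)."""
    len_a: int    # full length of protein A
    start_a: int  # first matched residue in A
    end_a: int    # last matched residue in A
    len_b: int    # full length of protein B
    start_b: int  # first matched residue in B
    end_b: int    # last matched residue in B


def boundary_hints(m: LocalMatch, min_shift: int = 30) -> list[tuple[str, int]]:
    """Return candidate boundaries as (protein label, residue position).

    min_shift is an assumed threshold for "notably different" distances;
    the abstract does not state a value.
    """
    hints = []

    # Residues preceding the matched segment in each protein.
    n_over_a = m.start_a - 1
    n_over_b = m.start_b - 1
    if abs(n_over_a - n_over_b) >= min_shift:
        # The protein with extra N-terminal sequence likely carries another
        # module before the match, so the match start is a candidate boundary.
        if n_over_a > n_over_b:
            hints.append(("A", m.start_a))
        else:
            hints.append(("B", m.start_b))

    # Residues following the matched segment in each protein.
    c_over_a = m.len_a - m.end_a
    c_over_b = m.len_b - m.end_b
    if abs(c_over_a - c_over_b) >= min_shift:
        if c_over_a > c_over_b:
            hints.append(("A", m.end_a))
        else:
            hints.append(("B", m.end_b))

    return hints


# Protein A (300 residues) matches protein B (120 residues) in its middle region.
example = LocalMatch(len_a=300, start_a=150, end_a=250,
                     len_b=120, start_b=10, end_b=110)
print(boundary_hints(example))
```

A repeat within a single protein could be handled the same way by passing the same length for both sides. Combining hints from many pairwise similarities, as the abstract describes, would require a further consensus step that this sketch does not attempt.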


Pages: 174-187

Page count: 14

17 references in total

[1] PRINTS - A PROTEIN MOTIF FINGERPRINT DATABASE
ATTWOOD, TK
BECK, ME
[J]. PROTEIN ENGINEERING, 1994, 7 (07) : 841 - 848
[2] The SWISS-PROT protein sequence data bank and its new supplement TREMBL
Bairoch, A
Apweiler, R
[J]. NUCLEIC ACIDS RESEARCH, 1996, 24 (01) : 21 - 25
[3] The PROSITE database, its status in 1995
Bairoch, A
Bucher, P
Hofmann, K
[J]. NUCLEIC ACIDS RESEARCH, 1996, 24 (01) : 189 - 196
[4] HUNDREDS OF ANKYRIN-LIKE REPEATS IN FUNCTIONALLY DIVERSE PROTEINS - MOBILE MODULES THAT CROSS PHYLA HORIZONTALLY
BORK, P
[J]. PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1993, 17 (04) : 363 - 374
[5] BORK P, 1995, TRENDS BIOCH SCI, V20
[6] DAVIS C G, 1990, New Biologist, V2, P410
[7] Etzold T, 1996, METHOD ENZYMOL, V266, P114
[8] The PIR-International protein sequence database
George, DG
Barker, WC
Mewes, HW
Pfeiffer, F
Tsugita, A
[J]. NUCLEIC ACIDS RESEARCH, 1996, 24 (01) : 17 - 20
[9] Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment
Gracy, J
Argos, P
[J]. BIOINFORMATICS, 1998, 14 (02) : 164 - 173
[10] AMINO-ACID SUBSTITUTION MATRICES FROM PROTEIN BLOCKS
HENIKOFF, S
HENIKOFF, JG
[J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) : 10915 - 10919
