Dynamic sequence databank searching with templates and multiple alignment

Cited by: 22
Authors
Taylor, WR [1]
Affiliation
[1] Natl Inst Med Res, Div Math Biol, London NW7 1AA, England
Keywords
sequence databank searching; templates; multiple alignment
DOI
10.1006/jmbi.1998.1853
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Sequence databank searches are often performed iteratively, taking the results of a search to form a probe (either a pattern or profile) for a subsequent scan of the databank. The advantage of this approach is that, as more sequences are drawn into the probe, it should, in principle, be possible to detect increasingly distant members of the family. This approach works well when supervised by an "expert" who has a good "eye" for the quality of the sequence alignment and whether novel matches should be rejected or incorporated into the probe. However, all attempts to automate the process have proved difficult, as the process is inherently unstable. Errors in the alignment, or the misalignment of a non-family member, lead to a deterioration of the probe specificity, so allowing further incorrect sequences to be identified. Here, a combination of two methods is used to provide a check on such instability. A pattern matching (template) search method is used (with a BLAST-like pre-filter for speed) to return sequence segments for alignment in a standard multiple alignment program (MULTAL). Sequences are aligned only to a fixed limit of similarity and any sequences or sub-families that have not joined the original "seed" family are rejected. The remaining core family then provides the basis for a subsequent pattern derivation and databank search. The constant check by the multiple alignment phase allows the search phase to be pushed continually towards the boundary of similarity. This is maintained by lowering the cutoff on the scores of acceptable sequences each time the family remains the same over successive search cycles. The procedure was observed to be stable under misalignments and to have an ability to recognise distantly related family members across super-families that was comparable to PSI-BLAST. The method is applied to the analysis of the hormone-binding domains of the insulin and related growth-factor receptors. © 1998 Academic Press.
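The control loop described in the abstract (search, align, reject whatever has not joined the seed family, and lower the score cutoff whenever the family is unchanged) can be illustrated with a minimal Python sketch. This is not the author's implementation: the function `iterate_family`, its parameters (`step`, `min_cutoff`, `max_cycles`), and the toy `toy_search` and `toy_align_core` stand-ins are all hypothetical names introduced here. The real method uses a template search with a BLAST-like pre-filter and the MULTAL alignment program, neither of which is modelled.

```python
from typing import Callable, FrozenSet, Set, Tuple

# Hypothetical interfaces: a search returns ids scoring at or above the cutoff,
# and an alignment check returns the ids that joined the seed family.
Search = Callable[[Set[str], float], Set[str]]
AlignCore = Callable[[Set[str], FrozenSet[str]], Set[str]]


def iterate_family(seed: Set[str], search: Search, align_core: AlignCore,
                   cutoff: float, step: float = 1.0, min_cutoff: float = 0.0,
                   max_cycles: int = 50) -> Tuple[Set[str], float]:
    """Alternate search and alignment, lowering the cutoff when the family is stable."""
    seed_ids = frozenset(seed)
    family = set(seed_ids)
    for _ in range(max_cycles):
        hits = search(family, cutoff)               # probe derived from the current family
        core = align_core(family | hits, seed_ids)  # drop anything not joined to the seed
        if core == family:
            # Family unchanged over this cycle: push towards the similarity boundary.
            if cutoff <= min_cutoff:
                break
            cutoff = max(min_cutoff, cutoff - step)
        family = core
    return family, cutoff


# Toy data (assumed for illustration only): "x" scores well but is unrelated.
TOY_SCORES = {"a": 9.0, "b": 7.0, "c": 5.0, "x": 6.0, "d": 3.0}
TOY_RELATED = {"a", "b", "c", "d"}


def toy_search(family: Set[str], cutoff: float) -> Set[str]:
    # Stand-in for the template search: ignores the probe, filters by score.
    return {s for s, score in TOY_SCORES.items() if score >= cutoff}


def toy_align_core(candidates: Set[str], seed_ids: FrozenSet[str]) -> Set[str]:
    # Stand-in for the alignment check: only related sequences join the seed.
    return {s for s in candidates if s in TOY_RELATED} | set(seed_ids)


family, final_cutoff = iterate_family({"a"}, toy_search, toy_align_core,
                                      cutoff=8.0, step=1.0, min_cutoff=3.0)
print(sorted(family), final_cutoff)
```

In this toy run the false positive "x" is returned by the search once the cutoff drops to 6.0 but never enters the family, because the alignment stand-in rejects it on every cycle. That is the stabilising role the abstract assigns to the multiple alignment phase.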
Pages: 375-406
Page count: 32
Related papers
24 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]  
[Anonymous], METHOD ENZYMOL
[4]   THE SWISS-PROT PROTEIN-SEQUENCE DATA-BANK [J].
BAIROCH, A ;
BOECKMANN, B .
NUCLEIC ACIDS RESEARCH, 1991, 19 :2247-2248
[5]   ON THE TERTIARY STRUCTURE OF THE EXTRACELLULAR DOMAINS OF THE EPIDERMAL GROWTH-FACTOR AND INSULIN-RECEPTORS [J].
BAJAJ, M ;
WATERFIELD, MD ;
SCHLESSINGER, J ;
TAYLOR, WR ;
BLUNDELL, T .
BIOCHIMICA ET BIOPHYSICA ACTA, 1987, 916 (02) :220-226
[6]   FLEXIBLE PROTEIN-SEQUENCE PATTERNS - A SENSITIVE METHOD TO DETECT WEAK STRUCTURAL SIMILARITIES [J].
BARTON, GJ ;
STERNBERG, MJE .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 212 (02) :389-402
[7]   DETERMINANTS OF A PROTEIN FOLD - UNIQUE FEATURES OF THE GLOBIN AMINO-ACID-SEQUENCES [J].
BASHFORD, D ;
CHOTHIA, C ;
LESK, AM .
JOURNAL OF MOLECULAR BIOLOGY, 1987, 196 (01) :199-216
[8]  
Dayhoff M.O., 1978, ATLAS PROTEIN SEQ ST, V5
[9]   CLUSTAL - A PACKAGE FOR PERFORMING MULTIPLE SEQUENCE ALIGNMENT ON A MICROCOMPUTER [J].
HIGGINS, DG ;
SHARP, PM .
GENE, 1988, 73 (01) :237-244
[10]   GLOBIN FOLD IN A BACTERIAL TOXIN [J].
HOLM, L ;
SANDER, C .
NATURE, 1993, 361 (6410) :309-309