PROTEIN FAMILY CLASSIFICATION BASED ON SEARCHING A DATABASE OF BLOCKS

Cited by: 328
Authors
HENIKOFF, S
HENIKOFF, JG
Affiliations
[1] Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
[2] Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle
DOI
10.1006/geno.1994.1018
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
The most highly conserved regions of proteins can be represented as "blocks" of locally aligned sequence segments. Previously, an automated system was introduced to generate a database of blocks that is searched for local similarities using a sequence query. Here, we describe a method for searching this database that can also reveal significant global similarities. Local and global alignments are scored independently, so they can be used in concert to infer homology. A set of 7082 diverse sequences not represented in the database provided queries for testing this approach. The resulting distributions of scores led to guidelines for interpretation of search data and to the classification of 289 uncatalogued sequences into known groups. Thirty-eight of these relationships appear to be new discoveries. We also show how searching a database of blocks can be used to detect repeated domains and to find distinct cross-family relationships that were missed in searches of sequence databases. © 1994 Academic Press, Inc.
Pages: 97-107
Number of pages: 11