Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics

Cited by: 25
Authors
Kuroda, Y
Tani, K
Matsuo, Y
Yokoyama, S
Affiliations
[1] RIKEN, Genom Sci Ctr, Prot Res Grp, Inst Phys & Chem Res, Kanagawa 2300045, Japan
[2] Univ Tokyo, Grad Sch Sci, Dept Biophys & Biochem, Bunkyo Ku, Tokyo 1130033, Japan
Keywords
automated procedure; computing tool; PASS; protein structure determination; sequence similarity; stable protein fragment; structural genomics
DOI
10.1110/ps.9.12.2313
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Structural genomic projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
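The geometrical idea in the abstract (counting similar sequences at each residue, then looking for rectangular elevations in that plot) can be illustrated with a minimal Python sketch. This is not the authors' program: it assumes BLAST hits have already been reduced to 1-based inclusive (start, end) intervals on the query, and it approximates "rectangular elevation" by a simple threshold on the count. The function names `coverage_profile` and `find_plateaus` and the parameters `min_hits` and `min_len` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def coverage_profile(query_len: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Count, for each query residue, how many hit intervals cover it.

    Hits are 1-based inclusive (start, end) pairs with
    1 <= start <= end <= query_len (an assumption of this sketch).
    """
    diff = [0] * (query_len + 2)  # difference array, index 0 unused
    for start, end in hits:
        diff[start] += 1
        diff[end + 1] -= 1
    profile, running = [], 0
    for pos in range(1, query_len + 1):
        running += diff[pos]
        profile.append(running)
    return profile


def find_plateaus(profile: List[int], min_hits: int = 3,
                  min_len: int = 30) -> List[Tuple[int, int]]:
    """Return 1-based inclusive runs where the count stays >= min_hits
    for at least min_len consecutive residues (illustrative thresholds)."""
    segments = []
    start = None
    for i, count in enumerate(profile, start=1):
        if count >= min_hits:
            if start is None:
                start = i  # a new elevated run begins here
        elif start is not None:
            if i - start >= min_len:  # run covers residues start..i-1
                segments.append((start, i - 1))
            start = None
    if start is not None and len(profile) - start + 1 >= min_len:
        segments.append((start, len(profile)))  # run reaches the C-terminus
    return segments


# Toy input: two well-covered regions plus one full-length hit.
hits = [(10, 80)] * 4 + [(120, 190)] * 5 + [(1, 200)]
profile = coverage_profile(200, hits)
candidates = find_plateaus(profile, min_hits=3, min_len=30)
```

A fixed threshold is the simplest stand-in for recognizing rectangular elevations. A closer treatment would also look at how sharply the count rises and falls at the segment boundaries, which the abstract does not specify.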
Pages: 2313-2321
Number of pages: 9