A machine learning information retrieval approach to protein fold recognition

Cited by: 152

Authors:

Cheng, Jianlin [1]

Baldi, Pierre [1]

Affiliations:

[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92697 USA

Source:

BIOINFORMATICS | 2006 / Vol. 22 / Issue 12

DOI:

10.1093/bioinformatics/btl102

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far they have been used primarily for protein and fold classification rather than for the retrieval problem of fold recognition, that is, finding a proper template for a given query protein.

Results: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85%, 56% and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70% and 48%.
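The ranking stage described in the abstract can be illustrated with a small sketch. This is not the FOLDpro implementation: the linear scoring function is a hypothetical stand-in for a trained support vector machine's decision value, the two-dimensional feature vectors and template names are toy data, and the sensitivity definition (fraction of queries with at least one correct template among the top k) is an assumption about how the reported figures are computed.

```python
from typing import Callable, Dict, List, Sequence, Set

Features = Sequence[float]


def linear_score(weights: Sequence[float], bias: float) -> Callable[[Features], float]:
    """Hypothetical stand-in for a trained SVM decision function."""
    def score(x: Features) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def rank_templates(pair_features: Dict[str, Features],
                   score: Callable[[Features], float]) -> List[str]:
    """Order templates by decreasing relevance score for a single query."""
    return sorted(pair_features, key=lambda t: score(pair_features[t]), reverse=True)


def top_k_sensitivity(rankings: Dict[str, List[str]],
                      relevant: Dict[str, Set[str]],
                      k: int) -> float:
    """Fraction of queries with at least one relevant template in the top k."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if any(t in relevant[query] for t in ranked[:k])
    )
    return hits / len(rankings)


# Toy query-template pair features (assumed values, for illustration only).
features = {
    "q1": {"tA": [0.9, 0.8], "tB": [0.2, 0.1], "tC": [0.5, 0.4]},
    "q2": {"tA": [0.1, 0.2], "tB": [0.6, 0.3], "tC": [0.7, 0.5]},
}
# Templates assumed to share a fold with each query.
relevant = {"q1": {"tA"}, "q2": {"tB"}}

score = linear_score(weights=[1.0, 1.0], bias=-1.0)
rankings = {q: rank_templates(pairs, score) for q, pairs in features.items()}

top1 = top_k_sensitivity(rankings, relevant, k=1)
top2 = top_k_sensitivity(rankings, relevant, k=2)
print(rankings)
print(top1, top2)
```

In this toy setup the correct template for `q2` is ranked second, so sensitivity rises when more top-ranked templates are considered, which mirrors the increase from top-1 to top-5 reported in the abstract.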

Pages: 1456-1463

Page count: 8

