A Unified Multitask Architecture for Predicting Local Protein Properties

Cited by: 31

Authors:

Qi, Yanjun [1]

Oja, Merja [2]

Weston, Jason [3]

Noble, William Stafford [2]

Affiliations:

[1] NEC Labs Amer, Machine Learning Dept, Princeton, NJ USA

[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA

[3] Google, New York, NY USA

Source:

PLOS ONE | 2012 / Volume 7 / Issue 03

Funding:

Academy of Finland;

Keywords:

COMBINED TRANSMEMBRANE TOPOLOGY; SEQUENCE ALIGNMENT PROFILES; SIGNAL PEPTIDE PREDICTION; SECONDARY STRUCTURE; SOLVENT ACCESSIBILITY; COILED COILS; INTERACTION SITES; BINDING RESIDUES; IDENTIFICATION; DATABASE;

DOI:

10.1371/journal.pone.0032235

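Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract: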

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.
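The architecture described in the abstract (one shared network, several per-residue labeling tasks) can be illustrated with a minimal sketch. This is not the authors' implementation: the task names and label counts, the window size, the layer dimensions, the padding symbol and the random, untrained weights are all assumptions chosen for illustration. The `corrupt_center` helper shows one assumed way to build a "synthetic" local pattern for the semi-supervised task; the abstract does not say how synthetic sequences are generated.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for window positions past the sequence ends

# Hypothetical task names and label counts, for illustration only.
TASKS = {"secondary_structure": 3, "solvent_accessibility": 2, "dna_binding": 2}


def random_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; stands in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskTagger:
    """Shared embedding and hidden layer, plus one output head per task."""

    def __init__(self, embed_dim=5, window=7, hidden=16, seed=0):
        rng = random.Random(seed)
        self.window = window  # assumed odd, so each window has a center residue
        # Shared across all tasks: one embedding vector per symbol.
        self.embedding = {
            aa: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for aa in AMINO_ACIDS + PAD
        }
        self.shared = random_matrix(hidden, embed_dim * window, rng)
        # Task-specific: one linear layer per task on top of the shared features.
        self.heads = {t: random_matrix(n, hidden, rng) for t, n in TASKS.items()}

    def windows(self, sequence):
        """One fixed-width window of residues centered on each position."""
        half = self.window // 2
        padded = PAD * half + sequence + PAD * half
        return [padded[i:i + self.window] for i in range(len(sequence))]

    def predict(self, sequence):
        """Return, per task, one label distribution for every residue."""
        out = {task: [] for task in self.heads}
        for win in self.windows(sequence):
            features = [x for aa in win for x in self.embedding[aa]]
            hidden = [math.tanh(h) for h in matvec(self.shared, features)]
            for task, weights in self.heads.items():
                out[task].append(softmax(matvec(weights, hidden)))
        return out


def corrupt_center(window, rng):
    """Assumed scheme: swap the middle residue for a different amino acid."""
    mid = len(window) // 2
    choices = [aa for aa in AMINO_ACIDS if aa != window[mid]]
    return window[:mid] + rng.choice(choices) + window[mid + 1:]
```

Calling `MultitaskTagger().predict("MKTAYIAKQR")` returns a dictionary with one entry per task, each holding a probability vector for every residue. Because the embedding and hidden layer are shared, a gradient step on any one task would update features used by all the others, which is the dependency the joint model is meant to exploit. Sequences containing symbols outside the 20 standard amino acids would need an extra embedding entry in this sketch.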


Pages: 11
