A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction

Cited by: 20
Authors
Spencer, Matt [1]
Eickholt, Jesse [2]
Cheng, Jianlin [3]
Affiliations
[1] Univ Missouri, Inst Informat, Columbia, MO 65211 USA
[2] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
[3] Univ Missouri, Dept Comp Sci, Columbia, MO 65211 USA
Funding
National Institutes of Health (NIH)
Keywords
Machine learning; neural nets; protein structure prediction; deep learning; neural networks; generation; accurate
DOI
10.1109/TCBB.2014.2343960
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Ab initio protein secondary structure (SS) predictions are used to generate tertiary structure predictions, which are in increasing demand because new proteins are being discovered rapidly. Although recent methods have slightly outperformed earlier SS predictors, accuracy has stagnated at around 80 percent, and many wonder whether prediction can be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in an attempt to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could also contribute to the advancement of this field. We developed an SS predictor, which we call DNSS, that combines deep learning network architectures with the position-specific scoring matrix (PSSM) generated by PSI-BLAST. Graphics processing units (GPUs) and CUDA software are used to speed up computation in the deep networks and to train them efficiently. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed to refine the predictions. Applied to a fully independent test dataset of 198 proteins, this deep learning approach achieved a Q3 accuracy of 80.7 percent and a Sov (segment overlap) accuracy of 74.2 percent.
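Q3 accuracy, as reported above, is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The minimal Python sketch below computes it, and also shows one common way of turning a PSI-BLAST PSSM into fixed-length network inputs with a sliding window. The function names, the window half-width, and the padding-flag encoding are illustrative assumptions, not the documented DNSS feature encoding; Sov is a separate segment-based measure and is not computed here.

```python
from typing import List, Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E, C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def window_features(
    pssm: Sequence[Sequence[float]], center: int, half_width: int
) -> List[float]:
    """Concatenate the PSSM rows in a window around residue `center`.

    Hypothetical encoding: positions that fall off either end of the chain are
    zero-filled, and each position gets one extra flag (1.0 = padding, 0.0 = real).
    """
    n_cols = len(pssm[0])
    features: List[float] = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])  # real residue: its PSSM row
            features.append(0.0)      # flag: not padding
        else:
            features.extend([0.0] * n_cols)  # outside the chain: zeros
            features.append(1.0)             # flag: padding
    return features


if __name__ == "__main__":
    # 5 of 6 residues agree, so Q3 is about 83.3 percent.
    print(round(q3_accuracy("HHHEEC", "HHHECC"), 1))
    toy_pssm = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]]  # 3 residues, 2 columns (toy data)
    print(window_features(toy_pssm, center=0, half_width=1))
```

A real PSSM has 20 columns per residue, so each window position contributes 21 values under this encoding; a per-residue classifier then maps each window vector to H, E, or C.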
Pages: 103-112
Number of pages: 10