CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway

Cited by: 43
Authors
Zhou, Jiyun [1 ,2 ]
Wang, Hongpeng [1 ]
Zhao, Zhishan [1 ]
Xu, Ruifeng [1 ]
Lu, Qin [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen Grad Sch, HIT Campus Shenzhen Univ Town, Shenzhen 518055, Guangdong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Protein secondary structure; Convolutional neural network; Highway; Local context; Long-range interdependency; SUPPORT VECTOR MACHINES; SEQUENCE; IDENTIFICATION; RECOGNITION; INFORMATION; ACCURACY;
DOI
10.1186/s12859-018-2067-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Background: Protein secondary structure is the three-dimensional form of local segments of proteins, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. Results: We present a novel deep-learning-based model, referred to as CNNH_PSS, which uses a multi-scale CNN with highway connections. In CNNH_PSS, any two neighboring convolutional layers have a highway that delivers information from the current layer to the output of the next one to preserve local contexts. As lower layers extract local contexts while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN, which need an extra complex model to extract long-range interdependencies. This demonstrates that CNNH_PSS not only uses fewer computational resources but also achieves better prediction performance. Conclusion: CNNH_PSS is able to extract both local contexts and long-range interdependencies by combining a multi-scale CNN and a highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
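The highway between two neighboring convolutional layers can be illustrated with a short sketch. It assumes the standard highway gating form y = T * H(x) + (1 - T) * x, which the abstract does not spell out; the function names, kernel size, channel counts, and random weights are all hypothetical, and the multi-scale part (several kernel sizes in parallel) is left out for brevity. A small helper also shows what a Q8 accuracy figure measures: the fraction of residues whose 8-class label is predicted correctly.

```python
import math
import random


def conv1d_same(x, kernels, bias):
    """1D convolution along a sequence with zero padding (output length == input length).

    x:       list of L positions, each a list of c_in floats
    kernels: list of c_out filters, each shaped [k][c_in], with k odd
    bias:    list of c_out floats
    """
    length, k = len(x), len(kernels[0])
    pad = k // 2
    out = []
    for i in range(length):
        row = []
        for filt, b in zip(kernels, bias):
            s = b
            for j in range(k):
                p = i + j - pad
                if 0 <= p < length:  # positions outside the sequence count as zeros
                    s += sum(w * v for w, v in zip(filt[j], x[p]))
            row.append(s)
        out.append(row)
    return out


def highway_conv_layer(x, w_h, b_h, w_t, b_t):
    """Assumed highway form: y = T * H(x) + (1 - T) * x.

    H(x) = ReLU(conv(x)) is the transformed signal and T = sigmoid(conv(x))
    is the gate. The number of output channels must equal the number of
    input channels so that x can be carried through unchanged.
    """
    h = conv1d_same(x, w_h, b_h)
    t = conv1d_same(x, w_t, b_t)
    y = []
    for x_row, h_row, t_row in zip(x, h, t):
        row = []
        for xv, hv, tv in zip(x_row, h_row, t_row):
            gate = 1.0 / (1.0 + math.exp(-tv))
            row.append(gate * max(hv, 0.0) + (1.0 - gate) * xv)
        y.append(row)
    return y


def random_kernels(c_out, k, c_in, rng):
    """Hypothetical helper: small random filters, for illustration only."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(c_in)] for _ in range(k)]
            for _ in range(c_out)]


def q8_accuracy(predicted, observed):
    """Fraction of residues whose 8-class secondary structure label matches."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

When the gate is close to 0, the layer passes its input (the lower-layer, local-context features) straight through; when it is close to 1, the output is the new convolutional features. Stacking such layers is one way the lower-layer information can survive into higher layers, which is the behavior the abstract attributes to the highways.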
Pages: 11