RUPEE: A fast and accurate purely geometric protein structure search

Cited by: 28
Authors
Ayoub, Ronald [1]
Lee, Yugyung [1]
Affiliations
[1] Univ Missouri, Sch Comp & Engn, Kansas City, MO 64110 USA
Source
PLOS ONE | 2019 / Volume 14 / Issue 03
Keywords
STRUCTURE ALIGNMENT; SECONDARY-STRUCTURE; ALGORITHM; TOOL;
DOI
10.1371/journal.pone.0213712
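Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract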
Given the close relationship between protein structure and function, protein structure searches have long played an established role in bioinformatics. Despite their maturity, existing protein structure searches either use simplifying assumptions or compromise between fast response times and quality of results. These limitations can prevent the easy and efficient exploration of relationships between protein structures, which is the norm in other areas of inquiry. To address these limitations, we have developed RUPEE, a fast and accurate purely geometric structure search combining techniques from information retrieval and big data with a novel approach to encoding sequences of torsion angles. Comparing our results to the output of mTM, SSM, and the CATHEDRAL structural scan, it is clear that RUPEE has set a new bar for purely geometric big data approaches to protein structure searches. RUPEE in top-aligned mode produces equal or better results than the best available protein structure searches, and RUPEE in fast mode demonstrates the fastest response times coupled with high quality results. The RUPEE protein structure search is available at https://ayoubresearch.com. Code and data are available at https://github.com/rayoub/rupee.
Pages: 17
Related papers
37 in total
[1]   SISYPHUS - structural alignments for proteins with non-trivial relationships [J].
Andreeva, Antonina ;
Prlic, Andreas ;
Hubbard, Tim J. P. ;
Murzin, Alexey G. .
NUCLEIC ACIDS RESEARCH, 2007, 35 :D253-D259
[2]  
[Anonymous], 2003, Bioinformatics, DOI 10.1093/BIOINFORMATICS/BTG1086
[3]   Rapid 3D protein structure database searching using information retrieval techniques [J].
Aung, Z ;
Tan, KL .
BIOINFORMATICS, 2004, 20 (07) :1045-1052
[4]  
Ayoub R, 2017, IEEE INT C BIOINFORM, P74, DOI 10.1109/BIBM.2017.8217627
[5]  
Broder A. Z., 1998, Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, P327, DOI 10.1145/276698.276781
[6]   On the resemblance and containment of documents [J].
Broder, AZ .
COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998, :21-29
[7]   Syntactic clustering of the Web [J].
Broder, AZ ;
Glassman, SC ;
Manasse, MS ;
Zweig, G .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1997, 29 (8-13) :1157-1166
[8]   FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately [J].
Budowski-Tal, Inbal ;
Nov, Yuval ;
Kolodny, Rachel .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (08) :3481-3486
[9]   YAKUSA: A fast structural database scanning method [J].
Carpentier, M ;
Brouillet, S ;
Pothier, J .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 61 (01) :137-151
[10]   ECOD: An Evolutionary Classification of Protein Domains [J].
Cheng, Hua ;
Schaeffer, R. Dustin ;
Liao, Yuxing ;
Kinch, Lisa N. ;
Pei, Jimin ;
Shi, Shuoyong ;
Kim, Bong-Hyun ;
Grishin, Nick V. .
PLOS COMPUTATIONAL BIOLOGY, 2014, 10 (12)