Ranking Chemical Structures for Drug Discovery: A New Machine Learning Approach

Cited by: 87
Authors
Agarwal, Shivani [1]
Dugar, Deepak [2]
Sengupta, Shiladitya [3]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[3] Harvard Univ, Brigham & Womens Hosp, Sch Med, Dept Med, Boston, MA 02115 USA
Funding
National Science Foundation (USA);
Keywords
SUPPORT VECTOR MACHINES; GENERALIZATION BOUNDS; TUTORIAL; KERNEL; AREA;
DOI
10.1021/ci9003865
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
With chemical libraries increasingly containing millions of compounds or more, there is a fast-growing need for computational methods that can rank or prioritize compounds for screening. Machine learning methods have shown considerable promise for this task; indeed, classification methods such as support vector machines (SVMs), together with their variants, have been used in virtual screening to distinguish active compounds from inactive ones, while regression methods such as partial least-squares (PLS) and support vector regression (SVR) have been used in quantitative structure-activity relationship (QSAR) analysis for predicting biological activities of compounds. Recently, a new class of machine learning methods, namely ranking methods, which are designed to directly optimize ranking performance, has been developed for ranking tasks such as web search that arise in information retrieval (IR) and other applications. Here we report the application of these new ranking methods in machine learning to the task of ranking chemical structures. Our experiments show that the new ranking methods give better ranking performance than both classification-based methods in virtual screening and regression methods in QSAR analysis. We also make some interesting connections between ranking performance measures used in cheminformatics and those used in IR studies.
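As a rough illustration of what "ranking performance" can mean in the abstract above, the sketch below computes a pairwise ranking accuracy: the fraction of compound pairs with different measured activities that a set of predicted scores orders correctly. When activities are binary (active vs. inactive), this quantity coincides with the area under the ROC curve. The function name and the toy numbers are hypothetical and are not taken from the paper; this is not the authors' algorithm, only one common way to score a ranking.

```python
from itertools import combinations


def pairwise_ranking_accuracy(scores, activities):
    """Fraction of pairs with different activities that the scores order correctly.

    Ties in the predicted scores count as half-correct.
    """
    correct = 0.0
    total = 0
    for (s_i, y_i), (s_j, y_j) in combinations(zip(scores, activities), 2):
        if y_i == y_j:
            continue  # pairs with equal activity carry no ordering information
        total += 1
        if s_i == s_j:
            correct += 0.5
        elif (s_i > s_j) == (y_i > y_j):
            correct += 1.0
    if total == 0:
        raise ValueError("need at least one pair with different activities")
    return correct / total


# Toy data (made up): predicted scores and measured activities for five compounds.
scores = [0.9, 0.2, 0.7, 0.4, 0.1]
activities = [8.1, 5.0, 6.3, 6.9, 4.2]
print(pairwise_ranking_accuracy(scores, activities))
```

A classifier or regressor is usually trained on a per-compound loss and only evaluated with a measure like this afterwards, whereas a ranking method, as the abstract describes it, is designed to optimize ranking performance directly.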
Pages: 716-731
Page count: 16