Identifying biologically active compound classes using phenotypic screening data and sampling statistics

Cited by: 12
Authors
Klekota, J
Brauner, E
Schreiber, SL
Affiliations
[1] Harvard Univ, Howard Hughes Med Inst, Inst Chem & Cell Biol, Broad Inst, Cambridge, MA 02138 USA
[2] Harvard Univ, MIT, Cambridge, MA 02138 USA
DOI
10.1021/ci050087d
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
摘要
Scoring the activity of compounds in phenotypic high-throughput assays presents a unique challenge because of the limited resolution and inherent measurement error of these assays. Techniques that leverage the structural similarity of compounds within an assay can be used to improve the hit-recovery rate from screening data. A technique is presented that uses clustering and sampling statistics to predict likely compound activity by scoring entire structural classes. A set of phenotypic assays performed against a commercially available compound library was used as a test set. Using the class-scoring technique, the resultant activity prediction scores were more reproducible than individual assay measurements, and class scoring recovered known active compounds more efficiently than individual assay measurements because class scoring had fewer false positives. Known biologically active compounds were recovered 87% of the time using class scores, suggesting a low false-negative rate that compared well to individual assay measurements. In addition, many weak and potentially novel classes of active compounds, overlooked by individual assay measurements, were suggested.
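The abstract does not say which statistic is used, so the following is only a minimal sketch of how scoring an entire structural class with sampling statistics could look. It assumes that each compound has already been assigned to one cluster and that each compound has a binary hit call, and it scores a class by the hypergeometric probability of seeing at least as many hits in the class as were observed if hits were spread at random. The function names `hypergeom_tail` and `score_classes` and the toy data layout are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n compounds are drawn without replacement
    from a library of N compounds, K of which are hits."""
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def score_classes(class_of, hits):
    """Return a p-value per structural class (smaller = more enriched).

    class_of: dict mapping compound id -> class label (assumed precomputed
              by some clustering step; the clustering itself is not shown).
    hits:     set of compound ids called active in a single assay.
    """
    N = len(class_of)
    K = sum(1 for compound in class_of if compound in hits)

    # Group compounds by their class label.
    members = {}
    for compound, label in class_of.items():
        members.setdefault(label, []).append(compound)

    # Score each class by how surprising its hit count is under random sampling.
    scores = {}
    for label, compounds in members.items():
        k = sum(1 for compound in compounds if compound in hits)
        scores[label] = hypergeom_tail(N, K, len(compounds), k)
    return scores
```

Under these assumptions, a class whose members are mostly hits receives a small p-value even when each individual measurement is noisy, which is the intuition behind scoring classes rather than single compounds.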
Pages: 1824-1836
Page count: 13