Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research

Times cited: 586
Authors
Fourches, Denis [1]
Muratov, Eugene [1,2]
Tropsha, Alexander [1]
Affiliations
[1] Univ N Carolina, Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[2] AV Bogatsky Phys Chem Inst NAS Ukraine, Lab Theoret Chem, Dept Mol Struct, UA-65080 Odessa, Ukraine
Keywords
INDUCED LIVER-INJURY; TETRAHYMENA-PYRIFORMIS; QUANTITATIVE STRUCTURE; TOXICITY; NITROAROMATICS; CONCORDANCE; SOLUBILITY; INHIBITORS; CHALLENGE; MOLECULES
DOI
10.1021/ci100176x
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100705 [Microbial and biochemical pharmacy]
Abstract
Chemical structure curation plays an important role in cheminformatics and QSAR modeling research. Both common sense and the recent investigations reviewed in this paper indicate that chemical record curation should be viewed as a separate and critical component of any cheminformatics research. Treatment of mixtures is not as simple as it appears: the practice of retaining the component with the highest molecular weight or the largest number of atoms is widespread, but it is not necessarily the best solution. Manual conversion of all functional groups to standard forms is too time-consuming and could introduce additional, operator-dependent, nonsystematic errors. ChemAxon's Standardizer is probably the best-known tool for rapid and efficient chemotype normalization. Rigorous statistical analysis of any data set assumes that each compound is unique and thus structurally different from all other compounds.
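The largest-component heuristic for mixtures mentioned in the abstract can be sketched in a few lines. This is a minimal sketch, assuming structures are stored as SMILES strings in which `.` separates disconnected components; `heavy_atom_count` and `largest_component` are hypothetical helper names, and the regular-expression tokenizer is a deliberate simplification rather than a full SMILES parser (for example, it treats a ring-closure bond that spans a `.` as two separate components). It ranks components by heavy-atom count rather than molecular weight and flags ties, one of the cases in which the heuristic is not necessarily the right answer.

```python
import re

# Simplified SMILES atom tokenizer (assumption: illustration only, not a full parser).
# Bracket atoms are captured first; "Br"/"Cl" are tried before one-letter symbols.
ATOM_RE = re.compile(r"\[([^\]]+)\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT_RE = re.compile(r"\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")


def heavy_atom_count(smiles: str) -> int:
    """Count non-hydrogen atoms written explicitly in one SMILES component."""
    count = 0
    for match in ATOM_RE.finditer(smiles):
        inner = match.group(1)
        if inner is None:
            count += 1  # organic-subset atom written outside brackets
            continue
        element = BRACKET_ELEMENT_RE.match(inner)
        if element is None or element.group(1) != "H":
            count += 1  # bracket atom that is not hydrogen (e.g. [Na+], [nH])
    return count


def largest_component(smiles: str) -> tuple[str, bool]:
    """Return (kept component, ambiguous flag) for a possibly multi-component SMILES.

    The flag is True when two or more components tie for the largest size,
    i.e. when the heuristic alone cannot decide which component to keep.
    """
    components = [part for part in smiles.split(".") if part]
    counts = [heavy_atom_count(part) for part in components]
    best = max(counts)
    winners = [part for part, n in zip(components, counts) if n == best]
    return winners[0], len(winners) > 1
```

For sodium benzoate written as `[Na+].[O-]C(=O)c1ccccc1`, the sketch keeps the benzoate anion and reports no ambiguity. Routing flagged (tied) mixtures to manual review instead of resolving them silently is a design choice assumed here, not a rule taken from the paper.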
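The abstract argues that functional groups should be normalized by a tool rather than by hand, so that every record is treated by the same rules. The toy below only illustrates that idea of a fixed, automatically applied rule table, under strong assumptions: it rewrites SMILES text, recognizes just two written forms of the nitro group, and takes the charge-separated form `[N+](=O)[O-]` as the standard (an assumed convention). Real standardization tools operate on the parsed molecular structure rather than on the SMILES string; `NITRO_RULES` and `normalize_nitro` are hypothetical names.

```python
# Toy, string-level normalization rules (assumption: illustration only).
# Each pair maps a pentavalent nitro spelling to the charge-separated form.
NITRO_RULES = [
    ("N(=O)=O", "[N+](=O)[O-]"),  # nitro group written after its attachment atom
    ("O=N(=O)", "[O-][N+](=O)"),  # nitro group written before its attachment atom
]


def normalize_nitro(smiles: str) -> str:
    """Apply every rule to the whole string in a fixed order.

    Already-normalized input is returned unchanged, so the function can be
    run repeatedly over a data set without altering curated records.
    """
    for pattern, replacement in NITRO_RULES:
        smiles = smiles.replace(pattern, replacement)
    return smiles
```

After normalization, both spellings of nitrobenzene use the same charge-separated nitro representation, although the two strings still differ in atom order, so a canonicalization step would still be needed before records can be compared.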
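The abstract's final point, that statistical analysis assumes each compound in a data set is unique, suggests a duplicate check once structures have been standardized. A minimal sketch, assuming each record already carries a `structure_key` (for example, a canonical SMILES or InChIKey computed by a cheminformatics toolkit from the standardized structure); `Record`, `find_duplicates`, and the `tolerance` parameter are hypothetical names introduced for illustration. Records are grouped by key, and each duplicate group is reported together with whether its activity values agree within the tolerance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    structure_key: str  # assumption: canonical identifier of the standardized structure
    activity: float


def find_duplicates(
    records: list[Record], tolerance: float = 0.0
) -> dict[str, tuple[list[str], bool]]:
    """Map each duplicated structure key to (record ids, activities consistent?)."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        groups[rec.structure_key].append(rec)

    report: dict[str, tuple[list[str], bool]] = {}
    for key, recs in groups.items():
        if len(recs) > 1:  # singletons are unique structures and are not reported
            values = [r.activity for r in recs]
            consistent = max(values) - min(values) <= tolerance
            report[key] = ([r.record_id for r in recs], consistent)
    return report
```

How to resolve each group (keep one representative, average consistent values, or set aside records whose activities conflict) is a curation decision this sketch deliberately leaves open.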
Pages: 1189-1204
Page count: 16