AUTOMATIC LEARNING OF CHEMICAL CONCEPTS - RESEARCH OCTANE NUMBER AND MOLECULAR SUBSTRUCTURES

Cited by: 17
Author
BLUROCK, ES
Institution
[1] Research Institute for Symbolic Computation, Johannes Kepler University
Source
COMPUTERS & CHEMISTRY | 1995 / Vol. 19 / No. 2
DOI
10.1016/0097-8485(95)00001-9
Chinese Library Classification
O6 [Chemistry]
Subject classification code
0703
Abstract
A set of 230 hydrocarbons is analyzed with respect to the research octane number (RON) using an extended version of Quinlan's ID3 machine learning method. The basic ID3 method produces a decision tree. The questions within the decision tree ask whether given substructures are present or absent within a molecule (for example, is the hexane carbon skeleton present as a substructure?). The decision to be made is whether the RON value is within a given range (for example, is the RON greater than or equal to 90?). In addition to the normal use of decision trees to predict the RON value, qualitative information is extracted: for example, which substructures are significant in which RON ranges. The best results were found for molecules with RON values between 90 and 105, which is attributed to the high density of samples in this region. The ID3 method is also used to prune an overabundance of parameters down to a reasonable size: of the original set of 230 substructures, only 31 were needed for the RON description. The presence or absence of the substructures n-hexane, methylbenzene and benzene was particularly significant in determining the RON value. Since ID3 is not a traditional method of analysis, especially within chemistry, an effort is made to explain its principles using the problem at hand as an example. Its use as a first step in analysis, to gain intuitive information, is also explained.
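The abstract describes the decision tree only in outline. The following minimal sketch illustrates the basic ID3 idea from Quinlan (1986), not the author's extended version: at each node it picks the yes/no substructure question with the largest information gain (entropy reduction) for the yes/no question "is RON >= 90?", then recurses on the two branches. The substructure names `S1`, `S2`, `S3`, the RON values, the helper names, and the zero-gain stopping rule are all hypothetical assumptions made for illustration. The `used_features` helper shows how the substructures that actually appear in the tree form a reduced parameter set, analogous in spirit to the reported reduction from 230 substructures to 31.

```python
import math
from collections import Counter

# Invented toy data for illustration only (NOT from the paper): each row flags whether
# a hypothetical substructure S1, S2 or S3 is present; the RON values are made up.
ROWS = [
    {"S1": False, "S2": False, "S3": False},
    {"S1": False, "S2": False, "S3": True},
    {"S1": False, "S2": True, "S3": False},
    {"S1": False, "S2": True, "S3": True},
    {"S1": True, "S2": False, "S3": False},
    {"S1": True, "S2": False, "S3": True},
    {"S1": True, "S2": True, "S3": False},
    {"S1": True, "S2": True, "S3": True},
]
RON = [70, 75, 60, 80, 95, 98, 85, 88]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from asking 'is this substructure present?'."""
    present = [lab for row, lab in zip(rows, labels) if row[feature]]
    absent = [lab for row, lab in zip(rows, labels) if not row[feature]]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Build a tree: a leaf is a class label, a node is (feature, present_branch, absent_branch)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    gains = {f: information_gain(rows, labels, f) for f in features}
    best = max(features, key=gains.get)  # ties go to the earlier feature in the list
    if gains[best] <= 1e-12:
        return majority  # assumed stopping rule: no remaining question separates these rows
    remaining = [f for f in features if f != best]
    branches = []
    for value in (True, False):  # present branch first, then absent branch
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        if idx:
            branches.append(id3([rows[i] for i in idx], [labels[i] for i in idx], remaining))
        else:
            branches.append(majority)
    return (best, branches[0], branches[1])


def classify(tree, row):
    """Follow the present/absent answers from the root down to a leaf label."""
    while isinstance(tree, tuple):
        feature, present, absent = tree
        tree = present if row[feature] else absent
    return tree


def used_features(tree):
    """Set of substructures that actually appear as questions in the tree."""
    if not isinstance(tree, tuple):
        return set()
    feature, present, absent = tree
    return {feature} | used_features(present) | used_features(absent)


if __name__ == "__main__":
    labels = [ron >= 90 for ron in RON]  # the yes/no decision: is RON >= 90?
    tree = id3(ROWS, labels, ["S1", "S2", "S3"])
    print(tree)
    print(sorted(used_features(tree)))
    print(classify(tree, {"S1": True, "S2": False, "S3": True}))
```

On this toy data the irrelevant substructure `S3` never becomes a question in the tree, which is the mechanism by which an ID3 tree can serve as a feature filter; whether the paper's extended method selects substructures in exactly this way is not stated in the abstract.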
Pages: 91-99
Number of pages: 9
References (10 entries)
[1] ASTM, Standard Test Method for knock characteristics of motor and aviation fuels by the research method, Technical Report, (1988)
[2] Blurock, Analysis 3.0: implementation, extensions and use of the ID3 algorithm, Technical Report, (1992)
[3] Blurock, Analysis 3.0: reference manual, Technical Report, (1992)
[4] Degen, Berechnung der Oktanzahlen von Alkanen mit Hilfe topologischer Indizes nach Randic [Calculation of the octane numbers of alkanes using Randic topological indices], Software-Entwicklung in der Chemie 3, (1989)
[5] Dietterich, Ann. Rev. Comput. Sci., (1990)
[6] Pal, Purkayastha, Sengupta, Ind. J. Chem.: Section B: Org. Chem. Med. Chem., 31, pp. 109-114, (1992)
[7] Quinlan, Mach. Learn., 1, pp. 81-106, (1986)
[8] Quinlan, IEEE Trans. Syst. Man. Cybernet., 20, pp. 339-346, (1990)
[9] Randic, Nonempirical approach to structure-activity studies, International Journal of Quantum Chemistry, 11, (1984)
[10] Randic, New J. Chem., 15, pp. 417-525, (1991)