Apriori algorithm for sub-category classification analysis of handwriting

Cited by: 30
Authors
Cha, SH [1]
Srihari, SN [1]
Affiliations
[1] Pace Univ, Sch Comp Sci & Informat Syst, Pleasantville, NY 10570 USA
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
DOI
10.1109/ICDAR.2001.953940
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The sub-category classification problem is that of discriminating a pattern among all sub-categories. Sub-category classification performance estimates are useful information to mine, because many researchers are interested in trends in patterns within specific sub-categories. This paper presents a data mining technique for a database consisting of experimental and observational unit variables. Experimental unit variables are the attributes that define sub-categories of the entity, e.g., demographic data; observational unit variables are the features observed to classify the entity, e.g., test results or handwriting styles. Since the experimental unit variables give rise to an enormous number of sub-categories, we apply the Apriori algorithm to select only those sub-categories that have enough support in a given database. The selected sub-categories are then discriminated using the observational unit variables as input features to an Artificial Neural Network (ANN) classifier. The importance of this paper is twofold. First, we propose an algorithm that quickly selects all sub-categories that have both enough support and a sufficient classification rate. Second, we successfully apply the proposed algorithm to the field of handwriting analysis. The task is to determine the similarity of handwriting style within a specific group of people. Document examiners are interested in trends in the handwriting of specific groups, e.g., (i) does a male write differently from a female? (ii) can we distinguish the handwriting of the 25-45 age group from that of others? Subgroups of white males in the age group 15-24 and white females in the age group 45-64 show 87% correct classification performance.
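The support-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frequent_subcategories`, the attribute names, the sample records, and the support threshold are all hypothetical, and the second stage (training an ANN on observational features for each selected sub-category) is not shown. The sketch treats each record's experimental unit variables as a set of (attribute, value) items and uses the standard Apriori pruning rule: a candidate sub-category is counted only if every smaller sub-category it contains already has enough support.

```python
from collections import Counter
from itertools import combinations


def frequent_subcategories(records, min_support):
    """Return {frozenset of (attribute, value) pairs: support count}.

    Hypothetical helper: each record is a dict of experimental unit
    variables, and a sub-category is a conjunction of attribute values.
    """
    transactions = [frozenset(r.items()) for r in records]

    # Level 1: single attribute values with enough support.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([item]): n
               for item, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # A sub-category cannot hold two values of one attribute.
            if len({attr for attr, _ in union}) != k:
                continue
            # Apriori pruning: every (k-1)-subset must be frequent.
            if all(frozenset(s) in current
                   for s in combinations(union, k - 1)):
                candidates.add(union)

        # Count support only for the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical demographic records, for illustration only.
records = [
    {"gender": "male", "age": "15-24", "ethnicity": "white"},
    {"gender": "male", "age": "15-24", "ethnicity": "white"},
    {"gender": "female", "age": "45-64", "ethnicity": "white"},
    {"gender": "female", "age": "45-64", "ethnicity": "white"},
    {"gender": "male", "age": "45-64", "ethnicity": "black"},
]
selected = frequent_subcategories(records, min_support=2)
for subcat, support in sorted(selected.items(),
                              key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subcat), support)
```

In a pipeline like the one the abstract outlines, each sub-category in `selected` would then be passed to a classifier trained on the observational unit variables, and only sub-categories that also reach a sufficient classification rate would be reported.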
Pages: 1022-1025
Number of pages: 2