Acquiring Chinese Part-of-Speech Tagging Rules with Data Mining Methods

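Cited by: 11
Authors
Li Xiaoli (李晓黎)
Shi Zhongzhi (史忠植)
Unknown
Affiliations
[1] Unknown
[2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[3] Unknown
[4] Unknown
Keywords
part of speech; corpus tagging; data mining; association rules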
DOI
Not available
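Abstract
This paper studies the acquisition of part-of-speech tagging rules for Chinese text from a data mining perspective. Subject to a user-specified support vector, frequent patterns are first selected from the set of candidate patterns; production rules with high confidence are then mined from them. The process is fully automatic. The acquired rules are explicit in form, yet they capture regularities that are implicit in the data and not easily discovered by users. Experiments show that using the automatically acquired tagging rules as a supplement to the existing statistical method improves the accuracy of part-of-speech tagging.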
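The two-stage procedure described in the abstract (select frequent patterns under a support threshold, then keep rules with high confidence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the pattern shape (previous tag plus current word implies current tag), the single scalar `min_support` standing in for the support vector, the function name `mine_tagging_rules`, and the toy corpus.

```python
from collections import Counter

def mine_tagging_rules(corpus, min_support=2, min_confidence=0.8):
    """Mine rules of the form (previous tag, word) -> tag from a tagged corpus.

    Hypothetical sketch: the rule shape and thresholds are illustrative
    assumptions, not the paper's actual pattern templates.
    """
    antecedent_counts = Counter()  # how often (prev_tag, word) occurs
    pattern_counts = Counter()     # how often (prev_tag, word, tag) occurs
    for sentence in corpus:
        prev_tag = "<s>"           # sentence-start marker (assumed)
        for word, tag in sentence:
            antecedent = (prev_tag, word)
            antecedent_counts[antecedent] += 1
            pattern_counts[(antecedent, tag)] += 1
            prev_tag = tag

    rules = []
    for (antecedent, tag), support in pattern_counts.items():
        # Stage 1: keep only frequent patterns.
        if support < min_support:
            continue
        # Stage 2: keep only rules with high confidence.
        confidence = support / antecedent_counts[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, tag, support, confidence))
    # Most confident, then most frequent, rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

# Toy corpus: "工作" is a noun after the particle "的" and a verb after a pronoun.
corpus = [
    [("我", "r"), ("的", "u"), ("工作", "n")],
    [("他", "r"), ("的", "u"), ("工作", "n")],
    [("我", "r"), ("工作", "v")],
    [("他", "r"), ("工作", "v")],
]
rules = mine_tagging_rules(corpus)
for antecedent, tag, support, confidence in rules:
    print(antecedent, "->", tag, f"support={support} confidence={confidence:.2f}")
```

In this toy data the ambiguous word "工作" yields two separate rules, one per left context, which is the kind of explicit, context-dependent rule that could supplement a statistical tagger. Raising `min_support` above 2 filters out every pattern in this small corpus.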
Pages: 1409-1414
Page count: 6