Acquiring Chinese Part-of-Speech Tagging Rules with Data Mining Methods

Cited by: 11

Authors:

Li Xiaoli (李晓黎)

Shi Zhongzhi (史忠植)

Unknown

Affiliations:

[1] Unknown

[2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing

[3] Unknown

[4] Unknown

Source:

Journal of Computer Research and Development (计算机研究与发展) | 2000, Issue 12

Keywords:

part of speech; corpus tagging; data mining; association rules

DOI:

Not available

Chinese Library Classification number:

Discipline classification number:

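Abstract:

This paper studies the acquisition of part-of-speech tagging rules for Chinese text from a data mining perspective. Subject to a user-specified support vector, frequent patterns are first selected from the set of candidate patterns; production rules with high confidence are then mined from them. The process is fully automatic, and the resulting rules are explicit in form, yet they are implicit in the data and not easily discovered by users. Experiments show that using the automatically acquired tagging rules to supplement the original statistical method improves the accuracy of part-of-speech tagging.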
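The two-stage idea in the abstract (filter candidate patterns by support, then keep only high-confidence production rules) can be illustrated with a minimal sketch. This is not the authors' algorithm: the pattern shapes (a word alone, or a previous tag plus a word), the reading of the "support vector" as one minimum count per pattern length, the function name `mine_tagging_rules`, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter

def mine_tagging_rules(sentences, min_support, min_confidence):
    """Mine rules of the form antecedent -> tag from a tagged corpus.

    sentences: list of sentences, each a list of (word, tag) pairs.
    min_support: dict mapping antecedent length to a minimum count
                 (an assumed reading of the "support vector").
    min_confidence: minimum value of count(antecedent, tag) / count(antecedent).
    """
    antecedent_counts = Counter()
    pattern_counts = Counter()
    for sent in sentences:
        for i, (word, tag) in enumerate(sent):
            prev_tag = sent[i - 1][1] if i > 0 else "<s>"
            # Hypothetical candidate patterns: word only, and previous tag + word.
            for antecedent in ((word,), (prev_tag, word)):
                antecedent_counts[antecedent] += 1
                pattern_counts[(antecedent, tag)] += 1

    rules = []
    for (antecedent, tag), count in pattern_counts.items():
        # Stage 1: keep only patterns that meet the support threshold.
        if count < min_support[len(antecedent)]:
            continue
        # Stage 2: keep only rules with high confidence.
        confidence = count / antecedent_counts[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, tag, count, confidence))
    # Most confident rules first, ties broken by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))

# Toy corpus (invented): "hua" is ambiguous between verb (v) and noun (n).
corpus = [
    [("wo", "r"), ("hua", "v"), ("hua", "n")],
    [("ta", "r"), ("hua", "v"), ("tu", "n")],
    [("hao", "a"), ("hua", "n")],
]
rules = mine_tagging_rules(corpus, min_support={1: 2, 2: 2}, min_confidence=0.9)
for antecedent, tag, count, confidence in rules:
    print(antecedent, "->", tag, count, confidence)
```

On this toy corpus the word "hua" alone is too ambiguous to yield a rule (confidence 0.5 for either tag), while the context "pronoun tag followed by hua" passes both thresholds. In the setting the abstract describes, rules of this kind would supplement a statistical tagger rather than replace it.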

Citation

Pages: 1409-1414

Page count: 6
