
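Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model
Cite this article: YU Jiang-de, WANG Xi-jie, FAN Xiao-zhong. Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model[J]. Journal of Zhengzhou University (Natural Science), 2011(1): 70-74.
Authors: YU Jiang-de, WANG Xi-jie, FAN Xiao-zhong
Affiliations: [1] School of Computer and Information Engineering, Anyang Normal University, Anyang, Henan 455002; [2] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
Funding: Doctoral Program Project of Institutions of Higher Education (No. 20050007023); Young Backbone Teacher Program of Institutions of Higher Education of Henan Province (No. 2009GGJS-108)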
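Abstract: In recent years, character-based word-position tagging has greatly improved the performance of Chinese word segmentation. The method recasts Chinese word segmentation as the problem of tagging each character with its position within a word, and with the help of strong sequence tagging models it has gradually become the main technical route for Chinese word segmentation. In this method, the design of the feature template set and the choice of word-position tag set are crucial. Using different word-position tag sets, this paper further studies word-position tagging for Chinese word segmentation with a maximum entropy model. Closed tests were carried out on the corpora of the international Chinese word segmentation evaluation Bakeoff2005, and the effects of different word-position tag sets on segmentation performance were compared. Experiments show that the six-word-position tag set, combined with the corresponding feature template set TMPT-6, gives better segmentation performance than the other word-position tag sets.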
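The core idea of the abstract, recasting segmentation as tagging each character with its position in a word, can be sketched in a few lines of Python. The abstract does not list the tag sets, so this sketch assumes the 2-tag {B, E}, 4-tag {B, M, E, S} and 6-tag {B, B2, B3, M, E, S} sets common in the word-position tagging literature; the helper names `word_tags`, `encode` and `decode` are hypothetical.

```python
# Assumed word-position tag sets (the abstract does not spell them out):
#   2-tag: B (first character of a word), E (any later character)
#   4-tag: B, M, E for multi-character words; S for single-character words
#   6-tag: like 4-tag, but the 2nd and 3rd characters get B2 and B3


def word_tags(n: int, tag_set: int = 6) -> list[str]:
    """Return the word-position tags for a word of n characters."""
    if tag_set not in (2, 4, 6):
        raise ValueError(f"unsupported tag set: {tag_set}")
    if tag_set == 2:
        return ["B"] + ["E"] * (n - 1)
    if n == 1:
        return ["S"]
    if tag_set == 4:
        return ["B"] + ["M"] * (n - 2) + ["E"]
    # 6-tag: up to three leading tags (B, B2, B3), then M's, then E
    head = ["B", "B2", "B3"][: n - 1]
    return head + ["M"] * (n - 1 - len(head)) + ["E"]


def encode(words: list[str], tag_set: int = 6) -> tuple[list[str], list[str]]:
    """Turn a segmented sentence into parallel (characters, tags) lists."""
    chars: list[str] = []
    tags: list[str] = []
    for word in words:
        chars.extend(word)
        tags.extend(word_tags(len(word), tag_set))
    return chars, tags


def decode(chars: list[str], tags: list[str]) -> list[str]:
    """Rebuild words: B or S opens a new word; any other tag continues it."""
    words: list[str] = []
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    sentence = ["我们", "在", "研究", "最大熵模型"]
    for k in (2, 4, 6):
        chars, tags = encode(sentence, k)
        print(k, " ".join(tags))  # 6-tag line: B E S B E B B2 B3 M E
        assert decode(chars, tags) == sentence
```

Under these assumptions, the larger tag sets only refine the positions inside long words; all three sets decode back to the same segmentation, so the choice affects what the tagger has to learn rather than what can be represented.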

Keywords: Chinese word segmentation; word-position tagging; maximum entropy model; word-position tag set; feature template
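The feature template set TMPT-6 is named but not defined in the abstract. As a hypothetical illustration only, the sketch below instantiates character n-gram templates over a window of one character on each side (C-1, C0, C1, C-1C0, C0C1, C-1C1), a template style often used in word-position tagging; it is not claimed to be TMPT-6, and the padding symbols `<s>` and `</s>` are assumptions.

```python
BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary padding symbols


def char_at(chars: list[str], j: int) -> str:
    """Character at position j, or a boundary symbol outside the sentence."""
    if j < 0:
        return BOS
    if j >= len(chars):
        return EOS
    return chars[j]


def features(chars: list[str], i: int) -> list[str]:
    """Instantiate hypothetical +/-1-window templates for the character at i."""
    prev, cur, nxt = (char_at(chars, i + k) for k in (-1, 0, 1))
    return [
        f"C-1={prev}",         # previous character
        f"C0={cur}",           # current character
        f"C1={nxt}",           # next character
        f"C-1C0={prev}{cur}",  # bigram ending at i
        f"C0C1={cur}{nxt}",    # bigram starting at i
        f"C-1C1={prev}{nxt}",  # characters on both sides, skipping i
    ]


if __name__ == "__main__":
    chars = list("研究模型")
    for i in range(len(chars)):
        print(features(chars, i))
```

Each returned string acts as a binary indicator feature; a classifier then learns one weight per (feature, tag) pair.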

Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model
YU Jiang-de, WANG Xi-jie, FAN Xiao-zhong. Chinese Word Segmentation via Word-position Tagging Based on Maximum Entropy Model[J]. Journal of Zhengzhou University (Natural Science), 2011(1): 70-74.
Authors: YU Jiang-de, WANG Xi-jie, FAN Xiao-zhong
Institutions: 1. School of Computer and Information Engineering, Anyang Normal University, Anyang 455002, China; 2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract: In recent years, word-position-based approaches have greatly improved the performance of Chinese word segmentation. These approaches treat Chinese word segmentation as a word-position tagging problem, and with the help of powerful sequence tagging models they have quickly become the mainstream technique in the field. The selection of feature templates and of the tag set is crucial in this method. We study the technique using different word-position tag sets and a maximum entropy model. Closed evaluations were performed on the corpora of the Second International Chinese Word Segmentation Bakeoff (Bakeoff-2005), with comparative experiments on different tag sets and feature templates. Experimental results show that the six-word-position tag set combined with the feature template set TMPT-6 outperforms the other tag sets.
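A conditional maximum entropy model over binary indicator features assigns p(y | x) = exp(sum_i w_i f_i(x, y)) / Z(x), where Z(x) sums the numerator over all tags y. The sketch below is a minimal Python version that takes per-character feature strings and predicts a word-position tag. It is trained by plain batch gradient ascent with an L2 penalty, a swapped-in choice for brevity: the abstract does not say which training algorithm the authors used (GIS, IIS and L-BFGS are common for maximum entropy models). The class name `MaxEnt` and the toy data are hypothetical, and tags are predicted independently per character here; how the paper turns these distributions into a tag sequence is not stated in the abstract.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Conditional maximum entropy (multinomial logistic) classifier.

    One weight per (feature, label) pair; trained by batch gradient ascent
    on the L2-penalized log-likelihood (assumed method, for illustration).
    """

    def __init__(self, labels: list[str]):
        self.labels = list(labels)
        self.w: dict[tuple[str, str], float] = defaultdict(float)

    def probs(self, feats: list[str]) -> dict[str, float]:
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)  # normalizer Z(x)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def predict(self, feats: list[str]) -> str:
        p = self.probs(feats)
        return max(self.labels, key=lambda y: p[y])

    def train(self, data, epochs: int = 100, lr: float = 0.5, l2: float = 0.01) -> None:
        for _ in range(epochs):
            grad: dict[tuple[str, str], float] = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    for y in self.labels:
                        # observed minus expected count of feature (f, y)
                        grad[(f, y)] += (y == gold) - p[y]
            for key in list(grad):
                self.w[key] += lr * (grad[key] - l2 * self.w[key])


if __name__ == "__main__":
    # Toy (features, gold tag) pairs; illustrative only, not the paper's data.
    data = [
        (["C0=我", "C1=们"], "B"),
        (["C0=们", "C1=在"], "E"),
        (["C0=在", "C1=研"], "S"),
        (["C0=研", "C1=究"], "B"),
        (["C0=究", "C1=</s>"], "E"),
    ]
    model = MaxEnt(["B", "B2", "B3", "M", "E", "S"])
    model.train(data)
    # C1=发 was never seen, so only the learned C0=研 weights decide: B
    print(model.predict(["C0=研", "C1=发"]))
```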
Keywords: Chinese word segmentation; word-position tagging; maximum entropy model; word-position tag sets; feature template
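Segmentation performance in the Bakeoff closed tests is reported as word-level precision, recall and F-measure, where a predicted word counts as correct only if its character span matches a gold word exactly. The sketch below computes these scores over a whole corpus; it is a minimal illustration, not the official Bakeoff scoring script, and the function names are hypothetical.

```python
def word_spans(words: list[str]) -> set[tuple[int, int]]:
    """Character-offset spans [start, end) of each (non-empty) word."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def evaluate(gold_corpus, pred_corpus) -> tuple[float, float, float]:
    """Corpus-level word precision, recall and F-measure."""
    assert len(gold_corpus) == len(pred_corpus), "sentence counts differ"
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_corpus, pred_corpus):
        assert "".join(gold) == "".join(pred), "sentences must share characters"
        g, p = word_spans(gold), word_spans(pred)
        correct += len(g & p)  # words whose spans match exactly
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = [["我们", "在", "研究", "最大熵模型"]]
    pred = [["我们", "在", "研究", "最大", "熵", "模型"]]
    print(evaluate(gold, pred))  # precision 0.5, recall 0.75, F = 0.6
```

Over-splitting "最大熵模型" costs precision (3 of 6 predicted words are correct) more than recall (3 of 4 gold words are found), which is why both numbers are combined into F.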
This article is indexed in VIP (维普) and other databases.