CRFs-based Chinese word segmentation method with character position probability feature
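Cite this article: Shen Qinzhong, Zhou Guodong, Zhu Qiaoming, Kong Fang, Ding Jintao. CRFs-based Chinese word segmentation method with character position probability feature[J]. Journal of Suzhou University (Natural Science), 2008, 24(3): 49-54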
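Authors: Shen Qinzhong  Zhou Guodong  Zhu Qiaoming  Kong Fang  Ding Jintao
Affiliations: [1] School of Computer Science and Technology, Suzhou University, Suzhou, Jiangsu 215006 [2] Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Suzhou, Jiangsu 215006
Funding: National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Abstract: Word segmentation is treated as a process of classifying Chinese characters: a conditional random fields (CRFs) model tags each character, and the tag sequence is then converted into the corresponding segmentation result. Building on existing CRFs models and starting from each character's word-formation ability, the paper explores a character position probability feature and proposes a CRFs-based Chinese word segmentation method that uses it. Experiments show that introducing the character position probability feature raises the F1 value by 3.5%, to 94.5%.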

Keywords: Chinese word segmentation  conditional random fields  character position probability feature

CRFs-based Chinese word segmentation method with character position probability feature
Shen Qinzhong, Zhou Guodong, Zhu Qiaoming, Kong Fang, Ding Jintao. CRFs-based Chinese word segmentation method with character position probability feature[J]. Journal of Suzhou University (Natural Science), 2008, 24(3): 49-54
Authors: Shen Qinzhong  Zhou Guodong  Zhu Qiaoming  Kong Fang  Ding Jintao
Affiliation: Shen Qinzhong, Zhou Guodong, Zhu Qiaoming, Kong Fang, Ding Jintao (School of Computer Science and Technology, Suzhou Univ., Suzhou 215006, China; Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Suzhou 215006, China)
Abstract: Word segmentation is cast as a classification problem in which conditional random fields (CRFs) tag each character; the tags are then converted into segmentation results. Building on the features traditionally used in CRFs models, the paper proposes a novel feature, the character position probability feature, which reflects each character's ability to form words. Experiments show that the new feature significantly improves the F1 value, raising it by 3.5% to 94.5%.
Keywords: Chinese word segmentation  conditional random fields  character position probability feature
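
The abstract describes the pipeline only in outline, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a four-tag scheme (B, M, E, S for word-begin, word-middle, word-end, and single-character word) and estimates the character position probability as the relative frequency of each tag for a character in a segmented corpus; the paper may define both differently. The function names and the toy corpus are hypothetical, and no CRF is trained here: the code only shows how such a feature could be computed and how a tag sequence maps back to words.

```python
from collections import Counter, defaultdict

# Assumed tag set (not stated in the abstract): Begin, Middle, End, Single.
TAGS = ("B", "M", "E", "S")


def word_to_tags(word):
    """Tag each character of one word with its position inside the word."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def position_probabilities(segmented_sentences):
    """Estimate P(tag | character) by relative frequency.

    `segmented_sentences` is a list of sentences, each a list of words.
    This estimator is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for sentence in segmented_sentences:
        for word in sentence:
            for ch, tag in zip(word, word_to_tags(word)):
                counts[ch][tag] += 1
    probs = {}
    for ch, tag_counts in counts.items():
        total = sum(tag_counts.values())
        probs[ch] = {tag: tag_counts[tag] / total for tag in TAGS}
    return probs


def tags_to_words(chars, tags):
    """Convert a per-character tag sequence back into a list of words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on End or Single
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        words.append(current)
    return words


# Toy corpus, invented for this example.
corpus = [["中文", "分词"], ["中", "文"]]
probs = position_probabilities(corpus)
segmented = tags_to_words("中文分词", ["B", "E", "B", "E"])
```

In a full system, the per-character probabilities in `probs` would be supplied to the CRF as additional features alongside the usual character-context features, and `tags_to_words` would be applied to the tag sequence the CRF predicts.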
This article is indexed in VIP, Wanfang Data, and other databases.