
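Improvement of Chinese Word Segmentation Based on Combination Method
Cite this article: Liang Sheng, Cheng Weiqing. Improvement of Chinese Word Segmentation Based on Combination Method [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2013(6): 112-117.
Authors: Liang Sheng, Cheng Weiqing
Affiliation: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China
Funding: Supported by the National Natural Science Foundation of China (61170322, 71171117) and the Natural Science Foundation of Jiangsu Province (BK2010524)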
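Abstract: Handling ambiguous segmentation is one of the difficulties that Chinese word segmentation algorithms must solve. This paper proposes an improved Chinese word segmentation algorithm that combines dictionary-based and statistics-based methods and can detect and resolve crossing ambiguity. To detect ambiguity, the algorithm uses a dual-stack structure instead of the traditional bidirectional matching method, which reduces the time spent on matching. It resolves general crossing ambiguity with a longest-word-first rule, and the special case of crossing ambiguity between words of equal length with a maximum-probability method. Finally, the proposed algorithm is verified experimentally on examples; the results show that it achieves better accuracy than traditional segmentation algorithms.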

Keywords: Chinese information processing; combined word segmentation; crossing ambiguity

Improvement of Chinese Word Segmentation Based on Combination Method
LIANG Sheng, CHENG Wei-qing. Improvement of Chinese Word Segmentation Based on Combination Method[J]. Journal of Nanjing University of Posts and Telecommunications, 2013(6): 112-117.
Authors:LIANG Sheng  CHENG Wei-qing
Institution: School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Abstract: Handling ambiguity during segmentation is one of the challenging problems that Chinese word segmentation algorithms must solve. This paper proposes an improved Chinese word segmentation algorithm that combines dictionary-based and statistics-based methods and can discover and resolve crossing ambiguity. To discover ambiguity, the algorithm adopts a dual-stack structure rather than the traditional bidirectional matching method, which reduces matching time. Furthermore, the algorithm applies the "choose the longer word" rule to general crossing ambiguity and the "choose the word with maximum probability" rule to the special case of crossing ambiguity between words of equal length. Finally, case studies verify that the proposed algorithm has better accuracy than traditional word segmentation algorithms.
Keywords:Chinese information processing  combination-type segmentation  crossing ambiguity
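
The two resolution rules named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not describe how the dual-stack structure works, so the sketch uses a plain forward maximum-matching scan to find overlapping candidates instead. The function names, the `max_len` parameter, and the word counts in `FREQ` are all hypothetical, and raw counts stand in for unigram probabilities.

```python
# Hypothetical word counts standing in for a dictionary with statistics.
FREQ = {"这样": 120, "人才": 40, "才能": 90, "成功": 70,
        "乒乓球": 30, "球拍": 20, "拍卖": 25}


def longest_match(text, start, freq, max_len):
    """Longest dictionary word starting at `start`, else a single character."""
    for length in range(min(max_len, len(text) - start), 1, -1):
        candidate = text[start:start + length]
        if candidate in freq:
            return candidate
    return text[start]


def find_crossing(text, start, word, freq, max_len):
    """First dictionary word that starts inside `word` and ends beyond it."""
    end = start + len(word)
    for j in range(start + 1, end):
        other = longest_match(text, j, freq, max_len)
        if len(other) > 1 and j + len(other) > end:
            return j, other
    return None


def segment(text, freq, max_len=4):
    """Forward maximum matching with crossing-ambiguity resolution."""
    out = []
    i = 0
    while i < len(text):
        word = longest_match(text, i, freq, max_len)
        crossing = find_crossing(text, i, word, freq, max_len)
        if crossing is not None:
            j, other = crossing
            longer = len(other) > len(word)
            # Equal length: fall back to the more frequent (more probable) word.
            likelier = len(other) == len(word) and freq[other] > freq[word]
            if longer or likelier:
                out.extend(segment(text[i:j], freq, max_len))  # leftover prefix
                out.append(other)
                i = j + len(other)
                continue
        out.append(word)
        i += len(word)
    return out


# Equal-length crossing ("人才" vs "才能"): the more frequent word wins.
print(segment("这样的人才能成功", FREQ))  # ['这样', '的', '人', '才能', '成功']
# Unequal-length crossing ("乒乓球" vs "球拍"): the longer word is kept.
print(segment("乒乓球拍卖完了", FREQ))    # ['乒乓球', '拍卖', '完', '了']
```

Comparing raw counts gives the same ordering as comparing relative frequencies over one corpus, which is why no normalization appears here; a real system would likely use smoothed probabilities and consider more than the first overlapping candidate.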
This article is indexed in databases including VIP (维普).