Chinese Word Boundary Ambiguity and Unknown Word Resolution Using Unsupervised Methods
Authors: Fu Guohong, Wang Xiaolong, Jiang Shouxu
Abstract: This paper describes an unsupervised framework that partially resolves four issues in developing a robust Chinese word segmentation system: ambiguity, unknown words, knowledge acquisition, and efficient algorithms. We first propose a statistical segmentation model that integrates the simplified character juncture model (SCJM) with word-formation power. The advantage of this model is that it uses both the affinity of characters inside or outside a word and word-formation power for disambiguation, and all of its parameters can be estimated in an unsupervised way. After investigating the difference between the real and theoretical sizes of the segmentation space, we apply the A* algorithm to perform segmentation without exhaustively searching all potential segmentations. Finally, we present an unsupervised version of Chinese word-formation patterns for detecting unknown words. Experiments show that the proposed methods are efficient.
Keywords: Word segmentation; Character juncture; Word-formation pattern
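
The abstract's idea of using A* to segment without enumerating every candidate segmentation can be illustrated with a minimal sketch. It is not the authors' model: in place of the SCJM and word-formation power scores, it assumes a hypothetical toy lexicon (`LEXICON`) with made-up unigram probabilities, an assumed fixed penalty (`UNKNOWN_COST`) for out-of-lexicon single characters, and a simple admissible heuristic (remaining characters times the cheapest per-character cost). All names and numbers are illustrative assumptions.

```python
import heapq
import math

# Hypothetical toy lexicon; the probabilities are invented for illustration
# and stand in for the paper's unsupervised model scores.
LEXICON = {
    "研究": 0.02,
    "研究生": 0.01,
    "生命": 0.015,
    "命": 0.002,
    "的": 0.05,
    "起源": 0.008,
}
UNKNOWN_COST = 12.0  # assumed penalty for a single character not in LEXICON
MAX_WORD_LEN = max(len(w) for w in LEXICON)

# Cheapest possible cost per character over all edges; multiplying it by the
# number of remaining characters gives a lower bound on the remaining cost.
MIN_COST_PER_CHAR = min(
    min(-math.log(p) / len(w) for w, p in LEXICON.items()),
    UNKNOWN_COST,
)


def word_cost(word):
    """Return the cost of emitting `word`, or None if it is not allowed."""
    if word in LEXICON:
        return -math.log(LEXICON[word])
    # Fall back to single unknown characters so every input can be segmented.
    return UNKNOWN_COST if len(word) == 1 else None


def segment(text):
    """A* search over character positions; returns the lowest-cost word list."""
    n = len(text)
    # Heap entries: (f = g + h, g, position, words chosen so far).
    heap = [(n * MIN_COST_PER_CHAR, 0.0, 0, ())]
    closed = set()
    while heap:
        _, g, pos, words = heapq.heappop(heap)
        if pos == n:
            return list(words)
        if pos in closed:
            continue  # this position was already expanded at a lower cost
        closed.add(pos)
        for end in range(pos + 1, min(n, pos + MAX_WORD_LEN) + 1):
            word = text[pos:end]
            cost = word_cost(word)
            if cost is None:
                continue
            g2 = g + cost
            h2 = (n - end) * MIN_COST_PER_CHAR
            heapq.heappush(heap, (g2 + h2, g2, end, words + (word,)))
    return []


print(segment("研究生命的起源"))
```

With these assumed probabilities, the overlapping ambiguity between 研究生 and 生命 is resolved in favor of 研究 / 生命 because that path has the lower total cost. The search stops as soon as the end position is popped from the queue, so it does not need to visit every path in the lattice.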
This article is indexed in Wanfang Data (万方数据) and other databases.