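A Back-off Smoothing Algorithm for Language Models Based on Mutual Information
Cite this article: 张磊,褚昆,郭黎利. 基于互信息的语言模型回退平滑算法[J]. 应用科技, 2009, 36(4): 28-31
Authors: Zhang Lei (张磊)  Chu Kun (褚昆)  Guo Lili (郭黎利)
Affiliation: College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, Heilongjiang, China
Abstract: For bigram models, a back-off smoothing algorithm based on mutual information (MI Back-off) is proposed. The algorithm analyzes the collocation relations between words from the perspective of mutual information, discounts the probability of each bigram in the model to a different degree according to that bigram's mutual information, and uses the lower-order model to compensate for zero-probability events. The soundness of the new algorithm is shown through the principle of minimizing perplexity. On test sets of different categories, the model perplexity drops by more than 20% in every case compared with the traditional Katz smoothing algorithm.

Keywords: Chinese information processing  statistical language model  smoothing algorithm  mutual information  perplexity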

A back-off smoothing algorithm of language model based on mutual information
ZHANG Lei,CHU Kun,GUO Li-li. A back-off smoothing algorithm of language model based on mutual information[J]. Applied Science and Technology, 2009, 36(4): 28-31
Authors: ZHANG Lei  CHU Kun  GUO Li-li
Affiliation: College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
Abstract: A back-off smoothing algorithm based on mutual information (MI Back-off) is presented for the bigram model. The algorithm analyzes the collocation relations between words from the perspective of mutual information, discounts each bigram's probability to a different degree according to its mutual information, and uses the low-order model to compensate for zero-probability events. The soundness of the algorithm is shown through the principle of minimizing perplexity. For unseen events, the probabilities are backed off to the low-order model, and the model parameters are estimated by minimizing the perplexity. On test corpora from different domains, the perplexity of the proposed smoothing algorithm is more than 20% lower in every case than that of the traditional Katz algorithm.
Keywords: information processing of Chinese characters  statistical language model  smoothing algorithm  mutual information  perplexity
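
The abstract names two quantities, the mutual information of a bigram and the perplexity of a model. The Python sketch below is only an illustration of how these are commonly computed and is not the authors' implementation. It assumes the standard pointwise mutual information log2(P(x,y) / (P(x)P(y))) and the usual bigram perplexity definition. The function names `bigram_pmi` and `perplexity` are hypothetical. The abstract does not give the discount formula, so no discounting or back-off step is shown.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return pointwise mutual information (in bits) for each observed bigram.

    Assumption: probabilities are plain relative frequencies from the token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


def perplexity(tokens, cond_prob):
    """Return 2 ** (-average log2 P(word | previous word)) over the bigrams in tokens.

    cond_prob(prev, word) is any smoothed bigram model; it must return a
    strictly positive probability, which is what smoothing is meant to ensure.
    """
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(cond_prob(prev, word)) for prev, word in pairs)
    return 2 ** (-log_sum / len(pairs))
```

A smoothing scheme of the kind described would supply `cond_prob`. A lower perplexity on held-out text then indicates a better fit, which is the criterion the abstract uses to compare against Katz smoothing.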
This article is indexed in databases including 维普 (VIP) and 万方数据 (Wanfang Data).