Incorporating Linguistic Rules in Statistical Chinese Language Model for Pinyin-to-character Conversion

Authors:	Liu Bingquan, Wang Xiaolong, Wang Yuying

Affiliation:	Department of Computer Science and Engineering, Harbin Institute of Technology

Abstract:	An N-gram Chinese language model that incorporates linguistic rules is presented. Rule information is incorporated into the statistical framework by constructing an element lattice. To facilitate the hybrid modeling, several novel methods are introduced: MI-based rule evaluation, weighted rule quantification, and element-based n-gram probability approximation. A dynamic Viterbi algorithm is adopted to search for the best path in the lattice. To strengthen the model, transformation-based error-driven rule learning is adopted. Applied to Chinese Pinyin-to-character conversion, the proposed model achieves high accuracy, flexibility, and robustness simultaneously. Tests show that the correct rate reaches 94.81%, compared with 90.53% for a bi-gram Markov model alone. Many long-distance dependencies and recursive structures in language can be processed effectively.
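
For orientation, the bi-gram baseline mentioned in the abstract can be illustrated with a minimal sketch of Viterbi search over a lattice of candidate characters. This is not the authors' hybrid model: the element lattice, rule weighting, and "dynamic" variant of the search are not reproduced here. The candidate table, the bigram probabilities, the floor value for unseen bigrams, and all names (`CANDIDATES`, `BIGRAM`, `FLOOR`, `viterbi`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy lattice: candidate characters for each pinyin syllable.
CANDIDATES = {
    "zhong": ["中", "种"],
    "guo": ["国", "过"],
}

# Hypothetical bigram probabilities P(current | previous).
# "<s>" marks the start of the sentence.
BIGRAM = {
    ("<s>", "中"): 0.6,
    ("<s>", "种"): 0.4,
    ("中", "国"): 0.8,
    ("中", "过"): 0.2,
    ("种", "国"): 0.1,
    ("种", "过"): 0.9,
}

FLOOR = 1e-6  # crude stand-in probability for unseen bigrams (assumption)


def viterbi(pinyin):
    """Return the most probable character string for a list of syllables."""
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {"<s>": (0.0, [])}
    for syllable in pinyin:
        nxt = {}
        for cur in CANDIDATES[syllable]:
            # Keep only the best-scoring predecessor for each candidate.
            score, path = max(
                (
                    (s + math.log(BIGRAM.get((prev, cur), FLOOR)), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    # Pick the best complete path across all final candidates.
    return "".join(max(best.values(), key=lambda t: t[0])[1])


print(viterbi(["zhong", "guo"]))
```

Because each lattice column keeps only one best path per candidate, the search cost grows linearly with sentence length rather than exponentially. A bigram model of this kind sees only the immediately preceding character, which is the limitation the abstract's rule-based extension is said to address for long-distance dependencies.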

Keywords:
This article is indexed in Wanfang Data and other databases.