
A Word Segmentation System Based on a Probabilistic Model

Cite this article:	LI Jia-fu, ZHANG Ya-fei. A word segmentation system based on a probabilistic model[J]. Journal of System Simulation, 2002, 14(5): 544-546, 550

Authors:	LI Jia-fu, ZHANG Ya-fei

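Affiliations:	1. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing 210016; 2. Institute of Sciences, PLA University of Science and Technology, Nanjing 210016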

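Funding:	National Natural Science Foundation of China (grant no. 69975024); Key Program of the National Natural Science Foundation of China (grant no. 69931040)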

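Abstract:	Automatic Chinese word segmentation is a fundamental problem in Chinese information processing. This paper first reviews the basic concepts and applications of Chinese word segmentation, together with its basic methods. It then introduces a zeroth-order Markov model for automatic Chinese word segmentation, built from word occurrence probabilities under the maximum-likelihood principle, and examines the EM (Expectation-Maximization) algorithm in detail. Finally, it presents a simulation system for Chinese text processing based on this model.
Keywords:	probabilistic model; word segmentation system; EM algorithm; corpus; system simulation; automatic Chinese word segmentation; Chinese information processing
Article ID:	1004-731X(2002)05-0544-03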
Segmenting Chinese Based on Probabilistic Model

LI Jia-fu, ZHANG Ya-fei. Segmenting Chinese Based on Probabilistic Model[J]. Journal of System Simulation, 2002, 14(5): 544-546, 550

Authors:	LI Jia-fu, ZHANG Ya-fei

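Affiliation:	LI Jia-fu (1), ZHANG Ya-fei (2). 1. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing 210016; 2. Institute of Sciences, PLA University of Science and Technology, Nanjing 210016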

Abstract:	Word segmentation is a basic task in Chinese information processing. In this paper we present a simple probabilistic model of Chinese text based on the occurrence probabilities of words, which can be seen as a zeroth-order hidden Markov model (HMM). We then investigate how the EM algorithm can discover words and their probabilities from a corpus of unsegmented text without using a dictionary. The last part presents a simulation system for processing Chinese text.
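
The abstract describes a zeroth-order model, in which the probability of a segmentation is the product of its word probabilities, and an EM procedure that estimates those probabilities from unsegmented text. The following is a minimal sketch of that general idea, not the authors' implementation. The function names, the `MAX_LEN` cap on word length, the uniform initialization over all substrings, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import defaultdict

MAX_LEN = 4  # assumed cap on word length in characters (not from the paper)


def spans(sentence, max_len=MAX_LEN):
    """Yield (start, end) for every candidate word sentence[start:end]."""
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            yield start, end


def expected_counts(sentence, prob, max_len=MAX_LEN):
    """E-step: expected count of each word over all segmentations."""
    n = len(sentence)
    # alpha[i]: total probability of all segmentations of sentence[:i]
    alpha = [1.0] + [0.0] * n
    for start, end in spans(sentence, max_len):
        alpha[end] += alpha[start] * prob.get(sentence[start:end], 0.0)
    # beta[i]: total probability of all segmentations of sentence[i:]
    beta = [0.0] * n + [1.0]
    for start, end in reversed(list(spans(sentence, max_len))):
        beta[start] += prob.get(sentence[start:end], 0.0) * beta[end]
    counts = defaultdict(float)
    if alpha[n] == 0.0:  # no segmentation has nonzero probability
        return counts
    for start, end in spans(sentence, max_len):
        word = sentence[start:end]
        counts[word] += alpha[start] * prob.get(word, 0.0) * beta[end] / alpha[n]
    return counts


def train_em(corpus, iterations=20, max_len=MAX_LEN):
    """Estimate word probabilities from unsegmented sentences."""
    # Assumed initialization: uniform over every substring up to max_len.
    vocab = {s[a:b] for s in corpus for a, b in spans(s, max_len)}
    if not vocab:
        return {}
    prob = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        total = defaultdict(float)
        for s in corpus:
            for word, c in expected_counts(s, prob, max_len).items():
                total[word] += c
        norm = sum(total.values())
        # M-step: renormalize expected counts into probabilities.
        prob = {w: c / norm for w, c in total.items() if c > 0.0}
    return prob


def segment(sentence, prob, max_len=MAX_LEN):
    """Most probable segmentation under the model (Viterbi, log space)."""
    n = len(sentence)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for start, end in spans(sentence, max_len):
        p = prob.get(sentence[start:end], 0.0)
        if p > 0.0 and best[start] + math.log(p) > best[end]:
            best[end] = best[start] + math.log(p)
            back[end] = start
    if best[n] == -math.inf:  # fall back to single characters
        return list(sentence)
    words, end = [], n
    while end > 0:
        words.append(sentence[back[end]:end])
        end = back[end]
    return words[::-1]


if __name__ == "__main__":
    corpus = ["中文信息处理", "中文分词", "信息处理"]  # toy data, illustrative only
    model = train_em(corpus)
    print(segment("中文信息处理", model))
```

The E-step here works in plain probabilities, which is adequate for short sentences but can underflow on long ones; a fuller version would scale `alpha` and `beta` or work in log space. Plain maximum-likelihood EM of this kind may also need additional constraints or pruning to yield linguistically sensible words on real corpora.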

Keywords:	word segmentation; EM algorithm; corpus; HMM; system simulation
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
