Language modeling based on corpus

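Cite this article:	Xu Wei, Yuan Chunfa, Huang Changning. Language modeling based on corpus [J]. Journal of Tsinghua University (Science and Technology), 1997(3).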

Authors:	Xu Wei, Yuan Chunfa, Huang Changning

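Affiliation:	Department of Computer Science and Technology, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems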

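Abstract:	The core problem in the development of corpus linguistics is the construction of language models. Commonly used language models fall into three classes: (1) n-gram models (and hidden Markov models); (2) models based on distribution theory; (3) rule-based models. Corpus-based modeling is the process of solving for the parameters of a language model, and it can also be regarded as a machine learning process. It divides into two broad categories: (1) supervised learning; (2) unsupervised learning. This paper focuses on a recent research hotspot, the various techniques of unsupervised learning, and on the data sparseness problem, which affects the reliability of the parameters, together with its solutions.
Keywords:	language model; parameter estimation; data sparseness; supervised learning; unsupervised learning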
Language modeling based on corpus

Xu Wei, Yuan Chunfa, Huang Changning. Language modeling based on corpus [J]. Journal of Tsinghua University (Science and Technology), 1997(3).

Authors:	Xu Wei, Yuan Chunfa, Huang Changning

Institution:	Department of Computer Science and Technology, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems, Beijing 100084

Abstract:	The central problem in corpus linguistics is language modeling. The three major types of language model are: (a) the n-gram model and the hidden Markov model (HMM); (b) the distribution-based model; (c) the rule-based model. Corpus-based language modeling consists mainly of estimating the parameters of the chosen model, through either supervised or unsupervised learning. The latter has become a focus of research because it needs only a raw corpus and very little a priori human knowledge. This paper elaborates several techniques for unsupervised parameter estimation, as well as the data sparseness problem, which is the major cause of unreliable parameter estimates.

Keywords:	language model; parameter estimation; data sparseness; supervised learning; unsupervised learning
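
The abstract describes corpus-based modeling as parameter estimation and names data sparseness as the main threat to reliable estimates. The following is a minimal sketch of that idea for a bigram model, not the authors' method: it estimates parameters by relative frequency from a toy corpus and then applies add-one (Laplace) smoothing, a technique chosen here only for illustration. The function names, the `<s>` and `</s>` boundary markers, and the toy corpus are all assumptions introduced for this example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (hypothetical helper)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        history_counts.update(padded[:-1])               # every token that has a successor
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    # Words that can be predicted: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return history_counts, bigram_counts, vocab

def mle_prob(prev, word, history_counts, bigram_counts):
    """Relative-frequency estimate; zero for any bigram unseen in the corpus."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

def laplace_prob(prev, word, history_counts, bigram_counts, vocab):
    """Add-one smoothing: reserves probability mass for unseen bigrams."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))

# Toy corpus (an assumption for illustration only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
history_counts, bigram_counts, vocab = train_bigram(corpus)

seen = mle_prob("the", "cat", history_counts, bigram_counts)          # 1/2
unseen = mle_prob("cat", "dog", history_counts, bigram_counts)        # 0.0: sparse data
smoothed = laplace_prob("cat", "dog", history_counts, bigram_counts, vocab)  # 1/6
```

The unsmoothed estimate assigns probability zero to "cat dog" simply because that pair never occurs in the small corpus, which is the sparseness problem in miniature; the smoothed estimate stays nonzero while the distribution over the vocabulary still sums to one for each history.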
This article is indexed in CNKI and other databases.
