
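Language modeling based on corpus
Cite this article: Xu Wei, Yuan Chunfa, Huang Changning. Language modeling based on corpus[J]. Journal of Tsinghua University (Science and Technology), 1997, 0(3)
Authors: Xu Wei  Yuan Chunfa  Huang Changning
Affiliation: Department of Computer Science and Technology, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems
Abstract: The core problem in the development of corpus linguistics is the construction of language models. Commonly used language models fall into three classes: (1) n-gram models (and hidden Markov models); (2) models based on distribution theory; (3) rule-based models. Corpus-based modeling is the process of estimating the parameters of a language model, and it can also be regarded as a machine learning process. It divides into two broad categories: (1) supervised learning; (2) unsupervised learning. This paper focuses on a recent research hotspot, the various techniques of unsupervised learning, and on the data sparseness problem, which affects the reliability of the estimated parameters, together with its solutions.

Keywords: language model; parameter estimation; data sparseness; supervised learning; unsupervised learning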

Language modeling based on corpus
Xu Wei, Yuan Chunfa, Huang Changning. Language modeling based on corpus[J]. Journal of Tsinghua University (Science and Technology), 1997, 0(3)
Authors: Xu Wei  Yuan Chunfa  Huang Changning
Affiliation: Department of Computer Science and Technology, Tsinghua University; State Key Laboratory of Intelligent Technology and Systems, Beijing 100084
Abstract: The central problem in corpus linguistics is language modeling. The three major types of language model are: a) the n-gram model and the hidden Markov model (HMM); b) the distribution-based model; c) the rule-based model. Corpus-based language modeling consists mainly of estimating the parameters of the chosen model, which can be done through supervised or unsupervised learning. The latter has become a focus of research because it needs only a raw corpus and very little human a priori knowledge. Several techniques for unsupervised parameter estimation are elaborated, as is the sparse data problem, the major cause of unreliable parameter estimates.
Keywords: language model; parameter estimation; data sparseness; supervised learning; unsupervised learning
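
As a rough illustration of the abstract's point that corpus-based modeling amounts to parameter estimation, and of why sparse data makes the estimates unreliable, the following sketch estimates bigram probabilities from a toy corpus. The function names, the toy sentence, and the choice of add-one (Laplace) smoothing are assumptions made for illustration only; the abstract does not say which remedies for data sparseness the paper covers.

```python
from collections import Counter

def train_bigram(tokens):
    """Collect the counts a bigram model needs from a token sequence."""
    contexts = Counter(tokens[:-1])             # how often each word is a history
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    vocab = set(tokens)
    return contexts, bigrams, vocab

def p_mle(w1, w2, contexts, bigrams):
    """Maximum-likelihood estimate of P(w2 | w1); unseen pairs get probability 0."""
    if contexts[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / contexts[w1]

def p_add_one(w1, w2, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(w2 | w1) (illustrative choice, see lead-in)."""
    return (bigrams[(w1, w2)] + 1) / (contexts[w1] + len(vocab))

# Hypothetical toy corpus, far too small to cover all word pairs.
corpus = "the cat sat on the mat".split()
contexts, bigrams, vocab = train_bigram(corpus)

seen = p_mle("the", "cat", contexts, bigrams)
unseen = p_mle("cat", "mat", contexts, bigrams)
smoothed = p_add_one("cat", "mat", contexts, bigrams, vocab)
print(seen, unseen, smoothed)
```

The pair ("cat", "mat") never occurs in the toy corpus, so the maximum-likelihood estimate assigns it zero probability even though it is a plausible sequence; the smoothed estimate reserves a small, nonzero share for it. This is only one simple remedy among many.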
This article is indexed in CNKI and other databases.