
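Text Information Extraction Based on Clustering Hidden Markov Model
Cite this article: ZHOU Shun-xian, LIN Ya-ping, WANG Yao-nan, YI Ye-qing. Text Information Extraction Based on Clustering Hidden Markov Model[J]. Journal of System Simulation, 2007, 19(21): 4926-4931.
Authors: ZHOU Shun-xian  LIN Ya-ping  WANG Yao-nan  YI Ye-qing
Affiliations: 1. College of Computer and Communication, Hunan University, Changsha, Hunan 410082; College of Electrical and Information Engineering, Hunan University, Changsha, Hunan 410082
2. College of Computer and Communication, Hunan University, Changsha, Hunan 410082
3. College of Electrical and Information Engineering, Hunan University, Changsha, Hunan 410082
Funding: Hunan Provincial Key Natural Science Foundation; Youth Fund of the Hunan Provincial Education Department
Abstract: Hidden Markov models are an important method for text information extraction. Texts from different sources on the Web differ greatly in format, so training on them as a single mixed set generally does not yield a well-optimized model. This paper applies clustering to text information extraction: the Markov chain models of the training texts are first clustered with an improved k-means method, and a hidden Markov model is then trained for each cluster. On this basis, a text information extraction algorithm based on clustering hidden Markov models (C-HMM) is proposed. Simulation experiments on information extraction from 700 texts drawn from different Web sources show that the new algorithm effectively improves extraction performance.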

Keywords: clustering  Markov chain  hidden Markov model  information extraction
Article ID: 1004-731X(2007)21-4926-06
Received: 2006-12-26
Revised: 2007-02-27

Text Information Extraction Based on Clustering Hidden Markov Model
ZHOU Shun-xian, LIN Ya-ping, WANG Yao-nan, YI Ye-qing. Text Information Extraction Based on Clustering Hidden Markov Model[J]. Journal of System Simulation, 2007, 19(21): 4926-4931.
Authors:ZHOU Shun-xian  LIN Ya-ping  WANG Yao-nan  YI Ye-qing
Institution:1.College of Computer and Communication, Hunan Univ., Changsha 410082, China; 2.College of Electrical and Information Engineering, Hunan Univ., Changsha 410082, China
Abstract:Hidden Markov models (HMMs) are an important approach to text information extraction. Texts from different sources on the Web differ considerably in format, so training a single model on the mixed texts generally fails to yield a well-optimized model. This paper applies clustering to text information extraction: the Markov chains of the training texts are first clustered with an improved k-means method, and a hidden Markov model is then trained for each cluster. On this basis, a text information extraction algorithm based on clustering hidden Markov models (C-HMM) is proposed. A simulation experiment on 700 texts from different Web sources shows that the new algorithm effectively improves extraction performance.
Keywords:clustering  Markov chains  hidden Markov model  information extraction
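
The two-stage idea in the abstract (cluster the training texts by their Markov chains, then train one HMM per cluster) might be sketched roughly as follows. This is only an illustration under several assumptions that are not taken from the paper: the label set `STATES`, the helper names, the use of plain k-means with farthest-first initialization (the paper's improved k-means is not described in the abstract), and supervised count-based HMM estimation with add-one smoothing. Assigning a new text to a cluster and decoding it are omitted.

```python
from collections import Counter

STATES = ["title", "author", "body"]  # hypothetical label set, not from the paper

def transition_vector(labels, states=STATES):
    """Flattened, row-normalized label transition matrix (add-one smoothing)."""
    counts = Counter(zip(labels, labels[1:]))
    vec = []
    for a in states:
        row = [counts[(a, b)] + 1 for b in states]
        total = sum(row)
        vec.extend(c / total for c in row)
    return vec

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def train_hmm(docs, states=STATES):
    """Estimate transition (A) and emission (B) probabilities by counting."""
    trans, emit = Counter(), Counter()
    for doc in docs:
        labels = [lab for _, lab in doc]
        trans.update(zip(labels, labels[1:]))
        emit.update((lab, tok) for tok, lab in doc)
    A, B = {}, {}
    for a in states:
        total = sum(trans[(a, b)] + 1 for b in states)
        A[a] = {b: (trans[(a, b)] + 1) / total for b in states}
        seen = {tok: n for (lab, tok), n in emit.items() if lab == a}
        n_seen = sum(seen.values())
        B[a] = {tok: n / n_seen for tok, n in seen.items()}
    return A, B

def cluster_and_train(docs, k):
    """Cluster labeled texts by their Markov chains, then fit one HMM per cluster."""
    vectors = [transition_vector([lab for _, lab in d]) for d in docs]
    assign = kmeans(vectors, k)
    models = [train_hmm([d for d, a in zip(docs, assign) if a == c])
              for c in range(k)]
    return assign, models

# Two made-up text formats: header-first and header-last.
doc_a = [("Deep", "title"), ("Nets", "title"), ("Smith", "author"),
         ("We", "body"), ("show", "body")]
doc_b = [("We", "body"), ("show", "body"), ("Smith", "author"),
         ("Nets", "title")]
assign, models = cluster_and_train([doc_a, doc_b, doc_a, doc_b], k=2)
```

Here each training text is a list of `(token, label)` pairs, so texts with a similar field order end up in the same cluster and each cluster's HMM only has to model one layout.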
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.