
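An Efficient Term-Committee Clustering Method for Topic Detection
Cite this article: YANG Pan, GUI Xiaolin, TIAN Feng, WANG Gang. An Efficient Term-Committee Clustering Method for Topic Detection[J]. Journal of Xi'an Jiaotong University, 2012, 46(10): 24-28.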
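Authors: YANG Pan  GUI Xiaolin  TIAN Feng  WANG Gang
Affiliations: 1. School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Shaanxi Province Key Laboratory of Computer Network, Xi'an 710049, China
2. School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; School of Information, Xi'an University of Finance and Economics, Xi'an 710100, China
Funding: National Natural Science Foundation of China (61172090); National Science and Technology Major Project (2012ZX03002001-004)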
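Abstract: The term-committee-based algorithm for detecting events within a topic runs inefficiently and is ill-suited to topic detection over large-scale text. To address this, an efficient term-committee clustering algorithm is proposed. When selecting term clusters, the algorithm assigns weights to the inter-cluster similarity and draws on the normal distribution function to evaluate the number of term clusters, which improves the accuracy of term cluster selection and thereby reduces the number of term clustering passes required. Experimental results show that, when the improved method is applied to topic detection for public opinion monitoring, it effectively improves the running efficiency of the algorithm without affecting detection accuracy.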

Keywords: topic detection  term-committee  public opinion monitoring

Efficient Key words Clustering Method for Topic Detection
YANG Pan, GUI Xiaolin, TIAN Feng, WANG Gang. Efficient Key words Clustering Method for Topic Detection[J]. Journal of Xi'an Jiaotong University, 2012, 46(10): 24-28.
Authors:YANG Pan  GUI Xiaolin  TIAN Feng  WANG Gang
Institution: 1. School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; 2. Shaanxi Province Key Laboratory of Computer Network, Xi'an 710049, China; 3. School of Information, Xi'an University of Finance and Economics, Xi'an 710100, China
Abstract: An improved term-committee-based event identification algorithm is presented to meet the efficiency and accuracy requirements of public opinion monitoring systems, where the original event identification algorithm cannot be applied because of its low efficiency. When the similarity between clusters is calculated, a weight is taken into account at the same time. Drawing on the normal distribution curve, an evaluation procedure is proposed to help choose clusters with a suitable number of terms, so that the improved algorithm needs to cluster only once. Experiments indicate that the improved algorithm raises operating efficiency while maintaining the required accuracy.
Keywords: topic detection  term-committee  public opinion monitoring
This article is indexed in CNKI, Wanfang Data, and other databases.