
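Network News Topics Discovery Based on Improved Single-Pass Algorithm
Cite this article: SUN Hongguang, GAO Xing, SUN Tieli, YANG Fengqin, PENG Yang, FENG Guozhong. Network News Topics Discovery Based on Improved Single-Pass Algorithm[J]. Journal of Jilin University: Sci Ed, 2018, 56(1): 114-118.
Authors: SUN Hongguang, GAO Xing, SUN Tieli, YANG Fengqin, PENG Yang, FENG Guozhong
Affiliations: 1. School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; 2. Key Lab of Intelligent Information Processing of Jilin Universities, Changchun 130117, China; 3. Liberation Army Daily, Beijing 100832, China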
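Abstract: Using an improved Single-Pass incremental text clustering algorithm, we organize news items at the granularity of topics and thereby discover topics in online news. The method accounts for the dynamic and temporal nature of news. Feature term weighting is optimized in two respects: the position of a term (headline or body text) and the term's incremental document frequency. In addition, a time factor is added to the similarity computation, and topic centroid vectors are updated dynamically during clustering. With a corpus of news and related texts collected by a topic-focused web crawler as the test data set, the experiments show that, compared with the traditional algorithm, the improved algorithm reduces the cost and the false alarm rate by 0.34% and 1.57%, respectively, which verifies its effectiveness and accuracy.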
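The abstract does not give the exact formulas, so the following is only a minimal sketch of how the described pieces could fit together in one Single-Pass loop. It assumes: text already segmented into words (Chinese news would first need word segmentation); a fixed headline boost (`TITLE_WEIGHT`) as the position weighting; a smoothed IDF over the documents seen so far as the incremental document frequency; an exponential decay on the time gap as the time factor; and a running mean as the dynamic centroid update. All names (`SinglePassClusterer`, `Topic`, `add`) and constants are hypothetical, not the authors' method.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

# Illustrative assumptions only; the paper's actual weights and formulas are not given here.
TITLE_WEIGHT = 2.0    # hypothetical boost for terms found in the headline
BODY_WEIGHT = 1.0     # weight for terms found in the body text
DECAY_PER_DAY = 0.1   # hypothetical rate of the exponential time decay
THRESHOLD = 0.3       # hypothetical minimum similarity for joining an existing topic


@dataclass
class Topic:
    centroid: dict    # term -> weight, mean of the member document vectors
    size: int         # number of documents assigned to the topic
    last_time: float  # time (in days) of the newest member document


class SinglePassClusterer:
    """Hypothetical incremental Single-Pass clusterer over pre-segmented text."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()  # incremental document frequency, updated per document
        self.topics = []

    def _vectorize(self, title, body):
        # Incremental DF: count the new document before weighting its terms.
        self.n_docs += 1
        self.df.update(set(title) | set(body))
        # Position-aware term frequency: headline occurrences count more.
        tf = Counter()
        for term in title:
            tf[term] += TITLE_WEIGHT
        for term in body:
            tf[term] += BODY_WEIGHT
        # Smoothed IDF computed from the documents seen so far.
        return {t: w * (math.log((self.n_docs + 1) / (self.df[t] + 1)) + 1.0)
                for t, w in tf.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def add(self, title, body, time_days):
        """Assign one document to a topic and return that topic's index."""
        vec = self._vectorize(title, body)
        best_idx, best_sim = -1, 0.0
        for idx, topic in enumerate(self.topics):
            # Time factor: similarity shrinks as the gap to the topic's newest story grows.
            decay = math.exp(-DECAY_PER_DAY * abs(time_days - topic.last_time))
            sim = self._cosine(vec, topic.centroid) * decay
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= THRESHOLD:
            # Dynamic centroid update: running mean of the member vectors.
            topic = self.topics[best_idx]
            n = topic.size
            merged = defaultdict(float)
            for t, w in topic.centroid.items():
                merged[t] += w * n / (n + 1)
            for t, w in vec.items():
                merged[t] += w / (n + 1)
            topic.centroid = dict(merged)
            topic.size = n + 1
            topic.last_time = max(topic.last_time, time_days)
            return best_idx
        # No sufficiently similar topic: the document starts a new one.
        self.topics.append(Topic(centroid=vec, size=1, last_time=time_days))
        return len(self.topics) - 1
```

With these illustrative settings, two earthquake reports half a day apart land in the same topic, a stock-market story opens a new topic, and the same earthquake text submitted 100 days later also opens a new topic because the time decay drives its similarity below the threshold. A single pass over the stream with no re-clustering is what makes the approach incremental; the trade-off is that early assignments are never revisited.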

Keywords: text clustering; Single-Pass algorithm; topic discovery
Received: 2016-10-24

This article is indexed in CNKI and other databases.