
Topic Discovery Method of Stock Bar Forum Based on Integration of Frequent Item-set and Latent Semantic Analysis
Cite this article: ZHANG Tao, WENG Kangnian, GU Xiaomin, ZHANG Yuejie. Topic Discovery Method of Stock Bar Forum Based on Integration of Frequent Item-set and Latent Semantic Analysis[J]. Journal of Tongji University (Natural Science), 2019, 47(4): 0583
Authors: ZHANG Tao (张涛), WENG Kangnian (翁康年), GU Xiaomin (顾小敏), ZHANG Yuejie (张玥杰)
Affiliations:
ZHANG Tao: School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China; Shanghai Key Laboratory of Financial Information Technology (Shanghai University of Finance and Economics), Shanghai 200433, China
WENG Kangnian, GU Xiaomin: School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China
ZHANG Yuejie: School of Computer Science, Fudan University, Shanghai 200433, China; Shanghai Key Laboratory of Intelligent Information Processing (Fudan University), Shanghai 200433, China
Funding: National Natural Science Foundation of China (71171126); Science and Technology Commission of Shanghai Municipality, "Science and Technology Innovation Action Plan" project (16511104704); Tongji University Outstanding Young Talents Training Program (1508-219-040).
Abstract: To improve topic discovery in stock bar forums, this paper presents STC_FL, a short text clustering framework that combines frequent item-sets with latent semantics. After HowNet-based knowledge acquisition yields a concept vector space, the important frequent item-sets are mined and filtered, and a method combining statistics and latent semantics then clusters them adaptively. Finally, the TSC-SN algorithm (text soft classifying based on similarity threshold and non-overlapping) is proposed, in which a parameter tuning strategy selects and controls the text soft clustering process. An empirical analysis of real stock bar forum data shows that the STC_FL framework and the TSC-SN algorithm can fully exploit the latent semantic information of the text and effectively reduce the dimensionality of the feature space, ultimately achieving in-depth information mining and topic classification of short texts.
Keywords: topic discovery; stock bar forum; frequent item-set; latent semantic analysis; text soft clustering
Received: 2018-05-01
Revised: 2019-02-26
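
The abstract describes mining frequent item-sets from short texts and then soft-clustering the texts under a similarity threshold. The Python sketch below illustrates only that general idea and is not the authors' STC_FL framework or TSC-SN algorithm: it omits the HowNet concept mapping and the latent semantic step, enumerates item-sets by brute force, and uses an assumed overlap ratio as the similarity. The function names, the toy documents, and the parameter values are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Count term sets of up to max_size terms and keep the frequent ones.

    Brute-force enumeration; workable only because short texts have few terms.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def soft_assign(docs, seeds, threshold):
    """Assign each document to every seed item-set it overlaps enough with.

    Similarity here is the share of the seed's terms found in the document
    (an assumption made for this sketch). A document may get several labels.
    """
    assignments = []
    for doc in docs:
        terms = set(doc)
        labels = [s for s in seeds if len(s & terms) / len(s) >= threshold]
        assignments.append(labels)
    return assignments


# Hypothetical, already tokenized forum posts.
DOCS = [
    ["bank", "rate", "cut"],
    ["bank", "rate", "rise"],
    ["chip", "export", "ban"],
    ["chip", "export", "rate"],
]

if __name__ == "__main__":
    itemsets = frequent_itemsets(DOCS, min_support=2)
    # Use only the two-term item-sets as topic seeds.
    seeds = [s for s in itemsets if len(s) == 2]
    for doc, labels in zip(DOCS, soft_assign(DOCS, seeds, threshold=0.5)):
        print(doc, [sorted(s) for s in labels])
```

With this toy data the fourth post contains both terms of one seed and one of the two terms of the other, so at a threshold of 0.5 it is placed in both groups. That multiple membership is what distinguishes soft clustering from a hard partition.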
This article is indexed in CNKI and other databases.