
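Text Topic Feature Extraction Based on Neighborhood Rough Set
Cite this article: JIN Hong-wei (靳红伟), XIE Jun (谢珺), XU Xin-ying (续欣莹). Text Topic Feature Extraction Based on Neighborhood Rough Set[J]. Science Technology and Engineering (科学技术与工程), 2019, 19(22): 208-214.
Authors: JIN Hong-wei (靳红伟)  XIE Jun (谢珺)  XU Xin-ying (续欣莹)
Affiliations: College of Information and Computer, Taiyuan University of Technology, Jinzhong 030600; College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024
Funding: Shanxi Province Research Project for Returned Overseas Scholars (2015-045)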
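Abstract: The LDA topic model is an effective tool for extracting semantic information from text. It exploits term co-occurrence at the document level to transform the term matrix into a topic matrix and so obtain topic features; however, the document generation process introduces redundant topics. To address this redundancy in the topic features extracted by LDA, the paper proposes NRS-LDA, an LDA topic model reduction algorithm based on neighborhood rough sets. A topic decision system is constructed with the neighborhood rough set; with the number of topics set in advance, the importance of each topic is computed, the topics are ranked by importance, and the topics ranked lowest are deleted. NRS-LDA is applied to K-means text clustering and compared with traditional text feature extraction algorithms and with improved algorithms; the results show that NRS-LDA achieves higher clustering accuracy.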

Keywords: LDA topic model  neighborhood rough set  text feature extraction  topic reduction
Received: 2019/1/27
Revised: 2019/4/28

Research on Text Topic Feature Extraction Based on Neighborhood Rough Set
JIN Hong-wei, XIE Jun, XU Xin-ying. Research on Text Topic Feature Extraction Based on Neighborhood Rough Set[J]. Science Technology and Engineering, 2019, 19(22): 208-214.
Authors: JIN Hong-wei  XIE Jun  XU Xin-ying
Institution: College of Information and Computer, Taiyuan University of Technology; College of Electrical and Power Engineering, Taiyuan University of Technology
Abstract: The LDA topic model is an effective tool for text feature extraction. It obtains topic features from term co-occurrence at the document level, transforming the term space into the topic space, but the document generation process introduces redundant topics. To address this redundancy in LDA topic feature extraction, this paper proposes NRS-LDA, an LDA topic model reduction algorithm based on neighborhood rough sets. A topic decision system is constructed with the neighborhood rough set. With the number of topics set in advance, the importance of each topic is calculated, the topics are ranked by importance, and the topics of low importance are deleted. NRS-LDA is applied to the K-means text clustering problem and compared with traditional text feature extraction algorithms and with improved algorithms. The experimental results show that the proposed NRS-LDA method achieves higher clustering accuracy.
Keywords: LDA topic model  neighborhood rough set  text feature extraction  topic reduction
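
The abstract outlines the reduction step only in words: score each topic's importance in a neighborhood-rough-set decision system, rank the topics, and delete the least important ones. The following is a minimal sketch of one way such a step could look, not the authors' implementation. Everything beyond the abstract is an assumption: the decision attribute is taken to be a known document label, neighborhoods use Euclidean distance with a radius `delta`, importance is the drop in neighborhood dependency when a topic is left out, and the function names (`dependency`, `topic_importance`, `reduce_topics`) and toy numbers are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def dependency(rows, labels, topics, delta):
    """Fraction of documents whose delta-neighborhood, measured on the
    chosen topic columns, contains only documents with the same label."""
    if not topics:
        return 0.0
    proj = [[row[t] for t in topics] for row in rows]
    consistent = 0
    for i, p in enumerate(proj):
        if all(labels[j] == labels[i]
               for j, q in enumerate(proj) if dist(p, q) <= delta):
            consistent += 1
    return consistent / len(rows)


def topic_importance(rows, labels, delta):
    """Assumed importance of topic t: dependency with all topics minus
    dependency with topic t left out."""
    all_topics = list(range(len(rows[0])))
    full = dependency(rows, labels, all_topics, delta)
    return [full - dependency(rows, labels,
                              [u for u in all_topics if u != t], delta)
            for t in all_topics]


def reduce_topics(rows, labels, delta, keep):
    """Rank topics by importance and keep the `keep` highest-ranked ones."""
    sig = topic_importance(rows, labels, delta)
    ranked = sorted(range(len(sig)), key=lambda t: sig[t], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [[row[t] for t in kept] for row in rows]


# Toy document-topic proportions, invented for illustration (not paper data).
theta = [
    [0.6, 0.2, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.4, 0.4],
    [0.1, 0.4, 0.5],
]
labels = [0, 0, 1, 1]  # assumed decision attribute

importance = topic_importance(theta, labels, delta=0.3)
kept, reduced = reduce_topics(theta, labels, delta=0.3, keep=2)
print(importance, kept)
```

In this toy case only topic 0 is needed to keep the two label groups apart, so it receives all of the importance; the tie between the other two topics is broken by index order. The `reduced` matrix is what would then be passed to a K-means implementation in the clustering experiment the abstract mentions.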
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.