An Improved TF-IDF Text Clustering Method
Original title: 一种改进型TF-IDF文本聚类方法
Cite this article: ZHANG Lei, JIANG Yu, SUN Li. An Improved TF-IDF Text Clustering Method[J]. Journal of Jilin University (Science Edition), 2021, 59(5): 1199-1204.
Authors: ZHANG Lei (张蕾), JIANG Yu (姜宇), SUN Li (孙莉)
Institutions: 1. Division of Development and Strategic Planning, Jilin University, Changchun 130012, China; 2. College of Computer Science and Technology, Jilin University, Changchun 130012, China
Abstract: The traditional term frequency-inverse document frequency (TF-IDF) algorithm performs poorly when classifying texts with specific attributes, and its accuracy is especially low when a word carries a special meaning within a particular category. To address this, we propose an improved TF-IDF text clustering algorithm. Comparative experiments were carried out on papers published by scientific research institutions in Jilin Province from 2015 to 2019. The improved and the traditional TF-IDF algorithms were each used to compute keyword frequencies in the papers, the results were clustered with the K-means++ algorithm, and a random forest algorithm was then used to evaluate the accuracy of each clustering. The experimental results show that the improved TF-IDF algorithm improves classification accuracy.
Keywords: term frequency-inverse document frequency (TF-IDF); hybrid clustering; interdisciplinary; Essential Science Indicators (ESI) literature
Received: 2020-11-10
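For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of the traditional pipeline only: classic TF-IDF weighting of per-paper keyword lists followed by K-means with K-means++ seeding. It is not the authors' implementation. The abstract does not describe the improved weighting, so it is not shown, and the random forest evaluation step is omitted. The toy keyword lists, the function names, the choice of idf = log(N / df), and the fixed iteration count are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Classic TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(vectors, k, rng):
    """K-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre.
    Assumes at least k distinct vectors, so the weights are never all zero."""
    centres = [rng.choice(vectors)]
    while len(centres) < k:
        weights = [min(sq_dist(v, c) for c in centres) for v in vectors]
        centres.append(rng.choices(vectors, weights=weights, k=1)[0])
    return centres


def kmeans(vectors, k, rng, n_iter=20):
    """Lloyd iterations started from K-means++ seeds; returns cluster labels."""
    centres = kmeanspp_seeds(vectors, k, rng)
    labels = []
    for _ in range(n_iter):
        # Assign every vector to its nearest centre.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centres[j])) for v in vectors
        ]
        # Move each centre to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical keyword lists standing in for the papers' keywords.
docs = [
    ["gene", "protein", "cell"],
    ["gene", "cell", "cell"],
    ["laser", "optics", "photon"],
    ["laser", "photon", "photon"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2, rng=random.Random(0))
print(vocab)
print(labels)
```

Because K-means depends on its random seeding, the labels can differ between runs and may settle in a local optimum. In the setup the abstract describes, the cluster labels would then be passed to a separate classifier to assess how consistent the clustering is.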
This article is indexed in CNKI, Wanfang Data, and other databases.