
Chinese Text Clustering Algorithm Based on Semantic Cluster
Citation: QI Xiangming, SUN Xujiao. Chinese Text Clustering Algorithm Based on Semantic Cluster[J]. Journal of Jilin University: Sci Ed, 2002, 57(5): 1193-1199.
Authors: QI Xiangming, SUN Xujiao
Institution: College of Software, Liaoning Technical University, Huludao 125105, Liaoning Province, China
Abstract: Chinese text clustering is affected by semantic, grammatical, and contextual factors. When texts are vectorized with the traditional vector space model, the resulting text vectors are mutually independent and the semantic relations between them are ignored, which degrades the clustering results. To address this problem, a Chinese text clustering algorithm based on semantic clusters is proposed. Drawing on the principle of word co-occurrence and on semantic relevance, the algorithm first uses the term frequency-inverse document frequency (TF-IDF) method to obtain the weights of the feature words and uses the collocation vectors of the feature words to construct semantic clusters. It then uses the weights of the feature words and their collocation words to spatially transform the feature words toward the semantic cluster centers, yielding document vectors with embedded semantic information. Finally, K-means clustering is performed on the document vectors. The experimental results show that this vectorization method effectively improves how closely text vectors approximate text semantics, and that it improves the accuracy and recall of the text clustering results.

Keywords: vector; feature word; semantic cluster; semantic embedding; cluster analysis
Received: 2018-07-11
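
The three stages named in the abstract (TF-IDF weighting, a co-occurrence-based semantic adjustment of the vectors, K-means) can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything specific here is an assumption made for illustration only: document-level co-occurrence stands in for the collocation vectors, a simple blend controlled by a hypothetical `alpha` parameter stands in for the transformation toward semantic cluster centers, and all function names are invented. Input documents are assumed to be already word-segmented into token lists.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, rows): one TF-IDF weight vector per tokenized document."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        rows.append([tf[t] / len(d) * idf[t] for t in vocab])
    return vocab, rows

def cooccurrence(docs, vocab):
    """Row-normalized counts of how often two terms share a document (assumed proxy for collocation)."""
    idx = {t: i for i, t in enumerate(vocab)}
    m = [[0.0] * len(vocab) for _ in vocab]
    for d in docs:
        terms = set(d)
        for a in terms:
            for b in terms:
                if a != b:
                    m[idx[a]][idx[b]] += 1.0
    for row in m:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    return m

def embed(rows, cooc, alpha=0.5):
    """Blend each vector with weight propagated to co-occurring terms (assumed transform), then L2-normalize."""
    out = []
    for r in rows:
        size = len(r)
        spread = [sum(r[i] * cooc[i][j] for i in range(size)) for j in range(size)]
        v = [(1 - alpha) * a + alpha * b for a, b in zip(r, spread)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        out.append([x / norm for x in v])
    return out

def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm, seeded deterministically with the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_documents(docs, k, alpha=0.5):
    """Run the three stages in order and return one cluster label per document."""
    vocab, rows = tfidf(docs)
    return kmeans(embed(rows, cooccurrence(docs, vocab), alpha), k)

docs = [
    ["stock", "market", "price"],
    ["football", "match", "goal"],
    ["market", "price", "trade"],
    ["match", "goal", "team"],
]
print(cluster_documents(docs, k=2))  # prints [0, 1, 0, 1]
```

In this toy corpus the finance and sports documents share no terms, so the two groups separate cleanly. The blending step matters more when documents on the same topic use different but frequently co-occurring words, which is the situation the abstract describes.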