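Improved Web Search Result Clustering Algorithm Based on Suffix Tree
Cite this article:DONG Yaze, LI Wanlong, LI Hang, ZHENG Shanhong. Improved Web Search Result Clustering Algorithm Based on Suffix Tree[J]. Journal of Jilin University (Information Science Edition), 2016, 34(4): 543-549.
Authors:DONG Yaze  LI Wanlong  LI Hang  ZHENG Shanhong
Affiliation:School of Application Technology, Changchun University of Technology, Changchun 130012, China; School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
Funding:Natural Science Foundation of Jilin Province (20130101060JC); "Twelfth Five-Year Plan" Science and Technology Research Fund of the Education Department of Jilin Province (2014125
Abstract:To improve the accuracy and precision of Web search, an improved suffix-tree-based search result clustering algorithm is proposed on top of the basic suffix tree clustering model. The vector space model is combined with suffix tree clustering to improve the merging of base clusters. The number of texts associated with a base cluster node, the number of words in the phrase, the phrase weight, and whether the phrase contains the query terms are jointly used as the criteria for selecting cluster labels, which makes the labels more reasonable and more readable. Experiments using the text classification corpus from the Sogou corpus as the data source show that the method improves the accuracy of the clustering results to a certain extent.

Keywords:text clustering  suffix tree  vector space model  Web retrieval results
Received:2015-12-17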

Improved Algorithm of Web Retrieve Results Clustering Based on Suffix Tree
DONG Yaze,LI Wanlong,LI Hang,ZHENG Shanhong.Improved Algorithm of Web Retrieve Results Clustering Based on Suffix Tree[J].Journal of Jilin University:Information Sci Ed,2016,34(4):543-549.
Authors:DONG Yaze  LI Wanlong  LI Hang  ZHENG Shanhong
Institution:a. School of Application Technology; b. School of Computer Science & Engineering, Changchun University of Technology, Changchun 130012, China
Abstract:Improving the accuracy and precision of search engines is a key problem that urgently needs to be solved in the Internet era. Building on the basic model of the suffix tree clustering algorithm, an improved suffix-tree-based search results clustering algorithm is proposed, in which the vector space model is combined with suffix tree clustering to improve the merging of base clusters. In addition, the number of texts corresponding to a base cluster node, the number of words in the phrase, the phrase weight, and whether the phrase contains the query terms are combined as the selection criteria for cluster labels, which improves the rationality and readability of the labels. Finally, the method is tested on the text classification corpus from the Sogou corpus. The experimental results show that the method improves the accuracy of the clustering results to a certain extent.
Keywords:text clustering  suffix tree  vector space model  Web retrieval results
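
The abstract names two ideas without giving formulas: merging base clusters with the help of the vector space model, and ranking candidate labels by document count, phrase length, phrase weight, and query-term presence. The following Python sketch is only one plausible reading of those ideas, not the authors' method. The TF-IDF weighting, the cosine threshold of 0.5, the multiplicative scoring form, the 1.5 query bonus, and all function names (`tfidf_vectors`, `merge_base_clusters`, `label_score`) are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times (log(n / df) + 1).
        vectors.append({t: (c / len(doc)) * (math.log(n / df[t]) + 1.0)
                        for t, c in tf.items()})
    return vectors


def centroid(doc_ids, vectors):
    """Sum the vectors of the documents covered by one base cluster."""
    total = {}
    for i in doc_ids:
        for term, weight in vectors[i].items():
            total[term] = total.get(term, 0.0) + weight
    return total


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_base_clusters(base_clusters, vectors, threshold=0.5):
    """Merge base clusters whose centroids are similar (assumed criterion).

    base_clusters maps a phrase (tuple of words) to the set of document ids
    that contain it. Returns a list of (phrases, doc_ids) pairs.
    """
    phrases = list(base_clusters)
    cents = [centroid(base_clusters[p], vectors) for p in phrases]
    parent = list(range(len(phrases)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(phrases)):
        for j in range(i + 1, len(phrases)):
            if cosine(cents[i], cents[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, phrase in enumerate(phrases):
        members, doc_ids = groups.setdefault(find(i), ([], set()))
        members.append(phrase)
        doc_ids.update(base_clusters[phrase])
    return list(groups.values())


def label_score(phrase, doc_ids, weight, query_terms, query_bonus=1.5):
    """Score a candidate label from the four factors named in the abstract.

    The multiplicative combination and the bonus value are assumptions.
    """
    bonus = query_bonus if any(t in query_terms for t in phrase) else 1.0
    return len(doc_ids) * len(phrase) * weight * bonus


if __name__ == "__main__":
    docs = [["suffix", "tree", "clustering"], ["suffix", "tree", "search"],
            ["football", "match", "score"], ["football", "match", "result"]]
    base = {("suffix", "tree"): {0, 1}, ("tree",): {0, 1},
            ("football", "match"): {2, 3}}
    for members, doc_ids in merge_base_clusters(base, tfidf_vectors(docs)):
        best = max(members, key=lambda p: label_score(p, base[p], 1.0, {"suffix"}))
        print(" ".join(best), sorted(doc_ids))
```

In this sketch the suffix tree itself is not built; `base_clusters` stands in for the phrase-to-documents mapping that a suffix tree traversal would produce. A real implementation would also need the paper's actual phrase weight definition, which the abstract does not state.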
This article is indexed in Wanfang Data and other databases.