
A New Weighted Suffix Tree Clustering Method for Web Documents

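Cite this article:	YANG Rui-long, ZHU Qing-sheng, XIE Hong-tao, QU Hong-chun. A New Weighted Suffix Tree Clustering Method for Web Documents[J]. Journal of System Simulation, 2011, 23(3): 474-479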

Authors:	YANG Rui-long, ZHU Qing-sheng, XIE Hong-tao, QU Hong-chun

Affiliation:	College of Computer Science, Chongqing University, Chongqing 400044

Funding:	National Key Technology R&D Program of China (2007BAH08B04); Chongqing Science and Technology Support Program (2008AC20084)

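Abstract:	Targeting the structure and characteristics of Web documents, this paper proposes a new weighted suffix tree clustering method, WSTC. First, based on the HTML tags of a Web document, the document is divided into segments with different levels of importance; segments are split into sentences, and sentences into words. Second, the suffix tree is built from sentences rather than whole documents, and each sentence's importance level is incorporated into the suffix tree nodes as a structure weight, forming a weighted suffix tree model of the document set. Finally, the selection and merging of base clusters jointly use the document count, sentence count, phrase length, and structure weight held in each node. Simulation experiments show that WSTC achieves better clustering results than the traditional STC algorithm.
Keywords:	suffix tree; suffix tree clustering; Web document clustering; Web document structure; weight computation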
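
The three steps in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the tag weights in `TAG_WEIGHTS`, the `max_len` cutoff, and the `score` formula are all hypothetical, since the abstract gives no concrete values or formulas. A plain dictionary of word-level phrases stands in for a real suffix tree (each key corresponds to one node), and only base-cluster selection is shown; merging is omitted.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical importance levels per HTML tag; the abstract does not list them.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 2.0, "h3": 2.0}
DEFAULT_WEIGHT = 1.0


class SegmentParser(HTMLParser):
    """Collects (weight, text) segments, weighting text by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.segments = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            weight = max((TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
                         default=DEFAULT_WEIGHT)
            self.segments.append((weight, text))


def weighted_sentences(html):
    """Step 1: split one document into (weight, [words]) sentences."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    result = []
    for weight, text in parser.segments:
        for sentence in re.split(r"[.!?]+", text):
            words = re.findall(r"\w+", sentence.lower())
            if words:
                result.append((weight, words))
    return result


def build_phrase_index(docs, max_len=4):
    """Step 2: index every phrase of up to max_len words, sentence by sentence.

    Each entry plays the role of one weighted suffix-tree node.
    """
    index = defaultdict(lambda: {"docs": set(), "sentences": 0, "weight": 0.0})
    for doc_id, html in enumerate(docs):
        for weight, words in weighted_sentences(html):
            seen = set()  # count each phrase once per sentence
            for start in range(len(words)):
                for end in range(start + 1, min(start + max_len, len(words)) + 1):
                    seen.add(tuple(words[start:end]))
            for phrase in seen:
                node = index[phrase]
                node["docs"].add(doc_id)
                node["sentences"] += 1
                node["weight"] += weight
    return index


def score(phrase, node):
    """Hypothetical base-cluster score; not the paper's formula."""
    mean_weight = node["weight"] / node["sentences"]
    return len(node["docs"]) * len(phrase) * mean_weight


def base_clusters(index, min_docs=2):
    """Step 3 (selection only): phrases shared by min_docs documents, best first."""
    kept = [(score(p, n), p, sorted(n["docs"])) for p, n in index.items()
            if len(n["docs"]) >= min_docs]
    return sorted(kept, key=lambda item: (-item[0], item[1]))
```

Calling `base_clusters(build_phrase_index(docs))` on a list of HTML strings returns `(score, phrase, document ids)` tuples. Under these assumed weights, a phrase that appears in a title or heading outranks an equally frequent phrase found only in body text, which is the effect the structure weight is meant to have.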
Novel Weighted Suffix Tree Clustering for Web Documents

YANG Rui-long, ZHU Qing-sheng, XIE Hong-tao, QU Hong-chun. Novel Weighted Suffix Tree Clustering for Web Documents[J]. Journal of System Simulation, 2011, 23(3): 474-479

Authors:	YANG Rui-long, ZHU Qing-sheng, XIE Hong-tao, QU Hong-chun

Affiliation:	College of Computer Science, Chongqing University, Chongqing 400044, China

Abstract:	A novel Weighted Suffix Tree Clustering (WSTC) method is proposed for clustering Web documents. First, according to the structure and HTML tags of Web documents, different parts of a document are assigned different levels of significance as structure weights; each part is partitioned into sentences, and each sentence into words. Second, the weighted suffix tree of the document set is built from sentences, with the structure weights stored in the nodes. Finally, the document count, sentence count, phrase leng...

Keywords:	suffix tree; suffix tree clustering; Web document clustering; Web document structure; weight computing
This article is indexed in CNKI, Wanfang Data, and other databases.
