A Chinese Web page clustering algorithm based on the suffix tree

Authors:	Yang Jian-wu

Institution:	(1) National Key Laboratory for Text Processing, Institute of Computer Science and Technology, Peking University, 100871 Beijing, China

Abstract:	This paper proposes STC-I, an improved algorithm for clustering Chinese Web pages that builds on the suffix tree clustering (STC) algorithm. It takes characteristics of the Chinese language into account by adopting a new unit choice principle and a novel suffix tree construction policy. Experimental results show that STC-I retains the advantages of STC and outperforms it in both precision and speed when clustering Chinese Web pages.

Keywords:	clustering; suffix tree; Web mining
This article is indexed in CNKI, VIP, Wanfang Data, SpringerLink, and other databases.