
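High-Precision Clustering Algorithm Based on Web Logs
Cite this article: JIN Song-He, QIAN Shen-Yi, ZHANG Su-Zhi. High Precision Clustering Algorithm Based on Web Log[J]. Journal of Henan University of Science & Technology: Natural Science, 2006, 27(2): 49-51.
Authors: JIN Song-He  QIAN Shen-Yi  ZHANG Su-Zhi
Affiliation: School of Computer and Communication Engineering, Zhengzhou Institute of Light Industry, Zhengzhou 450002, Henan, China
Funding: Natural Science Foundation of Henan Province (0411010500)
Abstract: A Web log mining algorithm is proposed. The algorithm first builds a URL-UserID association matrix with the Web site's URLs as rows and the users' UserIDs as columns, where each element is the number of times the user visited the URL. It then measures the similarity between row vectors to obtain rough clusters of user sessions. Finally, a hierarchy comparison clustering algorithm processes the rough session clusters further to obtain clusters of higher precision. Experiments show that the algorithm is effective in improving clustering precision.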

Keywords: Networks  Web log mining  Session clustering  Structure hierarchy
Article ID: 1672-6871(2006)02-0049-03
Received: 2005-11-17
Revised: 2005-11-17

High Precision Clustering Algorithm Based on Web Log
JIN Song-He,QIAN Shen-Yi,ZHANG Su-Zhi.High Precision Clustering Algorithm Based on Web Log[J].Journal of Henan University of Science & Technology:Natural Science,2006,27(2):49-51.
Authors:JIN Song-He  QIAN Shen-Yi  ZHANG Su-Zhi
Abstract: Similar customer groups, relevant Web pages, and frequent access paths can be discovered by analyzing Web log files. A Web log mining algorithm is presented here. First, based on the defined directed graph of the Web site, a URL-UserID association matrix is set up, with URLs as rows, UserIDs as columns, and each user's number of visits as the element values. Second, rough session clusters are obtained by measuring the similarity between row vectors. Finally, the rough session clusters are processed further by a hierarchy comparison clustering algorithm to obtain clusters with higher precision. Experiments demonstrate the effectiveness of the algorithm.
Keywords:Networks  Web log mining  Session clustering  Structure hierarchy
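
The first two steps named in the abstract (building the URL-UserID count matrix, then grouping similar row vectors into rough clusters) can be illustrated with a minimal Python sketch. The abstract does not state which similarity measure or grouping rule the authors use, so cosine similarity, the 0.8 threshold, the greedy seed-based grouping, and all function names and sample records below are assumptions made for illustration only. The hierarchy comparison refinement step is not described in enough detail to sketch and is omitted.

```python
from collections import Counter
from math import sqrt


def build_matrix(records):
    """Build a URL-by-UserID visit-count matrix from (url, user_id) pairs."""
    urls = sorted({url for url, _ in records})
    users = sorted({user for _, user in records})
    counts = Counter(records)  # (url, user_id) -> number of visits
    matrix = [[counts[(url, user)] for user in users] for url in urls]
    return urls, users, matrix


def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rough_clusters(urls, matrix, threshold=0.8):
    """Greedy grouping (an assumption, not the paper's rule):
    a row joins the first cluster whose seed row is similar enough."""
    clusters = []  # each cluster is a list of row indices; index 0 is the seed
    for i, row in enumerate(matrix):
        for cluster in clusters:
            if cosine(matrix[cluster[0]], row) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [[urls[i] for i in cluster] for cluster in clusters]


# Hypothetical log records: one (url, user_id) pair per page request.
records = [
    ("/a", "u1"), ("/a", "u1"), ("/a", "u2"),
    ("/b", "u1"), ("/b", "u1"), ("/b", "u2"),
    ("/c", "u3"),
]
urls, users, matrix = build_matrix(records)
clusters = rough_clusters(urls, matrix)
```

With these sample records, the rows for `/a` and `/b` have identical visit counts and fall into one rough cluster, while `/c` forms its own. A greedy seed-based pass depends on row order; it is used here only because it is short, not because the source prescribes it.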
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.