
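A Web usage mining method based on a session clustering algorithm
Cite this article: CHEN Fu-zan, LIU Qing, LI Min-qiang, KOU Ji-song. A Web usage mining method based on a session clustering algorithm[J]. Journal of Systems Engineering, 2012, 27(1): 129-136
Authors: CHEN Fu-zan  LIU Qing  LI Min-qiang  KOU Ji-song
Affiliation: College of Management and Economics, Tianjin University, Tianjin 300072, China
Funding: National Natural Science Foundation of China (71101103; 70925005; 61074152; 70771074); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education (20100032120086; 20090032110065; 20100032110036)
Abstract: Web usage mining, an important data mining task, helps characterize user groups so that they can be offered personalized services. This paper proposes a Web usage mining algorithm based on user session clustering. First, Web logs are preprocessed with a time-window-based user session identification method; a triple-based representation of user sessions is proposed, and on that basis a session processing method based on the semantic similarity of Web pages is given, which effectively reduces session dimensionality while preserving the user's interests. Second, a user session similarity measure based on time and frequency is proposed. Finally, a two-stage PS-KM session clustering algorithm is designed, which first performs a global search with PSO and then switches to a local clustering process based on K-means. Simulations demonstrate the effectiveness of the algorithm.

Keywords: Web usage mining  Web log  user session  clustering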
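
The abstract mentions time-window-based session identification without giving its details. The following is a minimal sketch of one common heuristic, not the authors' procedure: a user's requests are split into a new session whenever the gap between consecutive requests exceeds a threshold. The function name `identify_sessions`, the `(user_id, timestamp, url)` record layout, and the 1800-second threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def identify_sessions(records, window_seconds=1800):
    """Split each user's requests into sessions.

    records: iterable of (user_id, timestamp_seconds, url) tuples
             (a hypothetical, already-parsed log format).
    A new session starts when the gap since the user's previous
    request exceeds window_seconds (an assumed inactivity threshold).
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > window_seconds:
                # Gap too large: close the current session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(hit)
        sessions.append((user, current))
    return sessions


# Toy log: user u1 returns after a gap of 3400 seconds.
log = [
    ("u1", 0, "/index"),
    ("u1", 600, "/news"),
    ("u1", 4000, "/index"),
    ("u2", 100, "/shop"),
]
sessions = identify_sessions(log)
```

With the toy log above, the two early requests from `u1` form one session and the later request forms a second one, because the gap exceeds the assumed threshold.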

A novel web usage mining method based on web session clustering
CHEN Fu-zan, LIU Qing, LI Min-qiang, KOU Ji-song. A novel web usage mining method based on web session clustering[J]. Journal of Systems Engineering, 2012, 27(1): 129-136
Authors: CHEN Fu-zan    LIU Qing    LI Min-qiang    KOU Ji-song
Affiliation: College of Management and Economics, Tianjin University, Tianjin 300072, China
Abstract: Web usage mining is an important data mining task. It helps characterize user groups and thus supports personalized services. This paper proposes a novel web usage mining algorithm based on user session clustering. First, a time-based user session identification method is used for Web log preprocessing. A 3-tuple data structure is designed to represent web sessions, and a session dimensionality reduction method based on web page semantic similarity is proposed, which effectively reduces session length while retaining the user's interests. Second, a new session similarity measure based on both time and frequency is designed. Finally, a two-stage PS-KM session clustering algorithm is proposed: it first uses PSO to perform a global search and then switches to a local clustering process based on K-means. Experimental results show that the algorithm is effective.
Keywords:web usage mining  web log  user session  clustering
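
The two-stage idea in the abstract (a PSO global search followed by K-means refinement) can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's PS-KM algorithm: sessions are assumed to be already encoded as numeric vectors, plain squared Euclidean distance stands in for the paper's time-and-frequency similarity measure, and the PSO parameters (`w`, `c1`, `c2`, particle count, iteration counts) are arbitrary illustrative values. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def pso_stage(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Global search: each particle encodes k candidate centroids."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    # Initialise each particle from k randomly chosen data points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    best_pos = [[c[:] for c in p] for p in pos]
    best_fit = [sse(points, p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = [c[:] for c in best_pos[g]], best_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO velocity update: inertia + personal + global pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (g_pos[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = sse(points, pos[i])
            if fit < best_fit[i]:
                best_fit[i], best_pos[i] = fit, [c[:] for c in pos[i]]
                if fit < g_fit:
                    g_fit, g_pos = fit, [c[:] for c in pos[i]]
    return g_pos


def kmeans_stage(points, centroids, iters=20):
    """Local refinement: standard Lloyd iterations from the given centroids."""
    centroids = [c[:] for c in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Synthetic 2-D data standing in for encoded sessions (an assumption).
data_rng = random.Random(42)
points = [[data_rng.gauss(cx, 0.3), data_rng.gauss(cy, 0.3)]
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
seed_centroids = pso_stage(points, k=3, rng=random.Random(1))
final_centroids = kmeans_stage(points, seed_centroids)
```

The K-means stage starts from the best centroids found by the swarm, so it can only keep or lower the clustering error reached by the global search; this is a property of Lloyd iterations, not a result reported in the paper.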
This article is indexed in CNKI, Wanfang Data, and other databases.