Article abstract
Zhu Zhiguo. A method for Web user session identification based on URL semantic analysis[J].,2011,(3):440-446
A method for Web user session identification based on URL semantic analysis
  
DOI:10.7511/dllgxb201103022
Keywords: data mining; Web usage mining; data preprocessing; user session identification
Funding: Supported by the National Natural Science Foundation of China, grant 70671016.
Authors and affiliations
Zhu Zhiguo
Abstract:
      Classical session identification methods based on timeout and referrer heuristics are limited when mining complex Web usage patterns, so a new user session identification method based on URL semantic analysis is proposed. With the aid of a Web directory service, the method assigns semantic information to every URL record in the Web log and defines several measures to evaluate the semantic similarity between URLs. For the two cases of static and dynamic (streaming) Web logs, two semantic outlier detection methods, SOA_s and SOA_d respectively, are given to segment and identify user sessions. Finally, comparison experiments between the proposed method and existing classical methods are conducted, and the results show that the precision and recall of session identification are improved.
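      The general idea of segmenting a click stream by semantic similarity can be illustrated with a minimal Python sketch. It is not the paper's two outlier detection methods, whose definitions the abstract does not give. It assumes a hypothetical in-memory table DIRECTORY that stands in for a Web directory service, a hypothetical similarity measure based on the shared prefix of category paths, and an arbitrary threshold of 0.3; all names and values are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a Web directory service: URL -> category path.
# A real system would query a directory; these entries are made up.
DIRECTORY: Dict[str, Tuple[str, ...]] = {
    "/python/tutorial": ("Computers", "Programming", "Python"),
    "/python/stdlib": ("Computers", "Programming", "Python"),
    "/java/intro": ("Computers", "Programming", "Java"),
    "/recipes/pasta": ("Home", "Cooking", "Pasta"),
    "/recipes/pizza": ("Home", "Cooking", "Pizza"),
}


def path_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed measure: shared-prefix length over the longer path length."""
    if not a or not b:
        return 0.0  # an unclassified URL is treated as unrelated
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def split_sessions(urls: Sequence[str], threshold: float = 0.3) -> List[List[str]]:
    """Start a new session when a URL is semantically far from the previous one."""
    sessions: List[List[str]] = []
    current: List[str] = []
    for url in urls:
        category = DIRECTORY.get(url, ())
        if current:
            previous = DIRECTORY.get(current[-1], ())
            if path_similarity(previous, category) < threshold:
                sessions.append(current)  # semantic break: close the session
                current = []
        current.append(url)
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    clicks = ["/python/tutorial", "/python/stdlib", "/java/intro",
              "/recipes/pasta", "/recipes/pizza"]
    for session in split_sessions(clicks):
        print(session)
```

      In this sketch the jump from a programming page to a cooking page shares no category prefix, so it falls below the threshold and opens a new session, while the move from Python to Java pages stays within one session. A timeout-based heuristic would instead look only at the time gap between requests.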