Similar Articles
1.
Optimization of Session Identification in Web Log Preprocessing   Total citations: 3, self-citations: 0, citations by others: 3
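Building on existing session identification methods, an optimized session segmentation method is proposed. The method combines several parameters (the user's download time, the average reading time of each page, and each page's numbers of in-links and out-links) to obtain a page-access time threshold for each user, and segments that user's sessions according to this threshold to produce a set of candidate sessions. It then removes navigation (link) pages and pages of no interest from each session, based on the user's interest in page content and browsing characteristics, producing a final sequence of effectively accessed pages that gives good input data for subsequent pattern discovery. Experimental results show that, compared with applying a single a priori threshold to all users and with determining the threshold statistically in combination with page content, the proposed method determines page-access time thresholds more accurately and yields a more reasonable and effective session set.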
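A minimal Python sketch of the two stages described above, assuming each page's average reading time and in/out-link counts are already known. The abstract does not give the formula that combines download time, reading time, and link counts, so `page_threshold` uses a hypothetical weighting; likewise `drop_pass_through_pages` uses a hypothetical dwell-time rule in place of the paper's interest and browsing-feature criteria.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    avg_read_time: float  # mean dwell time on the page across all users, seconds
    in_links: int         # number of site pages linking to this page
    out_links: int        # number of links on this page


def page_threshold(download_time, stats, slack=2.0):
    """Hypothetical per-user, per-page access-time threshold in seconds.

    Pages with many out-links relative to in-links are treated as navigation hubs
    that users leave quickly, so their threshold shrinks. This weighting is an
    assumption, not the paper's formula.
    """
    hub_ratio = stats.out_links / (stats.in_links + stats.out_links + 1)
    return download_time + slack * stats.avg_read_time * (1.0 - hub_ratio)


def split_sessions(requests, download_time, stats):
    """Split one user's time-sorted (timestamp_s, url) requests into candidate sessions."""
    sessions, current = [], []
    for t, url in requests:
        if current:
            prev_t, prev_url = current[-1]
            # the gap after a page is compared with that page's threshold
            if t - prev_t > page_threshold(download_time, stats[prev_url]):
                sessions.append(current)
                current = []
        current.append((t, url))
    if current:
        sessions.append(current)
    return sessions


def drop_pass_through_pages(session, stats, min_fraction=0.2):
    """Drop pages viewed for less than min_fraction of their average reading time
    (treated here as navigation or uninteresting pages). The last page is kept
    because its dwell time cannot be observed from the log."""
    kept = []
    for i, (t, url) in enumerate(session):
        if i + 1 < len(session):
            dwell = session[i + 1][0] - t
            if dwell < min_fraction * stats[url].avg_read_time:
                continue
        kept.append(url)
    return kept
```

Splitting first and filtering second mirrors the order in the abstract: the threshold decides session boundaries, and only then are short-dwell pages removed from inside each candidate session.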

2.
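After analyzing and comparing traditional Web session identification methods, this work improves the most commonly used time-threshold-based method and proposes a session identification method based on a dynamic threshold. The algorithm combines a dynamically computed average time interval between request records in a session with a dynamically computed average size of the pages in that session, adjusting the threshold to the characteristics of the user and the pages. Unlike a traditional single a priori threshold, the method generates a different threshold for each user visiting each page, making full use of user and page information. Experiments show that the method identifies more user sessions, with higher precision and recall than traditional algorithms.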
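The dynamic threshold can be sketched as follows. The abstract says only that the running mean gap and the running mean page size are combined; the product form, `alpha`, `floor`, and `init_threshold` in `dynamic_sessions` are assumptions chosen for illustration.

```python
def dynamic_sessions(requests, init_threshold=1800.0, alpha=3.0, floor=60.0):
    """Split one user's time-sorted (timestamp_s, page_size_bytes) requests.

    Hypothetical rule: the threshold for the gap after a page is
    alpha * (mean gap so far in the session) * (that page's size / mean page
    size so far), never below floor. Before the session has any gap,
    init_threshold is used.
    """
    sessions, current, gaps, sizes = [], [], [], []
    for t, size in requests:
        if current:
            gap = t - current[-1][0]
            if gaps:
                mean_gap = sum(gaps) / len(gaps)
                mean_size = sum(sizes) / len(sizes)
                # larger-than-usual pages are assumed to need more reading time
                size_factor = current[-1][1] / mean_size if mean_size > 0 else 1.0
                threshold = max(floor, alpha * mean_gap * size_factor)
            else:
                threshold = init_threshold
            if gap > threshold:
                sessions.append(current)
                current, gaps, sizes = [], [], []  # restart running statistics
            else:
                gaps.append(gap)
        current.append((t, size))
        sizes.append(size)
    if current:
        sessions.append(current)
    return sessions
```

With this form, the same gap can be tolerated after an unusually large page but treated as a session boundary after a typical one.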

3.
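Building on the commonly used time-threshold computation, an improved average-threshold identification method based on URL page type, page information content, and access time is proposed. Different threshold computations are applied to different page types. Compared with using a single a priori threshold for all users' pages and with existing dynamic-threshold computations, the method reflects users' actual sessions more faithfully and substantially improves identification accuracy.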
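One way to read "page type, information content, and access time" is to learn an average reading rate per page type and scale it by each page's information content. The sketch below assumes that interpretation; `type_rates`, `split_by_page_type`, `factor`, and the notion of `info_units` (for example, a word count) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def type_rates(samples):
    """Average reading rate (seconds per unit of information) for each page type.

    samples: (page_type, dwell_seconds, info_units) observed in the log.
    """
    per_type = defaultdict(list)
    for page_type, dwell, info in samples:
        if info > 0:
            per_type[page_type].append(dwell / info)
    return {page_type: mean(rates) for page_type, rates in per_type.items()}


def split_by_page_type(requests, pages, rates, factor=2.0, default=1800.0):
    """requests: one user's time-sorted (timestamp_s, url).
    pages: url -> (page_type, info_units). Unknown page types use `default`."""
    sessions = []
    for t, url in requests:
        if sessions:
            prev_t, prev_url = sessions[-1][-1]
            page_type, info = pages[prev_url]
            # threshold = factor * (average rate for this type) * (page information content)
            limit = factor * rates[page_type] * info if page_type in rates else default
            if t - prev_t <= limit:
                sessions[-1].append((t, url))
                continue
        sessions.append([(t, url)])
    return sessions
```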

4.
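An algorithm is proposed that determines the time interval dynamically from the Web site's topology and the interest degree of visited pages, and it is applied to the session identification stage of Web log preprocessing. Experiments using Web logs collected by a reverse proxy server show that, compared with other session identification algorithms, it substantially improves precision and completeness; it effectively preserves the characteristics of users' access logs for the campus network and provides a good basis for subsequent recommendation or decision-making.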

5.
Time-Interval-Based Session Segmentation in Web Log Mining   Total citations: 10, self-citations: 0, citations by others: 10
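To address session segmentation in Web log mining, a time-interval-based method is proposed. The method splits a session when the interval between adjacent page visits exceeds a threshold, and the threshold for a given IP is defined from that IP's frequency vector. Experiments show that the frequency vectors of proxy-server IPs and single-user IPs behave differently: those of proxy-server IPs follow a power law, while those of single-user IPs follow a Gaussian distribution. On this basis, a method based on the Gaussian assumption is proposed to set a separate threshold for each single-user IP. Compared with the traditional approach of segmenting all IP addresses with a single a priori threshold, this method is more reasonable and effective.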
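A simplified sketch of the Gaussian assumption for one single-user IP: estimate the mean and standard deviation of that IP's inter-request gaps and split wherever a gap exceeds mean + k * std. The paper derives the threshold from a frequency vector that the abstract does not define, so fitting the raw gaps directly and the default `k=2.0` are assumptions; proxy IPs, whose gaps follow a power law, would need separate handling.

```python
from statistics import mean, pstdev


def gaussian_threshold(gaps, k=2.0, fallback=1800.0):
    """Cut-off at mean + k * std of one IP's inter-request gaps (seconds).
    Falls back to a fixed value when there are too few gaps to estimate a spread."""
    if len(gaps) < 2:
        return fallback
    return mean(gaps) + k * pstdev(gaps)


def split_single_user_ip(timestamps, k=2.0):
    """Split one single-user IP's request timestamps into sessions."""
    ts = sorted(timestamps)
    if not ts:
        return []
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    threshold = gaussian_threshold(gaps, k)
    sessions = [[ts[0]]]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > threshold:
            sessions.append([cur])  # gap is an outlier under the Gaussian fit
        else:
            sessions[-1].append(cur)
    return sessions
```

Note that the long between-session gaps also enter the estimate and inflate it; a robust estimator (for example, median-based) is one possible refinement.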

6.
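Based on an analysis of existing data preprocessing models and session identification algorithms for Web usage mining, an improved preprocessing model is proposed, together with an improved version of the heuristic session identification algorithm based on time and referrer. Experiments show that the improved model and algorithm are well suited to Web usage mining data preprocessing now that search engines are in widespread use.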
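For reference, a sketch of the baseline time-and-referrer heuristic that the paper sets out to improve (the improved version is not described in the abstract): a request continues the current session only if the gap is within `max_gap` and its referrer is a page already in that session. Treating an absent referrer (`-`) or an external one, such as a search-engine result page, as a session start is an assumption of this sketch.

```python
def time_referrer_sessions(requests, max_gap=1800.0):
    """requests: one user's time-sorted (timestamp_s, url, referrer);
    referrer is '-' when the log has none."""
    sessions = []
    for t, url, referrer in requests:
        if sessions:
            last_t = sessions[-1][-1][0]
            visited = {u for _, u in sessions[-1]}
            # continue the session only if both the time and the referrer rule hold
            if t - last_t <= max_gap and referrer in visited:
                sessions[-1].append((t, url))
                continue
        sessions.append([(t, url)])
    return sessions
```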

7.
Research on Data Preprocessing Techniques in Web Log Mining   Total citations: 2, self-citations: 0, citations by others: 2
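The basic idea of Web log mining is to apply data mining techniques to Web log data. In data mining, preprocessing plays a crucial role, and the main data source for Web log mining is the Web log itself. Based on the characteristics of Web logs, a method for the session identification stage of preprocessing is presented that combines filtering out frame pages with a page-access time threshold. Experimental data show that the method can significantly improve the interestingness of Web log mining results.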

8.
Chen Hongli (陈红丽), Science Technology and Engineering (科学技术与工程), 2012, 12(8): 1928-1930, 1935
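Data preprocessing plays a crucial role in Web log mining and directly affects the quality of the mining results. This paper analyzes the main steps of data preprocessing and improves session identification by combining the site's home page with a dynamic time threshold. Experimental results show that the improved method identifies users' real sessions more effectively.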
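A sketch of combining a home-page rule with a dynamic time threshold, under two assumptions not stated in the abstract: a request for the site home page (here the hypothetical `/index.html`) always starts a new session, and the time threshold is `alpha` times the mean gap observed so far in the current session.

```python
def home_page_sessions(requests, home="/index.html", alpha=3.0, init_threshold=1800.0):
    """requests: one user's time-sorted (timestamp_s, url)."""
    sessions, gaps = [], []
    for t, url in requests:
        if sessions:
            gap = t - sessions[-1][-1][0]
            threshold = alpha * sum(gaps) / len(gaps) if gaps else init_threshold
            # stay in the session unless the home page is requested or the gap is too long
            if url != home and gap <= threshold:
                sessions[-1].append((t, url))
                gaps.append(gap)
                continue
        gaps = []  # a new session starts here: reset the running statistics
        sessions.append([(t, url)])
    return sessions
```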

9.
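A session identification method based on Web page features is proposed. By analyzing the features of the pages themselves, a feature vector is computed for every page on the site, and from these vectors the relatedness between any two pages can be computed. Taking the page requests in the log in chronological order yields a relatedness curve over all directly adjacent page records. With a chosen threshold, session boundaries are placed where the curve fluctuates sharply, and strongly related pages are grouped into the same session, completing session identification.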
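The relatedness-curve idea can be sketched with cosine similarity between page feature vectors. Both the similarity measure and the rule of cutting wherever adjacent relatedness drops below a fixed `threshold` are simplifying assumptions; the abstract places boundaries where the curve fluctuates sharply and does not specify how page features are extracted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_by_relatedness(pages, features, threshold=0.5):
    """pages: URLs in log (time) order; features: url -> feature vector."""
    if not pages:
        return []
    # relatedness curve between each pair of directly adjacent requests
    curve = [cosine(features[a], features[b]) for a, b in zip(pages, pages[1:])]
    sessions = [[pages[0]]]
    for page, relatedness in zip(pages[1:], curve):
        if relatedness < threshold:
            sessions.append([page])  # weak link to the previous page: session boundary
        else:
            sessions[-1].append(page)
    return sessions
```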

10.
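This paper addresses the identification of stay points in travel GPS trajectories and proposes a spatio-temporal clustering method based on the minimum enclosing circle. Trajectory points are clustered using a stay-range threshold; the clusters are preliminarily filtered with a stay-time threshold; the pre-filtered clusters are merged using two further thresholds, the neighbor distance and the neighbor time between clusters; and a final filtering with the stay-time threshold yields the stay periods and stay points. The algorithm improves how initial clusters are determined in existing spatio-temporal clustering algorithms and raises computational efficiency. Because the existing recall and precision measures cannot accurately assess stay-point identification results, their computation is modified to be based on the accuracy of the stay periods. The algorithm was validated on a trajectory of 9,923 points: all three stay periods in the trajectory were identified, with precision and recall both equal to 0.82. The experimental results show that the algorithm remains accurate when trajectories overlap heavily or drift.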
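A simplified sketch of the first two steps (clustering by stay range, then filtering by stay time), assuming points are already projected to metric x/y coordinates. The cluster-merging step and the final filtering are omitted. The brute-force minimum enclosing circle below is O(n^4) and adequate only for small clusters; `max_radius`, `min_duration`, and reporting the centroid as the stay point are assumptions.

```python
import math
from itertools import combinations


def _circle_two(a, b):
    """Circle with segment ab as its diameter: (cx, cy, r)."""
    return (a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2


def _circle_three(a, b, c):
    """Circumcircle of triangle abc, or None if the points are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), a)


def mec_radius(points, tol=1e-7):
    """Radius of the minimum enclosing circle by brute force: the optimal circle is
    determined by two or three of the points. Welzl's algorithm is the faster choice."""
    if len(points) < 2:
        return 0.0
    candidates = [_circle_two(a, b) for a, b in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_three(*triple)
        if circle is not None:
            candidates.append(circle)
    return min(r for cx, cy, r in candidates
               if all(math.dist((cx, cy), p) <= r + tol for p in points))


def stay_points(track, max_radius=50.0, min_duration=300.0):
    """track: time-sorted (t_seconds, x_m, y_m).
    Returns (start_t, end_t, centroid_x, centroid_y) for each detected stay."""
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # grow the cluster while the enclosing circle of its points fits the stay range
        while j < n and mec_radius([(x, y) for _, x, y in track[i:j + 1]]) <= max_radius:
            j += 1
        cluster = track[i:j]
        if cluster[-1][0] - cluster[0][0] >= min_duration:  # stay-time filter
            cx = sum(p[1] for p in cluster) / len(cluster)
            cy = sum(p[2] for p in cluster) / len(cluster)
            stays.append((cluster[0][0], cluster[-1][0], cx, cy))
        i = j
    return stays
```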

11.
He Yanling (何延凌), Science & Technology Information (科技信息), 2008, (4): 258
Language is a means of verbal communication; people use it to communicate with one another. In any society, no two speakers speak in exactly the same way. Some differences are due to age, gender, status, and personality, and gender is among the most obvious. This paper describes the features of women's language from the perspectives of pronunciation, intonation, diction, subject matter, grammar, and discourse. The discussion suggests that more attention should be paid to language use in its social context, which in turn allows the linguistic phenomena of a speech community to be understood more thoroughly.

12.
In the 19th century, society was controlled by men, and women were mere appendages of them, without rights or freedom. Jane, however, was an exception: she displayed the characteristics of an early feminist in three respects, namely rebellion, equality, and independence. These characteristics contributed to her success, and feminism was the only way out for women of that time.

13.
Spatial databases store large numbers of geometric objects. An important function of a spatial database is to let users browse these objects efficiently as a map, so the database must quickly display the objects of interest in the display window. This involves two operations: retrieving data from the database and drawing it on screen. To improve efficiency, both retrieval time and display time should be reduced. The former can be achieved with a spatial index such as the R-tree; the latter requires simplifying the objects. Simplification means showing objects with sufficient but not unnecessary detail, depending on the browsing scale. The key problem is therefore how to retrieve data efficiently at different levels of detail. This paper describes the implementation of a multi-scale index, generalized from the R-tree, in the spatial database SISP (Spatial Information Shared Platform). The generalization differs from the R-tree in two respects. First, every node and geometric object is assigned an importance value, as is every vertex of each object; these values determine which data must be read from disk for a given query. Second, geometric objects are divided into one or more sub-blocks, and their vertices are totally ordered by importance value. With the generalized R-tree, data can easily be retrieved at different levels of detail. Experiments on real-world data compare a solution using a conventional spatial index with one using the multi-scale spatial index; the results show that the solution using the multi-scale index in SISP performs satisfactorily.
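The idea of totally ordering vertices by importance, so that a coarser level of detail is simply a prefix of the stored sequence, can be sketched as below. The abstract does not say how importance values are computed; this sketch substitutes Douglas-Peucker tolerances, which is an assumption, and it ignores the R-tree node and sub-block structure.

```python
import math


def _seg_dist(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def dp_importance(line):
    """Importance of each vertex = Douglas-Peucker tolerance at which it is first kept,
    capped by its parent's value so importances never increase down the recursion."""
    imp = [0.0] * len(line)
    imp[0] = imp[-1] = math.inf  # endpoints are always kept
    stack = [(0, len(line) - 1, math.inf)]
    while stack:
        s, e, parent = stack.pop()
        if e - s < 2:
            continue
        idx = max(range(s + 1, e), key=lambda k: _seg_dist(line[k], line[s], line[e]))
        imp[idx] = min(_seg_dist(line[idx], line[s], line[e]), parent)
        stack.append((s, idx, imp[idx]))
        stack.append((idx, e, imp[idx]))
    return imp


def store_by_importance(line):
    """Vertices in decreasing importance, as they would be laid out on disk."""
    imp = dp_importance(line)
    order = sorted(range(len(line)), key=lambda k: -imp[k])
    return [(imp[k], k, line[k]) for k in order]


def fetch_level(stored, tolerance):
    """Read only the prefix of vertices with importance >= tolerance, back in path order."""
    prefix = []
    for importance, k, point in stored:
        if importance < tolerance:
            break  # everything after this is less important
        prefix.append((k, point))
    return [point for _, point in sorted(prefix)]
```

Because the stored order is by importance, a query at a coarse scale stops reading early instead of loading the full geometry and simplifying it afterwards.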

14.
A hierarchical equations of motion (HEOM) approach is developed for general open quantum systems coupled to a fermionic environment. The HEOM method is in principle formally exact, as it resolves nonperturbatively the combined effects of many-body interaction, system-bath dissipation, and non-Markovian memory. In practice, the HEOM approach is highly accurate and efficient for the characterization of strongly correlated quantum impurity sys-

15.
The non-orthogonal localized molecular orbital (NOLMO) is the most localized representation of electronic degrees of freedom. NOLMOs are thus potentially the most efficient choice for linear-scaling calculations of the electronic structure of large systems. However, direct ab initio calculations with NOLMOs have not been fully implemented or widely used, partly because of slow convergence in NOLMO optimization. We devel-

16.
The concept of nanopore analysis, which uses the pore-forming protein α-hemolysin to detect individual nucleic acids at the single-molecule level, was first proposed in 1996. Over the past two decades, tremendous progress has been made in the field: nanopore analysis has become a label-free, high-throughput method for probing biomolecules and other analytes with single-molecule sensitivity, and it holds particular promise for "third-generation" DNA sequencing. However, challenges remain in experimental strategies and in the design of complete nanopore-based instruments. Here we proudly present a special topic on "Nanopore Analysis", with 8 reviews and articles providing up-to-date coverage of experimental strategies, theoretical calculations and simulations, and instrument design. The reviews and articles on experimental strategies cover control of DNA partitioning into a nanopore, detection of target DNA, and the advantages of nanopore-based DNA sequencing. The theoretical calculations and simulations address the translocation behavior of DNA, and an integrated measurement system and data-analysis software are presented for instrument design.

17.
1 Rise of studies on climate change's effects on biodiversity
For more than a century, until the 1980s, climate change and biodiversity were studied as two independent disciplines. In 1992, the Ecological Society of America's annual report named climate change, biodiversity, and sustainable ecological systems as the three major global environmental issues of the twenty-first century [1].

18.
1 Introduction
Learning methods have long occupied scholars' minds, and theories abound on how people learn and on how organisations can use knowledge about the art of learning to maximise human productivity. While the theories by themselves are not solutions to problems, they are steps of inquiry that offer interesting perspectives we can use to extend our knowledge of the dimensions of learning. This paper is an exploratory work on the theories postulated by three notable scholars including Zimmerman

19.
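A method is presented for computing higher-order ionospheric delay corrections for the L3 combination from dual-frequency observations, and its correction effect is compared with that of global ionospheric delay (map) files. Using data from 15 International GNSS Service (IGS) stations near the equator, the results show that the second-order ionospheric delays computed by the two methods differ by at most 1 cm and the third-order delays by at most 5 mm. After applying the higher-order corrections, the precise point positioning (PPP) solutions from the two methods differ on average by 0.4, 0.5, and 1.0 mm in the N, E, and U directions, respectively, so the two correction methods perform at the same level.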
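Why higher-order terms need a separate correction can be checked numerically. The second- and third-order ionospheric terms scale as 1/f^3 and 1/f^4, so forming L3 = (f1^2 L1 - f2^2 L2) / (f1^2 - f2^2) cancels only the first-order 1/f^2 term. In this sketch, `p`, `q`, and `r` are generic coefficients (their physical values depend on electron content and the geomagnetic field and are not modeled here), and sign conventions for phase versus code are ignored.

```python
F1 = 1575.42e6  # GPS L1 carrier frequency, Hz
F2 = 1227.60e6  # GPS L2 carrier frequency, Hz


def l3(x1, x2, f1=F1, f2=F2):
    """Ionosphere-free (L3) combination of a quantity observed on two frequencies."""
    return (f1 ** 2 * x1 - f2 ** 2 * x2) / (f1 ** 2 - f2 ** 2)


def l3_residual_terms(p, q, r, f1=F1, f2=F2):
    """First-, second- and third-order terms left in L3 when each carrier i carries
    p/fi**2 + q/fi**3 + r/fi**4 (p, q, r are generic, hypothetical coefficients)."""
    first = l3(p / f1 ** 2, p / f2 ** 2, f1, f2)   # cancels (up to rounding)
    second = l3(q / f1 ** 3, q / f2 ** 3, f1, f2)  # = -q / (f1 * f2 * (f1 + f2))
    third = l3(r / f1 ** 4, r / f2 ** 4, f1, f2)   # = -r / (f1**2 * f2**2)
    return first, second, third
```

Algebraically, the surviving terms are -q / (f1 f2 (f1 + f2)) and -r / (f1^2 f2^2), which is why the second- and third-order delays must be modeled explicitly before PPP.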

20.
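In this paper, we use a rational mixed attractor condition to prove several Ciric-type fixed point theorems for mappings with non-unique fixed points. The results generalize and improve some known results.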
