Similar Literature
 19 similar documents found
1.
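Methods for Integrating Data Mining with Databases   Total citations: 5, self-citations: 0, citations by others: 5
Data mining research has concentrated on mining algorithms, while the effective integration of data mining systems with databases, a question of central importance to the database field, has received little attention. This paper examines in detail the main methods of coupling data mining with a database: reading data through the SQL (Structured Query Language) cursor interface, caching data on local disk for mining, encapsulating mining algorithms in stored procedures, expressing mining algorithms as user-defined functions, and manipulating mining models directly through SQL extensions. On this basis, it argues that two technologies are key to integrating data mining systems with databases seamlessly: embedding the data mining system in the existing DB/DW while exposing interfaces for applications and user-defined mining algorithms, and developing a standard data mining language.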
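The first coupling method listed above, reading data through a SQL cursor, can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and a hypothetical `purchases` table; neither comes from the paper, and the "mining" step is only an item frequency count.

```python
import sqlite3
from collections import Counter


def demo_connection():
    """Build a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [(1, "milk"), (1, "bread"), (2, "milk"), (3, "beer"), (3, "milk")],
    )
    conn.commit()
    return conn


def count_items_via_cursor(conn, batch_size=2):
    """Stream rows through a cursor in batches and count item frequencies."""
    counts = Counter()
    cur = conn.cursor()
    cur.execute("SELECT item FROM purchases")
    while True:
        rows = cur.fetchmany(batch_size)  # fetch a bounded batch, not the whole table
        if not rows:
            break
        counts.update(item for (item,) in rows)
    return counts
```

Calling `count_items_via_cursor(demo_connection())` returns a `Counter` of item frequencies. The mining logic runs in the client process here, which is the loosest form of coupling among the methods the abstract lists.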

2.
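Drawing on spatial data mining techniques, this paper defines the temporal distance and the average distance between the trajectories of moving objects and proposes two trajectory clustering algorithms, the standard deviation method and the confidence interval method. Both methods find all pairs of objects with similar trajectories, and using them together with different numbers of distance sampling points markedly reduces the time complexity of trajectory clustering. The algorithms were validated on simulated and real data sets. The results show that both methods can pre-filter data for other trajectory clustering algorithms, greatly reducing the data volume after filtering and thereby improving efficiency.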

3.
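This article summarizes the basic methods of data mining and the key techniques of text data mining, discusses the definition of text mining and several forms of text classification, and examines data mining algorithms for text data and their development trends.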

4.
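Data Mining Techniques in Intrusion Detection   Total citations: 3, self-citations: 0, citations by others: 3
Lu Huibin (卢辉斌), Wang Yongjun (王拥军). Journal of Yanshan University (《燕山大学学报》), 2003, 27(4): 314-316, 351
Intrusion detection is an important part of network security protection, and applying data mining within intrusion detection systems has become an active research topic. This paper combines fuzzy set theory with traditional association mining to propose a fuzzy association data mining algorithm, which runs more efficiently than earlier algorithms.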

5.
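Association rule mining is one of the principal data mining techniques. Existing association rule mining algorithms are all based on the support-confidence framework, so when the user adjusts a threshold they scan the database multiple times and repeat computation. To address the maintenance of association rules when the support threshold changes, this paper proposes HIUA, an interactive association rule mining algorithm that improves the pruning step of the original IUA algorithm and uses a hash structure to speed up execution. Experiments on UCI data sets and on real enterprise financial data show that, as the support threshold changes, HIUA makes further use of existing mining results and effectively improves the efficiency of association rule mining.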
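To make the support-confidence framework and the idea of reusing earlier results concrete, here is a minimal sketch. It is not HIUA or IUA: it simply caches support counts for all itemsets up to a small size in a hash table (a Python dict), so that changing the support threshold means re-filtering the cache instead of rescanning the data. The function names and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def support_counts(transactions, max_size=2):
    """Scan the transactions once and cache counts of itemsets up to max_size."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return counts


def frequent(counts, n, min_support):
    """Filter cached counts by a support threshold; no data rescan is needed."""
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def confidence(counts, antecedent, consequent):
    """confidence(A -> B) = count(A and B together) / count(A)."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return counts[both] / counts[tuple(sorted(antecedent))]


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
cached = support_counts(transactions)
at_half = frequent(cached, len(transactions), 0.5)   # only the single items qualify
at_lower = frequent(cached, len(transactions), 0.4)  # the pairs now qualify too
```

Enumerating every small itemset up front is only practical for toy data; incremental algorithms such as the one the abstract describes avoid this by pruning candidates.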

6.
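Models and Algorithms for Generalized Web Content Mining   Total citations: 2, self-citations: 0, citations by others: 2
In today's information age the Web is growing at a geometric rate and has become one of the main sources of information. Precisely because Web information grows so fast, people face an "information explosion" alongside "knowledge poverty". Data mining (DM) is the best tool for deriving knowledge from data, which gave rise to Web data mining, that is, the concept of KDW. This paper focuses on the characteristics and development of generalized Web content mining, the distinction between intra-page and inter-page mining in narrow-sense content mining together with the main algorithms applied, and the two major algorithms of structure mining with their strengths and weaknesses.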

7.
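To address the weaknesses of traditional data mining techniques, this paper proposes a profit-based constrained association rule mining algorithm. Before association rules are mined, the algorithm preprocesses the raw transaction records in the shopping basket according to the profit weights of the goods, which makes subsequent association rule mining more accurate and reliable and improves the mining results. The results show that the algorithm applies a profit-constraint correction to the raw database data and adds a profit-weighted threshold, effectively improving the knowledge discovery performance of the mining algorithm.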

8.
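AGM and HSIGRAM are two classic frequent subgraph mining algorithms with important applications in graph-based data mining. This paper analyzes their similarities and differences in terms of algorithmic ideas and applied techniques and, taking the characteristics of graph-based data mining into account, proposes improvement strategies for both.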

9.
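Data mining is both compute-intensive and storage-intensive, and middleware technology handles both problems well. The authors studied and implemented middleware for typical classification, clustering, and association rule algorithms and their incremental variants, together with an enterprise data mining application platform. The platform handles data on the order of 100 Mbit, accommodates data increments on the order of 10 to 100 Mbit, and presents and visualizes patterns according to the mining task. An incremental clustering task run on the platform over a physical training data set of athletes at a tennis training base shows that the platform meets business requirements such as reliability, scalability, and ease of use.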

10.
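Mining frequent items in data streams is an important research topic and the basis for other stream mining tasks. Lossy Counting was the first approximate algorithm for mining frequent items in data streams, and it is efficient in both space and time. After analyzing the algorithm in detail, in particular its inability to answer time-related queries, this paper improves it by adding a time dimension and proposes mining frequent items in data streams at multiple time granularities. The improved algorithm stores and merges frequent items in tilted-time windows and can be applied to time-sensitive stream queries and mining applications.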
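For readers unfamiliar with the baseline, the following is a minimal sketch of the basic Lossy Counting algorithm as it is commonly described; it does not include the time dimension or tilted-time windows that the paper adds. The class and method names are assumptions chosen for illustration.

```python
import math


class LossyCounter:
    """Approximate frequent-item counting over a stream with error bound epsilon."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max undercount delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Return items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


counter = LossyCounter(epsilon=0.1)
for item in ["a", "b", "a", "c", "a", "d", "a", "e", "a", "f"] * 2:
    counter.add(item)
result = counter.frequent(0.3)
```

In this toy stream only `"a"` survives pruning, because every other item appears once per bucket and is discarded at each bucket boundary. Note that the structure keeps no record of when an item was frequent, which is the limitation the abstract highlights.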

11.
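SDML: A Spatial Data Mining Language Based on Spatial Databases   Total citations: 6, self-citations: 0, citations by others: 6
This paper designs SDML, a spatial data mining language based on spatial databases. According to the objects SDML operates on and the stages of the mining process, the language is divided into a view manipulation language and a model manipulation language, which operate on data mining views and models respectively. The paper describes SDML's design ideas and design scheme in detail and gives SDML solutions to two typical spatial data mining problems, spatial generalization and spatial association.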

12.
Data mining from transaction databases has focused on relationships between attributes, while relationships between tuples have not been taken into account. In a spatial database there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap, and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction DM, and the relationships between tuples, called horizontal-direction DM. This paper analyzes the storage models of spatial data; draws on data mining techniques for transaction databases; defines the spatial association rule, including the vertical-direction, horizontal-direction, and two-direction association rules; discusses how to measure the interestingness of spatial association rules; and puts forward workflows for spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets: spatial analysis transforms the spatial relations into non-spatial associations, yielding the non-spatial itemsets. From these itemsets, the Apriori algorithm or other algorithms can derive the frequent itemsets and then the spatial association rules. The algorithm was validated by mining spatial association rules from a spatial database, and the test results show that it is efficient and can mine interesting spatial rules.

13.
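NPSP: An Efficient Incremental Mining Algorithm for Sequential Patterns   Total citations: 4, self-citations: 3, citations by others: 1
This paper proposes a data structure called the "heterogeneous tree" and numbers its branches by a set of rules such that branches with the same number represent the same candidate sequence and branches with different numbers represent different candidate sequences, which greatly simplifies candidate counting. On this basis it proposes NPSP, an efficient sequential pattern mining algorithm that supports incremental mining, and demonstrates the completeness of its result set and the efficiency of the algorithm through both theoretical analysis and experiments.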

14.
Data mining from transaction databases has focused on relationships between attributes, while relationships between tuples have not been taken into account. In a spatial database there are relationships both between attributes and between tuples, and most associations occur between tuples, such as adjacency, intersection, overlap, and other topological relationships. The tasks of spatial association rule mining therefore include mining the relationships between attributes of spatial objects, called vertical-direction DM, and the relationships between tuples, called horizontal-direction DM. This paper analyzes the storage models of spatial data; draws on data mining techniques for transaction databases; defines the spatial association rule, including the vertical-direction, horizontal-direction, and two-direction association rules; discusses how to measure the interestingness of spatial association rules; and puts forward workflows for spatial association rule mining. For two-direction spatial association rule mining, an algorithm is proposed to obtain non-spatial itemsets: spatial analysis transforms the spatial relations into non-spatial associations, yielding the non-spatial itemsets. From these itemsets, the Apriori algorithm or other algorithms can derive the frequent itemsets and then the spatial association rules. The algorithm was validated by mining spatial association rules from a spatial database, and the test results show that it is efficient and can mine interesting spatial rules.
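The pipeline described above (spatial relations turned into non-spatial itemsets, then Apriori) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the spatial analysis step is assumed to have already produced `(object, predicate, target type)` triples, and the object names, predicates, and function names are all hypothetical.

```python
from itertools import combinations


def to_transactions(relations):
    """Turn (object, predicate, target_type) triples into one itemset per object."""
    tx = {}
    for obj, predicate, target_type in relations:
        tx.setdefault(obj, set()).add(f"{predicate}:{target_type}")
    return tx


def apriori(transactions, min_count):
    """Level-wise frequent itemset search; returns {itemset: support count}."""
    baskets = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for b in baskets for i in b}  # size-1 candidates
    result = {}
    k = 1
    while current:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        current = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


relations = [
    ("house1", "adjacent", "road"), ("house1", "near", "park"),
    ("house2", "adjacent", "road"), ("house2", "near", "park"),
    ("house3", "adjacent", "road"), ("house3", "near", "river"),
]
itemsets = apriori(to_transactions(relations).values(), min_count=2)
```

Once the spatial predicates are encoded as plain items, any transaction-based frequent itemset algorithm can be applied unchanged, which is the point the abstract makes about reusing Apriori.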

15.
Parallel frequent pattern discovery algorithms exploit parallel and distributed computing resources to relieve the sequential bottlenecks of current frequent pattern mining (FPM) algorithms. As a result, parallel FPM algorithms achieve better scalability and performance and are attracting much attention in the data mining research community. This paper presents a comprehensive survey of state-of-the-art parallel and distributed frequent pattern mining algorithms, with emphasis on pattern discovery from complex data (e.g., sequences and graphs) on various platforms. A review of typical parallel FPM algorithms uncovers the major challenges, methodologies, and research problems in the field of parallel frequent pattern discovery, such as workload balancing, finding good data layouts, and data decomposition. The survey also indicates a dramatic shift of research interest in the field from simple parallel frequent itemset mining on traditional parallel and distributed platforms to parallel pattern mining of more complex data on emerging architectures, such as multi-core systems and the increasingly mature grid infrastructure.

16.
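A Stream Data Mining Method Based on Hierarchical Clustering   Total citations: 1, self-citations: 0, citations by others: 1
Stream data arrives rapidly, in order, and in massive volume; the data generated in many application domains can be classed as this type. Data mining can discover meaningful knowledge models in massive data, but traditional mining algorithms usually target static data sets and cannot handle stream data effectively. This article approaches stream data from the perspective of hierarchical clustering and explores a hierarchical clustering algorithm based on a minimum cost function.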

17.
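With continuing advances in computers and related information acquisition technologies, databases of many kinds have been established and supply large volumes of data at ever lower cost. The focus of scientific research has naturally shifted to mining the data in existing databases, also called implicit information extraction. Because spatial data is voluminous, multidimensional, and autocorrelated, mining it is more complex than mining other data types. In the mid-1990s Stan Openshaw held that spatial data mining had become an important branch of quantitative geography and named the new discipline GeoComputation. This paper discusses the content system and the various definitions of GeoComputation and argues for the necessity and soundness of treating it as a discipline.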

18.
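Current data mining methods for high-resolution video images are easily disturbed by the external environment, the image features they extract are unreliable, and feature values extracted from different viewpoints differ widely, which greatly reduces mining accuracy. This paper therefore proposes a new method for mining massive high-resolution video image data across viewpoints. Harris corner detection extracts the spatio-temporal features of the video image data to be mined. From these features, an autocorrelation matrix is used to build recurrence plots of the same object under different viewpoints. Each recurrence plot is treated as an image, and a recurrence feature descriptor is built by computing the gradient vectors of its pixels. The correlations of the same object across viewpoints are then mined, and video image data sharing the same recurrence-plot gradient features are gathered together to complete the mining. Experimental results show that the proposed method achieves high mining accuracy.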

19.
Recent advances in computing, communications, digital storage, and high-throughput data-acquisition technologies make it possible to gather and store incredible volumes of data, creating unprecedented opportunities for large-scale knowledge discovery from databases. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data, for purposes such as data analysis and decision making. Many researchers are designing efficient data mining techniques, methods, and algorithms. Unfortunately, most pay close attention to the technical problems of developing data mining models and methods and little to the basic issues of data mining. This paper proposes a new understanding of data mining, the domain-oriented data-driven data mining (3DM) model. Some data-driven data mining algorithms developed in our lab are also presented to show its validity.
