Similar articles
 20 similar articles found (search time: 203 ms)
1.
Graph clustering, i.e., partitioning nodes or data points into non-overlapping clusters, can be beneficial in a wide variety of computer vision and machine learning applications. However, the main graph clustering schemes, such as spectral clustering, cannot be applied to large networks because of their prohibitive computational complexity. While there exist methods applicable to large networks, these methods do not offer convincing comparisons against known ground truth. For the first time, this work evaluates clustering algorithm performance on large networks (consisting of one million nodes) with ground truth information. Ideas and concepts from game theory are applied to graph clustering to formulate a new algorithm, the Game Theoretical Approach for Clustering (GTAC). This theoretical framework is shown to be a generalization of both the Label Propagation and Louvain methods, offering an additional means of derivation and analysis. GTAC introduces a tuning parameter that allows algorithm performance to be varied according to application needs. Experiments show that the GTAC algorithms offer scalability and tunability for big data applications.
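
The abstract names Label Propagation as one of the methods that GTAC generalizes. As a point of reference, here is a minimal sketch of plain label propagation only; it is not GTAC and has no tuning parameter. The function name, the toy graph, and the deterministic tie-breaking rule (largest label wins) are assumptions made for illustration; implementations commonly break ties at random.

```python
from collections import Counter

def label_propagation(adjacency, max_rounds=100):
    """Plain asynchronous label propagation on an undirected graph.

    adjacency: dict mapping each node to a list of neighbouring nodes.
    Ties are broken towards the largest label so the run is deterministic
    (an assumption of this sketch, not part of the cited work).
    """
    labels = {node: node for node in adjacency}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[nb] for nb in adjacency[node])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            new_label = max(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # no node changed its label in a full sweep
    return labels

# Toy graph: two triangles joined by a single bridge edge (2-3).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
communities = label_propagation(graph)
```

On this toy graph each triangle ends up with its own label, because every node adopts the label held by most of its neighbours.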

2.
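The potential-based fast agglomerative hierarchical clustering algorithm uses a new similarity measure and obtains clustering results more efficiently. To address its inability to handle noisy complex manifold data effectively, a potential-based hierarchical clustering algorithm for complex manifold data in noisy environments is proposed. Noise points are identified from the curve of increasing potential values. Automatic clustering is then performed on two newly defined layers of data, those with the largest and the smallest potential, to determine the rough skeleton of the clusters, and on this basis hierarchical clustering is carried out on the whole dataset. Experiments on synthetic datasets show that the new algorithm can effectively handle complex manifold data in noisy environments; experiments on real datasets show that it achieves better clustering results.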

3.
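A fuzzy covariance distance measure is introduced into the parameter control of a competitive-learning neural network. Batch network learning is used to eliminate the influence of the order of data samples on weight adjustment, and redundant classes in the dataset are eliminated and merged to achieve adaptive clustering of data with an unknown number of classes and multiple distribution types. Experiments show that the new network is robust to the distribution form of the dataset and can correctly determine its number of classes.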

4.
Performing analytics on customers' load curves (LCs) is the foundation of demand response, which requires a better understanding of customers' consumption patterns (CPs) through load curve analysis. However, previous widely used LC clustering methods perform poorly in two respects: a large number of clusters and large variance within a cluster (a CP is extracted from each cluster), which make it very difficult to understand customers' electricity consumption patterns. In this paper, to improve the performance of LC clustering, a clustering framework that incorporates community detection is proposed. The framework includes three parts: network construction, community detection, and CP extraction. According to the cluster validity index (CVI), the integrated approach outperforms the previous state-of-the-art method with the same number of clusters, and it needs fewer clusters to achieve the same performance as measured by CVI.

5.
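As the volume of data in modern archive management keeps growing, effectively clustering archive texts can improve the efficiency of archive classification and retrieval. This paper proposes two incremental multimodal text clustering methods that analyze text content from multiple views and fuse the mined latent topic features of the texts to improve clustering accuracy. In addition, a multimodal incremental learning model for text clustering is designed to improve the efficiency of partitioning massive, dynamic text collections. Experimental results on text datasets show that the proposed incremental multimodal text clustering methods outperform single-modal and multimodal clustering algorithms and can partition text data effectively.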

6.
To alleviate the scalability problem caused by increasing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an Incremental Web Usage Mining algorithm based on Active Ant Colony Clustering. First, an active movement strategy for direction selection and speed, different from the positive strategy employed by other ant colony clustering algorithms, is proposed to construct an Active Ant Colony Clustering algorithm, which avoids idle movement and the "flying over the plane" phenomenon and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.

7.
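For quality inspection of batch drilling operations, an acoustic emission sensor is used to collect acoustic emission signals during machining, their time-domain statistical features are extracted, and feature vectors of the process signals are constructed. The incremental density-based spatial clustering of applications with noise algorithm (InDBSCAN) is then used to cluster the acoustic emission feature vectors incrementally in order to analyze the quality of the batch operations. The InDBSCAN algorithm is improved to handle the case in which an inserted data point triggers the creation of a new cluster and may at the same time cause existing distinct clusters to merge. Experimental results show that the improved InDBSCAN algorithm makes incremental clustering of inserted data points more reasonable, and that the detection accuracy for the process quality distribution reaches 84.03%.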
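
The merge case mentioned above can be seen on a toy example. The sketch below is a plain batch DBSCAN that is simply re-run after a point is inserted; it is not the InDBSCAN update rule or the improvement described in the abstract. The function name, the sample points, and the parameter values are assumptions for illustration. `min_pts` counts the point itself.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Batch DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force range query; includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, keep expanding
    return labels

# Two groups on a line, separated by a gap of 2 units.
before = [(0, 0), (1, 0), (2, 0), (4, 0), (5, 0), (6, 0)]
labels_before = dbscan(before, eps=1.0, min_pts=2)

# Inserting a point in the gap connects the two groups.
after = before + [(3, 0)]
labels_after = dbscan(after, eps=1.0, min_pts=2)
```

Before the insertion the points form two clusters; after it, the inserted point is density-reachable from both sides and all seven points share one label. An incremental algorithm has to detect this situation without re-clustering everything.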

8.
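An incremental learning algorithm for support vector machines based on clustering is proposed. The training set is first divided into several cluster subsets with a nearest-neighbor clustering algorithm, and each subset is trained with a support vector machine to obtain its set of support vectors. New data are first assigned to the corresponding subset; their distances to the support vectors within that cluster are then computed and each training sample is given an appropriate weight, after which the prediction model is built. The algorithm is studied on an industrial case, modeling the prediction of the mechanical properties of steel. The results show that, compared with the standard support vector regression algorithm, it not only uses markedly fewer support vectors during modeling but also improves model accuracy.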

9.
A noun-phrase-based multi-level clustering method for search results   Total citations: 2 (self-citations: 0, citations by others: 2)
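To obtain high-quality clustering of search results, noun phrases are extracted as candidate cluster labels, base clusters are generated according to the distribution of the candidate labels, and a one-pass clustering algorithm with linear time complexity is then used to cluster the base clusters at multiple levels. Comparative experiments against the NEC, STC, and Lingo algorithms show that the method outperforms all three in the readability and effectiveness of cluster labels and in clustering performance.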
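
To give a sense of what a one-pass clustering step looks like, here is a minimal sketch on numeric points. It is an illustration under stated assumptions, not the paper's method: the paper clusters base clusters built from noun phrases, whereas this sketch uses Euclidean distance on coordinates, and the function name and the `radius` threshold are hypothetical. Each item is examined once, so the cost is linear in the number of items when the number of clusters stays bounded.

```python
from math import dist

def one_pass_clustering(points, radius):
    """Single pass: a point joins the nearest cluster whose centroid is
    within `radius`; otherwise it starts a new cluster."""
    centroids, sizes, assignment = [], [], []
    for p in points:
        best, best_d = None, radius
        for k, c in enumerate(centroids):
            d = dist(p, c)
            if d <= best_d:
                best, best_d = k, d
        if best is None:
            centroids.append(tuple(p))  # open a new cluster at this point
            sizes.append(1)
            best = len(centroids) - 1
        else:
            n = sizes[best] + 1
            # Running-mean update of the centroid.
            centroids[best] = tuple(
                c + (x - c) / n for c, x in zip(centroids[best], p)
            )
            sizes[best] = n
        assignment.append(best)
    return assignment, centroids

points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
assignment, centroids = one_pass_clustering(points, radius=1.0)
```

The result depends on the order in which items arrive, which is the usual trade-off of one-pass schemes.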

10.
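To discover micro-clusters in a distributed data stream environment, a data stream clustering algorithm based on time decay is proposed that takes the forgetting property of data streams into account. Each local site processes its data incrementally according to the decay model and sends its local model to a central site. The central site merges the micro-clusters of the local sites to generate a global clustering model. Experiments on real and simulated data show that the algorithm achieves good clustering quality and scales well.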
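
A time-decayed micro-cluster can be sketched as a weighted summary whose contents fade between updates. The abstract does not give the decay model, so the exponential form 2^(-rate * elapsed) used below is an assumption, as are the class name, its fields, and the merge rule (adding faded summaries, as a central site might).

```python
class DecayedMicroCluster:
    """Summary of a group of stream points with exponential forgetting.

    Assumed decay model: weights are multiplied by 2 ** (-decay_rate * dt),
    where dt is the time elapsed since the last update.
    """

    def __init__(self, dim, decay_rate):
        self.weight = 0.0
        self.linear_sum = [0.0] * dim
        self.decay_rate = decay_rate
        self.last_update = 0.0

    def _fade(self, now):
        factor = 2.0 ** (-self.decay_rate * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = now

    def insert(self, point, now):
        """Incrementally absorb one point that arrives at time `now`."""
        self._fade(now)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def merge(self, other, now):
        """Merge another micro-cluster after fading both to time `now`."""
        self._fade(now)
        other._fade(now)
        self.weight += other.weight
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]

    def center(self):
        # Only meaningful once at least one point has been inserted.
        return [s / self.weight for s in self.linear_sum]

mc = DecayedMicroCluster(dim=1, decay_rate=1.0)
mc.insert([0.0], now=0.0)
mc.insert([3.0], now=1.0)  # the older point now counts half as much
```

With a decay rate of 1, the point inserted at time 0 has weight 0.5 by time 1, so the centre is (0.5 * 0 + 1 * 3) / 1.5 = 2.0 rather than the unweighted mean 1.5.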

11.
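Cluster analysis demands high clustering quality and fast response, and the large volumes of high-dimensional data in the data warehouses of various industries pose an even greater challenge to algorithm efficiency. The CURE algorithm delivers high-quality clustering results but does not meet the requirements of online clustering. Considering that data warehouse data are updated in irregular, incremental batches, a new incremental CURE clustering algorithm, InCURE, is proposed. It exploits the interconnectivity and closeness of objects to retain the dynamic clustering properties of the original algorithm while greatly shortening clustering time. Tests on large amounts of 5-, 20-, and 50-dimensional data show that InCURE is more efficient than CURE for both low- and high-dimensional data and is suitable for incremental cluster analysis in a data warehouse environment.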

12.
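A new clustering algorithm for massive data, WIDE (window-density clustering algorithm), is presented. It localizes the correlations among data points with a grid method, improves efficiency with a window technique, and improves clustering accuracy with a density method. The main idea of the algorithm is to use the window as an intermediary that fuses the grid method and the density method. On this basis the algorithm is extended: in terms of functionality, it supports clustering of mixed-type data, data with obstacles, and incremental data; in terms of speed, it supports distributed parallel clustering. WIDE can run in parallel on multiple computers in a local area network, is efficient with a computational complexity of O(N), can discover clusters of arbitrary shape, and is insensitive to noise.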
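
The combination of a grid with a density threshold can be illustrated with a generic grid-density sketch. This is not the WIDE algorithm and omits its window technique; the function name, the 2-D setting, `cell_size`, and `min_count` are assumptions for illustration. Points are binned once, cells with enough points are treated as dense, and adjacent dense cells are joined into clusters, so the work grows roughly linearly with the number of points.

```python
from collections import defaultdict, deque

def grid_density_clusters(points, cell_size, min_count):
    """Label 2-D points by connected groups of dense grid cells; -1 is noise."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    dense = {c for c, members in cells.items() if len(members) >= min_count}
    labels = [-1] * len(points)
    seen = set()
    cluster = 0
    for start in sorted(dense):  # sorted only to make labels reproducible
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster
            # Visit the 8 surrounding cells (and the cell itself).
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster += 1
    return labels

points = [
    (0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),  # two adjacent dense cells
    (5.1, 5.1), (5.2, 5.3),                          # a separate dense cell
    (9.5, 0.5),                                      # an isolated point
]
grid_labels = grid_density_clusters(points, cell_size=1.0, min_count=2)
```

Because clusters are unions of neighbouring cells, they can take arbitrary shapes, and points in sparse cells are left as noise.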

13.
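To address problems of clustering algorithms in intrusion detection, such as preset parameters, cluster validity evaluation, and detection of unknown attack types, an improved algorithm based on density and the optimal number of clusters is proposed. Initial cluster centers are determined heuristically from the distribution of the samples, a new internal evaluation index is proposed from the perspective of the geometric structure of the samples, and a method for determining the optimal number of clusters is given. On this basis, an incremental intrusion detection model is designed that adjusts cluster centers and the number of clusters dynamically. Experimental results show that, compared with K-means and two other improved clustering algorithms, the new algorithm converges faster, achieves higher clustering accuracy, can cluster unknown network behavior effectively, and performs well in intrusion detection.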

14.
Training algorithms for support vector machines (SVMs) are currently an important research topic in machine learning. It is challenging to improve the efficiency of the algorithm without reducing the generalization performance of the SVM. To meet this challenge, a new SVM training algorithm based on set segmentation and k-means clustering is presented in this paper. The idea is to divide the original training data into many subsets, cluster each subset using k-means, and finally train the SVM on the new dataset formed by the cluster centroids. Because decomposition algorithms such as SVMlight are among the major methods for solving support vector machines, SVMlight is used in our experiments. Simulations on different types of problems show that the proposed method can efficiently solve not only large linear classification problems but also large nonlinear ones.

15.
Social media such as Twitter serve as a novel news medium and have become increasingly popular since their establishment. Large-scale, first-hand, user-generated tweets motivate automatic event detection on Twitter. Previous unsupervised approaches detected events by clustering words. These methods detect events using burstiness, which measures surging frequencies of words in certain time windows. However, event clusters represented by a set of individual words are difficult to understand. This issue is addressed by building a document-level event detection model that directly calculates the burstiness of tweets, leveraging distributed word representations to model semantic information and thereby avoid sparsity. Results show that the document-level model not only offers event summaries that are directly human-readable, but also gives significantly improved accuracy compared with previous word- and segment-based methods for unsupervised tweet event detection.

16.
Fu M, Yu X, Lu J, Zuo Y. Nature, 2012, 483(7387): 92-95
Many lines of evidence suggest that memory in the mammalian brain is stored with distinct spatiotemporal patterns. Despite recent progress in identifying neuronal populations involved in memory coding, the synapse-level mechanism is still poorly understood. Computational models and electrophysiological data have shown that functional clustering of synapses along dendritic branches leads to nonlinear summation of synaptic inputs and greatly expands the computing power of a neural network. However, whether neighbouring synapses are involved in encoding similar memory and how task-specific cortical networks develop during learning remain elusive. Using transcranial two-photon microscopy, we followed apical dendrites of layer 5 pyramidal neurons in the motor cortex while mice practised novel forelimb skills. Here we show that a third of new dendritic spines (postsynaptic structures of most excitatory synapses) formed during the acquisition phase of learning emerge in clusters, and that most such clusters are neighbouring spine pairs. These clustered new spines are more likely to persist throughout prolonged learning sessions, and even long after training stops, than non-clustered counterparts. Moreover, formation of new spine clusters requires repetition of the same motor task, and the emergence of succedent new spine(s) accompanies the strengthening of the first new spine in the cluster. We also show that under control conditions new spines appear to avoid existing stable spines, rather than being uniformly added along dendrites. However, succedent new spines in clusters overcome such a spatial constraint and form in close vicinity to neighbouring stable spines. Our findings suggest that clustering of new synapses along dendrites is induced by repetitive activation of the cortical circuitry during learning, providing a structural basis for spatial coding of motor memory in the mammalian brain.

17.
A study of an efficient general clustering framework in data mining   Total citations: 1 (self-citations: 1, citations by others: 0)
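With the rapid development of sensor and Internet technologies, dataset sizes have surged while the storage and processing capacity of systems still lags behind. Current data clustering algorithms require many measurements and incur high time costs. To solve clustering problems on large datasets efficiently, a general active hierarchical clustering framework is proposed; by repeatedly running an offline clustering algorithm on small datasets, it preserves algorithm performance while reducing both measurement complexity and running-time complexity. The framework is then discussed for the spectral clustering algorithm: theoretical analysis shows that all clusters of size Ω(lg n) can be recovered from O(n lg² n) similarity values, and that the running time for a dataset of n objects is O(n lg³ n). Finally, comprehensive simulation experiments demonstrate further strengths of the proposed framework.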

18.
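To better cluster automotive radar data in real time in complex multi-target environments, the density-based clustering algorithm DBSCAN is improved using the extended Kalman filter (EKF), and the result is validated through simulation and real-world experiments. The results show that the time taken by each incremental clustering step stays at a stable, low level, and that the new algorithm clusters adaptively without increasing time complexity, which handles the non-uniform density of automotive radar data. The new algorithm thus achieves both incremental and adaptive DBSCAN clustering while maintaining clustering efficiency and accuracy.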

19.
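In data mining, clustering is an initial processing step for data. In a dynamic system new data are added frequently, and re-clustering everything each time data are added wastes both time and resources. This paper first introduces the basic concepts and categories of clustering and then proposes a clustering algorithm based on feature vectors that clusters only the newly added data, saving substantial resources and time. Experiments comparing this incremental clustering algorithm with re-clustering on newly added data in a dynamic system lead to the conclusion that the incremental algorithm is feasible.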

20.
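To make effective use of log files and characterize learner profiles in greater depth, a two-way clustering modeling method (Two-way Clustering, TWC) is proposed. It mines a large volume of learning-behavior data from more than ten thousand learner enrollments at the online education college of a university, aiming to portray distance-education learners more deeply. Because educational data are implicit in nature, the method centers on fine-grained data, obtains each learner's category in different models through clustering from two perspectives, and finally characterizes learners based on the fused model. Comparative experiments between four classical clustering algorithms and TWC, together with TWC's clustering results, show that TWC strengthens cluster cohesion, clusters learners more accurately, and thus profiles learners more deeply and comprehensively.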
