Similar literature
 Found 20 similar documents (search time: 53 ms)
1.
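A Global K-Value Optimization Algorithm Based on Semi-Supervised K-means   Total citations: 3, self-citations: 0, citations by others: 3
This paper proposes a global K-value optimization algorithm based on semi-supervised K-means. The algorithm removes the restriction, common in traditional methods, that K must equal the number of sample classes, and uses a small amount of labeled data to guide and organize a large amount of unlabeled data. Combining the distribution characteristics of the data set with the supervision information inside each cluster after clustering, it uses a voting method to guide the class labeling of the data in each cluster. Experiments show that the proposed method can effectively find the best K value and cluster centers for a data set and improve clustering performance.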
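
The voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each cluster simply takes the majority label of its few labeled members. The function name `vote_cluster_labels` and the use of `None` for unlabeled samples are illustrative assumptions, not the paper's notation, and the search for the best K is not shown.

```python
from collections import Counter

def vote_cluster_labels(assignments, labels, unlabeled=None):
    """Give each cluster the majority label among its labeled members.

    assignments[i] is the cluster index of sample i; labels[i] is its
    class label, or `unlabeled` when no label is known (assumed encoding).
    """
    votes = {}
    for cluster, label in zip(assignments, labels):
        if label != unlabeled:
            votes.setdefault(cluster, Counter())[label] += 1
    # Clusters without any labeled member stay unlabeled.
    return {
        cluster: votes[cluster].most_common(1)[0][0] if cluster in votes else unlabeled
        for cluster in set(assignments)
    }

cluster_labels = vote_cluster_labels(
    [0, 0, 0, 1, 1, 2],
    ["a", None, "a", "b", None, None],
)
```

A cluster that receives conflicting votes is a hint that it mixes classes, which is the kind of signal a K-selection procedure could use. How the paper actually uses it is not stated in the abstract.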

2.
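MapReduce Parallel Implementation of the k-means Clustering Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
Based on the characteristics of the k-means clustering algorithm, this paper presents a method for implementing k-means with the MapReduce programming model. The Map function computes the distance from each record to the cluster centers and relabels the record with its new cluster; the Reduce function computes the new cluster centers from the intermediate results produced by the Map function, for use by the next MapReduce job. Experimental results show that the MapReduce-parallelized k-means algorithm, deployed on a Hadoop cluster, achieves good speedup and good scalability.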
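
The Map/Reduce split described in this abstract can be sketched in plain Python as a single-process stand-in for one MapReduce job. The function names and the in-memory dictionary that plays the role of the shuffle phase are assumptions for illustration; nothing here uses the Hadoop API.

```python
from collections import defaultdict

def map_assign(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers]
    return dists.index(min(dists)), point

def reduce_center(points):
    """Reduce step: the new center is the mean of one cluster's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centers):
    """One job: map every point, group by key, reduce each group."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for point in points:
        key, value = map_assign(point, centers)
        groups[key].append(value)
    # A center that attracted no points is kept unchanged.
    return [
        reduce_center(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]

new_centers = kmeans_iteration(
    [(0, 0), (0, 2), (10, 10), (10, 12)],
    [(0, 0), (10, 10)],
)
```

Repeating `kmeans_iteration` until the centers stop moving corresponds to chaining jobs, with each job reading the centers written by the previous one.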

3.
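To partition subtopics better in multi-document automatic summarization, this paper proposes a subtopic partitioning method based on semi-supervised learning. First, the semantic similarity between sentences is computed. Next, hierarchical clustering is used to assign topic labels to high-confidence sentences, producing a small set of sentences with known topic labels; on this basis, constrained k-means clustering is applied to all sentences, and the number of subtopics k is determined by cross-validation. Finally, k-means clustering is used to obtain the subtopics of the document set. Experimental results show that the method effectively improves the subtopic recognition rate.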

4.
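To address the sharp increase in time complexity that the traditional k-means clustering algorithm suffers on massive data, and drawing on the advantages of cloud computing, this paper proposes parallelizing k-means with the MapReduce programming framework. The Map function computes the distance from each sample record to the cluster centers and labels the record with its cluster; the Reduce function aggregates the intermediate results and computes the new cluster centers for the next iteration. Experiments show that the MapReduce-based parallel k-means clustering algorithm achieves good speedup and good scalability.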

5.
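To address the insufficient use of supervision information in semi-supervised clustering algorithms, and the low information content of that supervision, this paper proposes a semi-supervised clustering algorithm combined with active learning. First, class labels and pairwise constraints are used together to guide the Kmeans clustering process, yielding SC-Kmeans, a semi-supervised clustering algorithm based on a seed set and pairwise constraints. Second, an active learning algorithm is introduced into SC-Kmeans to select supervision information with higher information content at the lowest possible cost, improving the clustering accuracy of SC-Kmeans. Finally, simulation experiments are carried out on UCI benchmark data sets. The results show that the algorithm achieves good clustering results and effectively improves clustering accuracy.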

6.
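This paper proposes an extended semi-supervised fuzzy clustering model and gives the iterative formulas for solving it. This form of semi-supervised clustering lets the class information of the partially labeled samples influence the unlabeled samples reasonably and effectively, thereby improving the clustering results of the semi-supervised clustering algorithm. Its iterative formulas for the membership degrees and cluster centers are as concise as those of the FCM algorithm. Cluster analysis on a cucumber data set shows that the proposed semi-supervised clustering outperforms the two unimproved semi-supervised algorithms, the FCM algorithm, and the linear discriminant method.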

7.
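Designing learning machines that fit samples with planes as prototypes has attracted wide attention in machine learning and data mining. However, little has been reported on how to cluster with a small number of labeled samples while preserving the characteristics of plane prototypes. Taking kPC (k-Plane Clustering) as the starting point, this paper designs a semi-supervised plane clustering algorithm, semi-kPC, for the case where labeled samples are extremely scarce. Because the L1 norm is more robust than the L2 norm, it further proposes an L1-norm-based semi-supervised clustering method, semi-L1kPC, building on the existing L1kPC (L1 norm kPC). Starting from a single labeled sample per class, experiments on artificial data sets and UCI data sets show that: (1) on the XOR (Exclusive OR) problem, plane-based clustering methods all achieve significantly higher clustering accuracy than the k-means algorithm, because k-means cannot exploit the planar structure; (2) after a small amount of supervision information is introduced, the semi-supervised clustering methods semi-kPC and semi-L1kPC achieve higher clustering accuracy than the other clustering methods; (3) semi-L1kPC, which uses the L1 norm, is more robust than semi-kPC.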

8.
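As the data sets to be mined keep growing, earlier clustering algorithms are no longer suitable because they must scan the original data set many times, and algorithms that complete clustering in a single scan of the original data set have become the primary research goal. However, existing algorithms for large-scale data sets are easily affected by the initialization parameters and by the distribution of the original data, so their clustering results are of low quality and unstable. To address this, this paper borrows the idea of semi-supervised clustering and proposes a semi-supervised one-pass K-means algorithm based on a labeled set. The algorithm uses a labeled set resident in main memory to guide the clustering process, further improving both clustering efficiency and the quality of the clustering results. The effectiveness of the algorithm is verified on artificially generated data sets and on the 1998 KDD data set.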

9.
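Semi-supervised learning algorithms struggle to improve classifier performance when the actual data distribution does not match their assumptions. To address this, this paper proposes a semi-supervised Boosting algorithm that maximizes sample separability, learning from unlabeled samples by introducing the criterion "minimum local scatter in high-density regions, maximum global scatter over the sample space". The criterion uses two semi-supervised assumptions (the cluster assumption and the manifold assumption), which reduces the loss of accuracy caused by a mismatch between the semi-supervised assumption and the data. Experimental results show that the proposed algorithm effectively improves the accuracy of the Boosting algorithm on data sets that satisfy the cluster assumption and on data sets that satisfy the manifold assumption, and improves the classifier's stability on noisy data.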

10.
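After analyzing the shortcomings of the k-means algorithm, the characteristics of intrusion detection, and the characteristics of network data, this paper proposes the KD algorithm, an unsupervised, density-based two-pass clustering algorithm. The algorithm clusters with an improved k-means algorithm and incorporates the advantages of density-based clustering, in order to improve detection on data sets containing a single type of intrusion and on data sets containing mixed intrusions. Experimental results show that the algorithm has a high detection rate and a low false detection rate.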

11.
Spatial databases store numerous geometric objects. An important function of a spatial database is to let users browse the geometric objects efficiently as a map, so the database should display the objects that users are interested in swiftly in the display window. This process involves two operations: retrieving data from the database and then drawing it on the screen. Accordingly, to improve efficiency, we should reduce the time spent both on retrieving objects and on displaying them. The former can be achieved with the aid of a spatial index such as the R-tree; the latter requires simplifying the objects. Simplification means that objects are shown with sufficient but not unnecessary detail, which depends on the browsing scale. The major problem is therefore how to retrieve data at different levels of detail efficiently. This paper introduces the implementation of a multi-scale index, generalized from the R-tree, in the spatial database SISP (Spatial Information Shared Platform). The generalization differs from the R-tree in two respects. First, every node and geometric object in the generalization is assigned an importance value, and so is every vertex of each object; the importance value can be used to decide which data should be retrieved from disk in a query. Second, geometric objects in the generalization are divided into one or more sub-blocks, and vertices are totally ordered by their importance value. With the help of the generalized R-tree, one can easily retrieve data at different levels of detail. Experiments on real-life data evaluate the performance of solutions that use a normal spatial index and a multi-scale spatial index, respectively. The results show that the solution using the multi-scale index in SISP is satisfactory.
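
The idea of ordering vertices by importance so that a coarser view reads fewer of them can be sketched as follows. This is an illustration under assumptions only: the function name `simplify`, the per-vertex importance scores, and the threshold parameter are hypothetical, and SISP's actual storage layout and sub-block scheme are not described in the abstract.

```python
from bisect import bisect_right

def simplify(vertices, importance, threshold):
    """Return the vertices with importance >= threshold, in drawing order.

    Vertices are ranked by descending importance, so a query only needs
    a prefix of that ranking; a real store would precompute the ranking.
    """
    order = sorted(range(len(vertices)), key=lambda i: -importance[i])
    negated = [-importance[i] for i in order]   # ascending, ready for bisect
    cut = bisect_right(negated, -threshold)     # count with importance >= threshold
    # Restore the original vertex sequence so the outline is drawn correctly.
    return [vertices[i] for i in sorted(order[:cut])]

outline = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
scores = [1.0, 0.1, 0.5, 0.9, 1.0]
coarse = simplify(outline, scores, 0.95)
medium = simplify(outline, scores, 0.5)
```

Because the ranking is total, every coarser level is a prefix of every finer one, which is what makes reading "only as much as the scale needs" possible.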

12.
13.
Future mobile communication systems aim at providing very high data transmission rates, even in high-mobility scenarios such as high-speed wheel-track trains, maglev trains, highway vehicles, airplanes, guided missiles or spacecraft. A particularly important commercial application is the strong and increasing worldwide demand for high-speed broadband wireless communications (up to 574.8 km/h test speeds or 380 km/h commercial speeds) in railways, providing data, voice and video services for applications such as onboard entertainment services to passengers, train control, train dispatch, train sensor status handling and surveillance. In such high-mobility scenarios, there are a number of communication challenges, including fast handover, location updating, high-speed channel modeling, estimation and equalization, anti-Doppler spreading techniques, fast power control, and dedicated network architecture. Because signal transmission in very high-speed scenarios will inevitably experience serious deterioration, it is imperative to develop key broadband mobile communication techniques for such high-speed vehicles.

14.
Instead of following Fock's expansion, we solve the Schrödinger equation for some quantum mechanical many-body systems, such as electrons in atoms and charged excitons in quantum wells, in a similar way in hyperspherical coordinates, by expanding the wave functions into orthonormal complete basis sets of the hyperspherical harmonics (HHs) of the hyperangles and generalized Laguerre polynomials (GLPs) of the hyperradius. This leads the equation to

15.
Maps are the primary medium of geographic information and the elementary objects manipulated in a GIS, and in existing GIS almost all maps adopt the layer-based model to represent geographic information. However, a map represented in the layer-based model is difficult to extend. Furthermore, in Web-based GIS, transmitting the spatial data for map viewing is slow. To address these problems, this paper proposes a new method for representing spatial data: the scale-based model. In this model, maps are represented at three levels (scale-view, block, and spatial object) and organized as a set of map layers, named Scale-Views, each associated with some given scales. Finally, a prototype Web-based GIS using the proposed spatial data representation is described briefly.

16.
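To help English enthusiasts develop their personal sentiment, cultural literacy, and interest in learning, and to help professionals explore English study and research along multiple paths, this paper cites and analyzes a selection of poems, focusing on the use and effect of metaphor-type rhetorical devices in English poetry.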

17.
Tennessee Williams is considered one of the most important American playwrights since World War II. The Glass Menagerie, his first successful drama, describes the tragic situation of a family and suggests that Man is unable to change a miserable life, whatever means he tries. This essay analyzes the arrangement of the four main characters, Laura, Amanda, Jim and Tom, to reveal the theme. Laura is fragile, Amanda is brave, Jim is vital, and Tom is sensible. Each of them tries a different means of struggling against life, but all fail tragically. On this evidence, the paper concludes that Man is unable to change a miserable life and is doomed to fail.

18.
The aim of this study is to investigate the diversity of Retama raetam root-nodule bacteria isolated from arid regions of Tunisia. Twelve isolates, chosen as representative of different 16S rRNA gene patterns, were characterized by 16S rRNA gene sequencing and phenotypic analysis. Isolates were assigned to Sinorhizobium, Rhizobium and Agrobacterium. The symbiotic properties of the Sinorhizobium and Rhizobium isolates showed a large diversity in their capacity to infect their host plant and fix atmospheric nitrogen. Strain RK 22, identified as Rhizobium, was the most effective isolate.

19.
Recently, docking has been widely used to predict the binding modes of protein-inhibitor complexes when the crystal structure of the complex is unavailable. Most docking algorithms can generate a large number of probable conformations; it is, however, difficult to evaluate these docking poses effectively and identify the most reasonable binding mode. In the present study, on the basis of the crystallographic data of human 3-hydroxy-3-methylglutaryl coenzyme

20.
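This paper introduces two wireless broadband access technologies, WiMAX and Wi-Fi, compares and analyzes the relationship between them and their influence on each other, describes the key technologies of WiMAX in detail, and finally briefly discusses ways of combining the two in a joint network.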
