Found 18 similar documents; search took 46 ms
1.
FENG Zhonghui SHEN Junyi BAO Junpeng, Wuhan University Journal of Natural Sciences, 2006, 11(5): 1340-1344
0 Introduction. Text clustering is the process of grouping documents into classes or clusters so that documents within a cluster are highly similar to one another but very dissimilar to documents in other clusters. In applications, the document is always represented by the vector space model (VSM), in which each document is represented as a vector and each unique term is one dimension of that vector. Documents are then clustered by calculating distance or similarity [1], …
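As a rough illustration of the vector space model described in entry 1, the sketch below builds term-frequency vectors and compares them with cosine similarity. The whitespace tokenizer, the raw term-frequency weighting, and the function names `to_vector` and `cosine_similarity` are assumptions made for this example; the excerpt does not say which weighting or similarity measure the paper uses.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a document to a sparse term-frequency vector (one dimension per unique term)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    d1 = to_vector("text clustering groups similar documents")
    d2 = to_vector("document clustering groups similar text")
    d3 = to_vector("cache replacement policy")
    print(cosine_similarity(d1, d2))  # shares most terms with d1
    print(cosine_similarity(d1, d3))  # shares no terms with d1
```

A clustering algorithm would then group documents whose pairwise similarity is high, or equivalently whose distance is small.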
2.
NI Wei-wei SUN Zhi-hui, Wuhan University Journal of Natural Sciences, 2004, 9(5): 590-594
Clustering in high-dimensional space is an important domain in data mining. It is the process of discovering groups in a high-dimensional dataset such that the similarity between elements of the same cluster is maximal and the similarity between different clusters is minimal. Many clustering algorithms are not applicable to high-dimensional space because of its sparseness and decline properties. Dimensionality reduction is an effective way to address this problem. The paper proposes a novel clustering algorithm, CFSBC, based on closed frequent itemsets derived from association rule mining, which can obtain the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes. Third, it is suitable for very large data sets. Experiments show that the proposed algorithm is effective and efficient.
3.
XU Junling XU Baowen ZHANG Weifeng CUI Zifeng ZHANG Wei, Wuhan University Journal of Natural Sciences, 2007, 12(5): 912-916
Feature selection methods have been successfully applied to text categorization but seldom to text clustering, because class label information is unavailable. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering in order to select features for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
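For readers unfamiliar with the Davies-Bouldin index mentioned in entry 3, the following is a minimal sketch of its standard definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid separation, then average over clusters. Lower values indicate tighter, better-separated clusters. The function names and the use of Euclidean distance are assumptions for this example; the abstract does not describe how the authors compute the index.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of at least two non-empty clusters.

    Assumes the cluster centroids are pairwise distinct (otherwise the
    separation term would be zero).
    """
    centers = [centroid(c) for c in clusters]
    # Scatter S_i: mean Euclidean distance from each member to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


if __name__ == "__main__":
    tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
    loose = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]
    print(davies_bouldin(tight), davies_bouldin(loose))
```

In a feature selection loop, one would compute this index for the clustering obtained with each candidate feature subset and inspect how it changes as features are added or removed.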
4.
YU Yongqian ZHAO Xiangguo CHEN Hengyue WANG Bin YU Ge WANG Guoren, Wuhan University Journal of Natural Sciences, 2006, 11(5): 1069-1075
This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two bounded integer parameters. The evaluating mode assesses clustering results and gives each a mark; the higher the mark, the higher the quality of the clustering result. By organizing the two modes in different ways, two clustering algorithms can be built: SECDU (Self-Expanded Clustering Algorithm Based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two clustering-mode parameters, processes the data set repeatedly, and evaluates every clustering result with the evaluating mode. It then outputs the clustering result with the highest mark. By applying a hill-climbing algorithm, SECDUF greatly improves clustering efficiency. Both algorithms adapt well to data sets with different distribution features, and both can output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention at any point in the process. In addition, SECDUF has high clustering performance.
5.
ZHU Jiang SHEN Qingguo TANG Tang LI Yongqiang, Wuhan University Journal of Natural Sciences, 2006, 11(5): 1141-1146
This paper describes the theory, implementation, and experimental evaluation of an Aggregation Cache Replacement (ACR) algorithm. By considering the application background, carefully choosing weight values, using a special formula to calculate similarity, and clustering ontologies by similarity to uncover more deeply embedded relations, ACR combines ontology similarity with the value of an object to decide which object is to be replaced. We demonstrate the usefulness of ACR through experiments. (a) The aggregation tree is found to be created quite differently depending on the application case. Clustering can therefore direct content adaptation more accurately according to user perception and can satisfy users with different preferences. (b) Compared with the widely used Least-Recently-Used (LRU) and First-In-First-Out (FIFO) methods, ACR outperforms both in accuracy and usability. (c) It has a better semantic explanation and makes adaptation more personalized and more precise.
6.
HE Kun ZHAO Yong, Wuhan University Journal of Natural Sciences, 2007, 12(2): 260-266
A new heuristic approach that resembles the evolution of interpersonal relationships in human society is put forward for the problem of scheduling multiple tasks represented by a directed acyclic graph. The algorithm includes dynamic-group, detach-graph, and front-sink components. The priority rules used are new: relationship number, potentiality, weight, and merge degree are defined for the priority of clusters, and task potentiality for the priority of tasks. Experiments show that the algorithm can obtain good results in a short time. The algorithm produces another optimal solution for the classic MJD benchmark. Its average performance is better than that of five recent representative algorithms, in particular on six of the nine benchmarks.
7.
8.
HU Guang-ming HUANG Zun-guo HU Hua-ping GONG Zheng-hu, Wuhan University Journal of Natural Sciences, 2005, 10(1): 39-42
To address the security problem of clustering algorithms, we propose a method to enhance the security of the well-known lowest-ID clustering algorithm. The method is based on the idea of secret sharing and (k, n) threshold cryptography. Each node, whether clusterhead or ordinary member, holds a share of the global certificate, and any k nodes can communicate securely. No clusterhead needs to execute any extra function beyond routing. Our scheme needs some prior configuration before deployment and can be used in critical, small-scale environments. The security enhancement for the lowest-ID algorithm can also be applied to other clustering approaches with minor modification. The feasibility of the method was verified by simulation results.
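The (k, n) threshold idea in entry 8 can be illustrated with Shamir's secret sharing, a standard scheme in which any k of n shares reconstruct a secret and fewer than k reveal nothing about it. This is only a sketch of the general technique: the abstract does not state which threshold scheme the authors use, and the field modulus `PRIME` and the function names `split_secret` and `recover_secret` are assumptions chosen for this example.

```python
import random

# Assumption: a Mersenne prime (2**127 - 1) is used as the field modulus.
PRIME = 2**127 - 1


def split_secret(secret, k, n, rng=random.SystemRandom()):
    """Split `secret` into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def evaluate(x):
        # Horner's rule, reduced modulo PRIME at every step.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover_secret(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, k=3, n=5)
    print(recover_secret(shares[:3]) == 123456789)
```

In the setting of the abstract, each node would hold one share, so that any k cooperating nodes could jointly act on the shared secret.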
9.
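An incremental soft text clustering algorithm (cited 1 time in total: 0 self-citations, 1 citation by others)
To address the high time complexity of traditional text clustering algorithms, and the fact that distance-independent algorithms are unsuitable for dynamic, changing text collections, an incremental soft text clustering algorithm based on semantic sequences is proposed. The algorithm takes into account the multi-topic nature of long texts and uses the similarity relation between semantic sequences to compute the coverage of sets of similar semantic sequences. At each step, the candidate cluster with the minimum entropy overlap value is selected as a result cluster, which greatly reduces the dimensionality of the text vector space throughout the clustering process and shortens computation time. Because the semantic sequences used by the algorithm depend only on the text itself, the algorithm is suitable for incremental clustering. Experimental results show that its clustering accuracy is higher than that of other clustering algorithms under the same conditions, and that it is especially suitable for soft clustering of long-text collections.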
10.
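Building on a systematic review of the current state of applied research on World Wide Web information retrieval, data mining, search engines, and clustering, this article summarizes the existing problems and proposes corresponding solutions. In particular, it aims to use clustering methods to automatically organize the search schemes of search engines, making it easier for users to find the Web information they actually need.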
11.
12.
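Lizhihao Rao Juan, Science & Technology Information (科技信息), 2007, (35)
Text classification is the process of automatically determining the category of a text from its content under a given classification scheme. Quickly organizing massive amounts of information and effectively classifying different texts has become the bottleneck in obtaining valuable information. This paper classifies texts using fuzzy clustering analysis, which handles the real-time classification of information well and has achieved good results in practice.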
13.
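Under the operating mechanism of the discovering feature sub-space model (DFSSM), a structural model for mining unstructured data, a new Web text clustering algorithm is proposed: the DFSSM-based Web text clustering (WTCDFSSM) algorithm. The algorithm is self-stabilizing and requires no externally supplied evaluation function; it can identify the most meaningful features in the concept space and is highly robust to noise. The WTCDFSSM algorithm was implemented against the application background of a modern distance-education network. The results show that the algorithm can automatically cluster and mine text materials collected from various distance-education sites, that its grid structure model helps people navigate text information, and that it extracts important knowledge quickly and effectively from massive text sources.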
14.
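To reduce the number of database scans in association rule mining, an association rule mining algorithm based on quasi-frequent (supposed frequent) itemsets, SupposedFrequent, is proposed, together with a function, BGen, for generating candidate frequent itemsets. Experiments show that, given the best quasi-frequent itemset, all frequent itemsets can be generated with only two scans of the database.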
15.
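Rare earth metals are an important strategic resource for a country. Although China is rich in rare earth resources, its lack of core patented technologies has constrained the deeper development of those resources. To study the evolution of core rare earth patent technologies and address the problem of China's rare earth patent portfolio, this paper uses the Lingo text clustering algorithm to analyze domestic and international patent information in the rare earth field in depth. It examines the migration of patent applicants and the shift of research topics in the rare earth extraction field, and presents the findings through visual patent maps. The results offer a reference for tracking research hotspots in rare earth extraction patents in China and are significant for Chinese enterprises' use of patent information, technology research and development, and intellectual property planning.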
16.
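Web text clustering is a typical unsupervised machine learning technique. Its goal is to divide the Web texts collected from a site into several clusters so that similarity is maximal among texts within the same cluster and minimal among texts in different clusters. To reduce the dimensionality of raw, coarse Web text data, a quantified importance value of each single attribute relative to the attribute set is computed on the basis of knowledge attribute values; the feature vectors are reduced in dimension according to these importance values, and the K-means algorithm is then used to cluster the reduced data. Experiments show that the method shortens clustering time.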
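The two-stage pipeline in the abstract above (reduce dimensions by attribute importance, then run K-means) can be sketched as follows. The importance scores are taken as a given input here, because the abstract does not spell out how they are computed; `reduce_dimensions`, `kmeans`, and their parameters are hypothetical names for this illustration, and the K-means shown is the plain Lloyd iteration rather than the authors' implementation.

```python
import math
import random


def reduce_dimensions(vectors, importance, keep):
    """Keep only the `keep` features with the highest importance scores."""
    top = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)[:keep]
    top.sort()  # preserve the original feature order
    return [tuple(v[i] for i in top) for v in vectors]


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct input positions
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    vectors = [(0, 5, 0), (0, 9, 1), (10, 5, 10), (10, 9, 11)]
    importance = [0.9, 0.1, 0.8]  # hypothetical scores; feature 1 is dropped
    reduced = reduce_dimensions(vectors, importance, keep=2)
    labels, centers = kmeans(reduced, k=2)
    print(reduced, labels)
```

Dropping low-importance features shrinks every distance computation in the assignment step, which is where the time saving claimed in the abstract would come from.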
18.
CHEN Jian-bin DONG Xiang-jun SONG Han-tao, Wuhan University Journal of Natural Sciences, 2004, 9(5): 671-675
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering, and the performance of the clustering algorithm is improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is feasible.