期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The refinement algorithm consideration in text clustering scheme based on multilevel graph

CHENJian-bin DONGXiang-jun SONGHan-tao 《武汉大学学报:自然科学英文版》2004,9(5):671-675

To construct a high efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase is discussed, The model is applied to text clustering. The performance of clustering algorithm has to be improved with the refinement algorithm application, The experiment result demonstrated that the muhilevel graph text clustering algorithm is available, 相似文献

2.

Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section

YU Yongqian ZHAO Xiangguo CHEN Hengyue WANG Bin YU Ge WANG Guoren 《武汉大学学报:自然科学英文版》2006,11(5):1069-1075

This paper presents an effective clustering mode and a novel clustering result evaluating mode. Clustering mode has two limited integral parameters. Evaluating mode evaluates clustering results and gives each a mark. The higher mark the clustering result gains, the higher quality it has. By organizing two modes in different ways, we can build two clustering algorithms： SECDU（Self-Expanded Clustering Algorithm based on Density Units） and SECDUF（Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section）. SECDU enumerates all value pairs of two parameters of clustering mode to process data set repeatedly and evaluates every clustering result by evaluating mode. Then SECDU output the clustering result that has the highest evaluating mark among all the ones. By applying ＂hill-climbing algorithm＂, SECDUF improves clustering efficiency greatly. Data sets that have different distribution features can be well adapted to both algorithms. SECDU and SECDUF can output high-quality clustering results. SECDUF tunes parameters of clustering mode automatically and no man＇s action involves through the whole process. In addition, SECDUF has a high clustering performance. 相似文献

3.

CFSBC： Clustering in High-Dimensional Space Based on Closed Frequent Item Set

NIWei-wei SUNZhi-hui 《武汉大学学报:自然科学英文版》2004,9(5):590-594

Clustering in high-dimensional space is an important domain in data mining. It is the process of discovering groups in a high-dimensional dataset, in such way, that the similarity between the elements of the same cluster is maximum and between different clusters is minimal. Many clustering algorithms are not applicable to high dimensional space for its sparseness and decline properties. Dimensionality reduction is an effective method to solve this problem. The paper proposes a novel clustering algorithm CFSBC based onclosed frequent hemsets derived from association rule mining. which can get the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes, Third, it is suitable for very large data sets. Experiment shows that the proposed algorithm is effective and efficient 相似文献

4.

Combined Density-based and Constraint-based Algorithm for Clustering

陈同孝陈荣昌林志强邱永兴《东华大学学报(英文版)》2006,23(6):36-38,61

We propose a new clustering algorithm that assists the researchers to quickly and accurately analyze data. We call this algorithm Combined Density-based and Constraint-based Algorithm （CDC）. CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering algorithm to split the original data into a number of fragmented clusters. At the same time, CDC cuts off the noises and outliers. In the second phase, CDC employs the concept of K-means clustering algorithm to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters which satisfy some constraint rules.Due to the merged clusters around the center cluster, the clustering results show high accu racy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm （GCA）proposed in 2004. Experimental results show that CDC has better performance. 相似文献

5.

SLID： A Secure Lowest-ID Clustering Algorithm

HUGuang-ming HUANGZun-guo HUHua-ping GONGZheng-hu 《武汉大学学报:自然科学英文版》2005,10(1):39-42

In order to solve security problem of clustering algorithm, we proposed a method to enhance the security of the well-known lowest-ID clustering algorithm. This method is based on the idea of the secret sharing and the (k, n) threshold cryptography. Each node, whether clusterhead or ordinary member, holds ?a share of the global certificate, and any k nodes can communicate securely. There is no need for any clusterhead to execute extra functions more than routing. Our scheme needs ,some prior configuration before deployment, and can be used in critical environment with small scale. The security-enhancement for Lowest-ID algorithm can also be applied into other clustering approaches with minor modification. The feasibility of this method was verified by the simulation results. 相似文献

6.

A New Feature Selection Method for Text Clustering

XU Junling XU Baowen ZHANG Weifeng CUI Zifeng ZHANG Wei 《武汉大学学报:自然科学英文版》2007,12(5):912-916

Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It uses supervised feature selection method on the intermediate clustering result which is generated during iterative clustering to do feature selection for text clustering; meanwhile, the Davies-Bouldin＇s index is used to evaluate the intermediate feature subsets indirectly. Then feature subsets are selected according to the curve of the Davies-Bouldin＇s index. Experiment is carried out on several popular datasets and the results show the advantages of the proposed method. 相似文献

7.

Fuzzy Clustering with Novel Separable Criterion 总被引：1，自引：0，他引：1

尹中航唐元钢孙富春孙增圻《清华大学学报》2006,11(1):50-53

Introduction Fuzzy clustering plays an important role in pattern rec ognition, image processing, and data analysis. In fuzzy clustering, every point is assigned a membership to represent the degree of belonging to a certain class The fuzzy c-means (FCM) m… 相似文献

8.

A Chinese Web page clustering algorithm based on the suffix tree

YANGJian-wu 《武汉大学学报:自然科学英文版》2004,9(5):817-822

In this paper, an improved algorithm, named STC-I. is proposed for Chinese Web page clustering based on Chinese language characteristics, which adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps advantages of STC, and is better than STC in precision and speed when they are used to cluster Chinese Web page. 相似文献

9.

The Research of an Incremental Conceptive Clustering Algorithm and Its Application in Detecting Money Laundering

CHEN Yunkai LU Zhengding LI Ruixuan LI Yuhua SUN Xiaolin 《武汉大学学报:自然科学英文版》2006,11(5):1076-1080

Considering the constantly increasing of data in large databases such as wire transfer database, incremental clustering algorithms play a more and more important role in Data Mining （DM）. However, Few of the traditional clustering algorithms can not only handle the categorical data, but also explain its output clearly. Based on the idea of dynamic clustering, an incremental conceptive clustering algorithm is proposed in this paper. Which introduces the Semantic Core Tree （SCT） to deal with large volume of categorical wire transfer data for the detecting money laundering. In addition, the rule generation algorithm is presented here to express the clustering result by the format of knowledge. When we apply this idea in financial data mining, the efficiency of searching the characters of money laundering data will be improved. 相似文献

10.

Clustering of Web learners based on rough set

LIUShuai-dong CHENShi-hong 《武汉大学学报:自然科学英文版》2004,9(5):542-546

The demand for individualized teaching from Elearning websites is rapidly increasing due to the huge differences existed among Web learners. A method for clusteringWeb learners based on rough set is proposed. The basic ideaof the method is to reduce the learning auributes prior to clustering, and therefore the clustering of Web learners iscarried out in a relative low-dimensional space. Using thismethod, the E-learning websites can arrange correspondingleaching content for different clusters of learners so that thelearners‘ individual requirements can be more satisfied. 相似文献

11.

Web Fuzzy Clustering and a Case Study

LIUMao-fu HEJing HEYan-xiang HUHui-jun 《武汉大学学报:自然科学英文版》2004,9(4):411-414

We combine the web usage mining and fuzzy clustering and give the concept of web fuzzy clustering, and then put forward the web fuzzy clustering processing model which is discussed in detail. Web fuzzy clustering can be used in the web users clustering and web pages clustering. In the end, a case study is given and the result has proved the feasibility of using web fuzzy clustering in web pages clustering. 相似文献

12.

Research of Web documents clustering based on dynamic concept

WANGYun-hua CHENShi-hong 《武汉大学学报:自然科学英文版》2004,9(5):547-552

Conceptual clustering is mainly used for solving the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and aiming at theinstitutional framework and characteristic of Web theme informauon, this paper proposes and implements dynamic conceptual clustering algorithm and merging algorithm for Web documents, and also analyses the super performance of the clustering algorithm in efficiency and clustering accuracy. 相似文献

13.

The Effective Clustering Partition Algorithm Based on the Genetic Evolution

廖芹李希雯《东华大学学报(英文版)》2006,23(6):43-46

To the problem that it is hard to determine the clustering number and the abnormal points by using the clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. The solution to the problem is formed by the combination of the clustering partition and the encoding samples, and the fitness function is defined by the distances among and within clusters. The clustering number and the samples in each cluster are determined and the abnormal points are distinguished by implementing the triple random crossover operator and the mutation. Based on the known sample data, the results of the novel method and the clustering validity function are compared. Numerical experiments are given and the results show that the novel method is more effective. 相似文献

14.

FICW： Frequent Itemset Based Text Clustering with Window Constraint

ZHOU Chong LU Yansheng ZOU Lei HU Rong 《武汉大学学报:自然科学英文版》2006,11(5):1345-1351

Most of the existing text clustering algorithms overlook the fact that one document is a word sequence with semantic information. There is some important semantic information existed in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window （FICW） was proposed, which makes use of the semantic information for text clustering with a window constraint. The experimental results obtained from tests on three （hypertext） text sets show that FICW outperforms the method compared in both clustering accuracy and efficiency. 相似文献

15.

Fuzzy clustering method for Web user based on pages classification

ZHANLi-qiang LIUDa-xin 《武汉大学学报:自然科学英文版》2004,9(5):553-556

A new method for Web users fuzzy clustering based on analysis of user interest characteristic is proposed in this article. The method first defines page fuzzy categories according to the links on the index page of the site. then computes fuzzy degree of cross page through aggregating on data of Web tog. After that, by using fuzzy comprehensive evaluation method, the method constructs user interest vectors according to page viewing times and frequency of hits. and derives the fuzzy similarity matrix from the interest vectors for the Web users. Finally. it gets the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method. 相似文献

16.

Artificial immune kernel clustering network for unsupervised image segmentation

Wenlong Huang Licheng Jiao 《自然科学进展(英文版)》2008,18(4):455-462

相似文献

17.

A distributed clustering algorithm for wireless sensor networks

Fengjun Shang 《武汉大学学报:自然科学英文版》2008,13(4):385-390

In the paper, we consider a network of energy constrained sensors deployed over a region. Each sensor node in such a network is systematically gathering and transmitting sensed data to a base station （via clusterhead） for further processing. The key problem focuses on how to reduce the power consumption of wireless microsensor networks. The core includes the energy efficiency of clusterheads and that of cluster members. We first extend low-energy adaptive clustering hierarchy （LEACH）＇s stochastic clusterhead selection algorithm by a factor with distance-based deterministic component （LEACH-D） to reduce energy consumption for energy efficiency of clusterhead. And the cost function is proposed so that it balances the energy consumption of nodes for energy efficiency of cluster member. Simulation results show that our modified scheme can extend the network life around up to 40% before first node dies. Through both theoretical analysis and numerical results, it is shown that the proposed algorithm achieves better performance than the existing representative methods. 相似文献

18.

Concept association and hierarchical Hamming clustering model in text classification

SuGui-yang LiJian-hua MaYing-hua LiSheng-hong YinZhong-hang 《武汉大学学报:自然科学英文版》2004,9(3):339-342

We propose two models in this paper. The concept of association model is put forward to obtain the co-occurrence relationships among keywords in the documents and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space which can solve the problem of the extremely high dimensionality of the documents~ feature space. The results of experiment indicate that it can obtain the co-occurrence relations among keywords in the documents which promote the recall of classification system effectively. The hierarchical Hamming clustering model can reduce the dimensionality of the category feature vector efficiently, the size of the vector space is only about 10% of the primary dimensionality. 相似文献

19.

A Throughput-Driven Scheduling Algorithm of Differentiated Service for Web Cluster

YAN Cai-rong SHEN Jun-yi PENG Qin-ke WAN Yong-quan 《武汉大学学报:自然科学英文版》2006,11(1):88-92

Requests distribution is an key technology for Web cluster server. This paper presents a throughput-driven scheduling algorithm （TDSA）. The algorithm adopts the throughput of cluster back-ends to evaluate their load and employs the neural network model to predict the future load so that the scheduling system features a self-learning capability and good adaptability to the change of load. Moreover, it separates static requests from dynamic requests to make full use of the CPU resources and takes the locality of requests into account to improve the cache hit ratio. Experimental re suits from the testing tool of WebBench^TM show better per formance for Web cluster server with TDSA than that with traditional scheduling algorithms. 相似文献

20.

A New Clustering Algorithm for Categorical Attributes

下载免费PDF全文

Songfeng Lu Zhengding Lu 《矿物冶金与材料学报》2000,7(4):318-322

In traditional data clustering, similarity of a cluster of objects is measured by distance between objects. Such measures are not appropriate for categorical data. A new clustering criterion to determine the similarity between points with categorical attributes is presented. Furthermore, a new clustering algorithm for categorical attributes is addressed. A single scan of the dataset yields a good clustering, and more additional passes can be used to improve the quality further. 相似文献