首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 27 毫秒
1.
Maximum sum-of-splits clustering   总被引:1,自引:1,他引:0  
ConsiderN entities to be classified, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside it. The single-linkage algorithm provides partitions intoM clusters for which the smallest split is maximum. We study here the average split of the clusters or, equivalently, the sum of splits. A (N 2) algorithm is provided to determine maximum sum-of-splits partitions intoM clusters for allM betweenN – 1 and 2, using the dual graph of the single-linkage dendrogram.
Résumé SoientN objets à classifier et une matrice de dissimilarit és entre paires de ces objets. L'écart d'une classe est la plus petite dissimilarité entre un objet de cette classe et un objet en dehors d'elle. L'algorithme du lien simple fournit des partitions enM classes dont le plus petit écart est maximum. On étudie l'écart moyen des classes, ou, ce qui est équivalent, la somme des écarts. On propose un algorithme en (N 2) pour déterminer des partitions enM classes dont la somme des écarts est maximum pourM allant deN – 1 à 2, basé sur le graphe dual du dendrogramme de la méthode du lien simple.
  相似文献   

2.
Consider N entities to be classified (e.g., geographical areas), a matrix of dissimilarities between pairs of entities, a graph H with vertices associated with these entities such that the edges join the vertices corresponding to contiguous entities. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside of it. The single-linkage algorithm (ignoring contiguity between entities) provides partitions into M clusters for which the smallest split of the clusters, called split of the partition, is maximum. We study here the partitioning of the set of entities into M connected clusters for all M between N - 1 and 2 (i.e., clusters such that the subgraphs of H induced by their corresponding sets of entities are connected) with maximum split subject to that condition. We first provide an exact algorithm with a (N2) complexity for the particular case in which H is a tree. This algorithm suggests in turn a first heuristic algorithm for the general problem. Several variants of this heuristic are Also explored. We then present an exact algorithm for the general case based on iterative determination of cocycles of subtrees and on the solution of auxiliary set covering problems. As solution of the latter problems is time-consuming for large instances, we provide another heuristic in which the auxiliary set covering problems are solved approximately. Computational results obtained with the exact and heuristic algorithms are presented on test problems from the literature.  相似文献   

3.
Dendrograms are widely used to represent graphically the clusters and partitions obtained with hierarchical clustering schemes. Espaliers are generalized dendrograms in which the length of horizontal lines is used in addition to their level in order to display the values of two characteristics of each cluster (e.g., the split and the diameter) instead of only one. An algorithm is first presented to transform a dendrogram into an espalier without rotation of any part of the former. This is done by stretching some of the horizontal lines to obtain a diagram with vertical and horizontal lines only, the cutting off by diagonal lines the parts of the horizontal lines exceeding their prescribed length. The problem of finding if, allowing rotations, no diagonal lines are needed is solved by anO(N 2) algorithm whereN is the number of entities to be classified. This algorithm is the generalized to obtain espaliers with minimum width and, possibly, some diagonal lines.Work of the first and second authors has been supported by FCAR (Fonds pour la Formation de Chercheurs et l'Aide à la Recherche) grant 92EQ1048, and grant N00014-92-J-1194 from the Office of Naval Research. Work of the first author has also been supported by NSERC (Natural Sciences and Engineering Research Council of Canada) grant to École des Hautes Études Commerciales, Montréal and by NSERC grant GP0105574. Work of the second author has been supported by NSERC grant GP0036426, by FCAR grant 90NC0305, and by an NSF Professorship for Women in Science at Princeton University from September 1990 until December 1991. Work of the third author was done in part during a visit to GERAD, Montréal.  相似文献   

4.
Minimum sum of diameters clustering   总被引:1,自引:1,他引:0  
The problem of determining a partition of a given set ofN entities intoM clusters such that the sum of the diameters of these clusters is minimum has been studied by Brucker (1978). He proved that it is NP-complete forM3 and mentioned that its complexity was unknown forM=2. We provide anO(N 3 logN) algorithm for this latter case. Moreover, we show that determining a partition into two clusters which minimizes any given function of the diameters can be done inO(N 5) time.Acknowledgments: This research was supported by the Air Force Office of Scientific Research Grant AFOSR 0271 to Rutgers University. We are grateful to Yves Crama for several insightful remarks and to an anonymous referee for detailed comments.  相似文献   

5.
Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with largest diameter and partitioning it into two clusters whose largest diameter is smallest possible. We provide two such algorithms with complexitiesO( N 2) andO(N 2logN) respectively, where denotes the maximum number of clusters in a partition andN the number of entities to be clustered. The former algorithm, an efficient implementation of an algorithm of Hubert, allows to find all partitions into at most clusters and is inO(N 2) for fixed . Moreover, if in each partitioning the size of the largest cluster is bounded byp times the number of entities in the set to be partitioned, with 1/2<=p<1, it provides a complete hierarchy of partitionsO(N 2 logN) time. The latter algorithm, a refinement of an algorithm of Rao allows to build a complete hierarchy of partitions inO(N 2 logN) time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm of Benzécri are reported.
Résumé Les algorithmes de classification hiérarchique descendante utilisant le critère du diamètre, sélectionnent récursivement la classe de plus grand diamètre et la partitionnent en deux classes, dont le plus grand diamètre est le plus, petit possible. Nous proposons deux tels algorithmes, avec des complexités enO ( N2) etO(N 2 logN) respectivement, où désigne le nombre maximum de classes d'une partition etN le nombre d'objets à classifier. Le premier algorithme, une implantation d'un algorithme de Hubert, permet de construire des partitions avec au plus classes et est enO(N 2) pour fixé. De plus, si dans chaque bipartition le nombre d'objets de la plus grande classe, est borné parp fois le nombre d'objets de l'ensemble à partitionner, où 1/2≤p<1, cet algorithme permet de construire une hiérarchie complète de partitions en tempsO(N 2 logN). Le second algorithme, un raffinement d'un algorithme de Rao, permet de construire une hiérarchie complète de partitions en tempsO(N 2 logN) sans aucune restriction On présente également des résultats de calcul comparatifs pour les deux algorithmes et pour l'algorithme de classification hiérarchique ascendante de Benzécri.
  相似文献   

6.
A random sample of sizeN is divided intok clusters that minimize the within clusters sum of squares locally. Some large sample properties of this k-means clustering method (ask approaches withN) are obtained. In one dimension, it is established that the sample k-means clusters are such that the within-cluster sums of squares are asymptotically equal, and that the sizes of the cluster intervals are inversely proportional to the one-third power of the underlying density at the midpoints of the intervals. The difficulty involved in generalizing the results to the multivariate case is mentioned.This research was supported in part by the National Science Foundation under Grant MCS75-08374. The author would like to thank John Hartigan and David Pollard for helpful discussions and comments.  相似文献   

7.
In this paper, we propose a bicriterion objective function for clustering a given set ofN entities, which minimizes [d–(1–)s], where 01, andd ands are the diameter and the split of the clustering, respectively. When =1, the problem reduces to minimum diameter clustering, and when =0, maximum split clustering. We show that this objective provides an effective way to compromise between the two often conflicting criteria. While the problem is NP-hard in general, a polynomial algorithm with the worst-case time complexityO(N 2) is devised to solve the bipartition version. This algorithm actually gives all the Pareto optimal bipartitions with respect to diameter and split, and it can be extended to yield an efficient divisive hierarchical scheme. An extension of the approach to the objective [(d 1+d 2)–2(1–)s] is also proposed, whered 1 andd 2 are diameters of the two clusters of a bipartition.This research was supported in part by the National Science and Engineering Research Council of Canada (Grant OGP 0104900). The authors wish to thank two anonymous referees, whose detailed comments on earlier drafts improved the paper.  相似文献   

8.
9.
A Binary Integer Program to Maximize the Agreement Between Partitions   总被引:1,自引:1,他引:0  
This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known; however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions for problems of practical size. We would like to thank the Editor, Willem Heiser, and an anonymous reviewer for helpful comments that resulted in improvements of this article.  相似文献   

10.
An error variance approach to two-mode hierarchical clustering   总被引:2,自引:2,他引:0  
A new agglomerative method is proposed for the simultaneous hierarchical clustering of row and column elements of a two-mode data matrix. The procedure yields a nested sequence of partitions of the union of two sets of entities (modes). A two-mode cluster is defined as the union of subsets of the respective modes. At each step of the agglomerative process, the algorithm merges those clusters whose fusion results in the smallest possible increase in an internal heterogeneity measure. This measure takes into account both the variance within the respective cluster and its centroid effect defined as the squared deviation of its mean from the maximum entry in the input matrix. The procedure optionally yields an overlapping cluster solution by assigning further row and/or column elements to clusters existing at a preselected hierarchical level. Applications to real data sets drawn from consumer research concerning brand-switching behavior and from personality research concerning the interaction of behaviors and situations demonstrate the efficacy of the method at revealing the underlying two-mode similarity structure.  相似文献   

11.
The Metric Cutpoint Partition Problem   总被引:1,自引:1,他引:0  
Let G = (V, E,w) be a graph with vertex and edge sets V and E, respectively, and w: E → a function which assigns a positive weight or length to each edge of G. G is called a realization of a finite metric space (M, d), with M = {1, ..., n} if and only if {1, ..., n} ⊆ V and d(i, j) is equal to the length of the shortest chain linking i and j in Gi, j = 1, ..., n. A realization G of (M, d), is called optimal if the sum of its weights is minimal among all the realizations of (M, d). A cutpoint in a graph G is a vertex whose removal strictly increases the number of connected components of G. The Metric Cutpoint Partition Problem is to determine if a finite metric space (M, d) has an optimal realization containing a cutpoint. We prove in this paper that this problem is polynomially solvable. We also describe an algorithm that constructs an optimal realization of (M, d) from optimal realizations of subspaces that do not contain any cutpoint. Supported by grant PA002-104974/2 from the Swiss National Science Foundation. Published online xx, xx, xxxx.  相似文献   

12.
T clusters, based on J distinct, contributory partitions (or, equivalently, J polytomous attributes). We describe a new model/algorithm for implementing this objective. The method's objective function incorporates a modified Rand measure, both in initial cluster selection and in subsequent refinement of the starting partition. The method is applied to both synthetic and real data. The performance of the proposed model is compared to latent class analysis of the same data set.  相似文献   

13.
k consisting of k clusters, with k > 2. Bottom-up agglomerative approaches are also commonly used to construct partitions, and we discuss these in terms of worst-case performance for metric data sets. Our main contribution derives from a new restricted partition formulation that requires each cluster to be an interval of a given ordering of the objects being clustered. Dynamic programming can optimally split such an ordering into a partition Pk for a large class of objectives that includes min-diameter. We explore a variety of ordering heuristics and show that our algorithm, when combined with an appropriate ordering heuristic, outperforms traditional algorithms on both random and non-random data sets.  相似文献   

14.
The median procedure for n-trees   总被引:2,自引:2,他引:0  
Let (X,d) be a metric space The functionM:X k 2 x defined by is the minimum } is called themedian procedure and has been found useful in various applications involving the notion of consensus Here we present axioms that characterizeM whenX is a certain class of trees (hierarchical classifications), andd is the symmetric difference metricWe would like to thank the referees and Editor for helpful comments  相似文献   

15.
16.
Single linkage clusters on a set of points are the maximal connected sets in a graph constructed by connecting all points closer than a given threshold distance. The complete set of single linkage clusters is obtained from all the graphs constructed using different threshold distances. The set of clusters forms a hierarchical tree, in which each non-singleton cluster divides into two or more subclusters; the runt size for each single linkage cluster is the number of points in its smallest subcluster. The maximum runt size over all single linkage clusters is our proposed test statistic for assessing multimodality. We give significance levels of the test for two null hypotheses, and consider its power against some bimodal alternatives. Research partially supported by NSF Grant No. DMS-8617919.  相似文献   

17.
Let \( \mathcal{G} \) = (G,w) be a weighted simple finite connected graph, that is, let G be a simple finite connected graph endowed with a function w from the set of the edges of G to the set of real numbers. For any subgraph G′ of G, we define w(G′) to be the sum of the weights of the edges of G′. For any i, j vertices of G, we define D {i,j}(\( \mathcal{G} \)) to be the minimum of the weights of the simple paths of G joining i and j. The D {i,j}(\( \mathcal{G} \)) are called 2-weights of \( \mathcal{G} \). Weighted graphs and their reconstruction from 2-weights have applications in several disciplines, such as biology and psychology.Let \( {\left\{{m}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) and \( {\left\{{M}_I\right\}}_{I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right)} \) be two families of positive real numbers parametrized by the 2-subsets of {1, …, n} with m I M I for any I; we study when there exist a positive-weighted graph G and an n-subset {1, …, n} of the set of its vertices such that D I (\( \mathcal{G} \)) ∈ [m I ,M I ] for any \( I\in \left(\frac{\left\{1,\dots, n\right\}}{2}\right) \). Then we study the analogous problem for trees, both in the case of positive weights and in the case of general weights.  相似文献   

18.
Aggregation of equivalence relations   总被引:1,自引:1,他引:0  
Each ofn attributes partitions a set of items into equivalence classes. Aconsistent aggregator of then partitions is defined as an aggregate partition that satisfies an independence condition and a unanimity condition. It is shown that the class of consistent aggregators is precisely the class ofconjunctive aggregators. That is, for each consistent aggregator there is a nonempty subsetN of the attributes such that two items are equivalent in the aggregate partition if and only if they are equivalent with respect to each attribute inN.  相似文献   

19.
K -means partitioning. We also describe some new features and improvements to the algorithm proposed by De Soete. Monte Carlo simulations have been conducted using different error conditions. In all cases (i.e., ultrametric or additive trees, or K-means partitioning), the simulation results indicate that the optimal weighting procedure should be used for analyzing data containing noisy variables that do not contribute relevant information to the classification structure. However, if the data involve error-perturbed variables that are relevant to the classification or outliers, it seems better to cluster or partition the entities by using variables with equal weights. A new computer program, OVW, which is available to researchers as freeware, implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a new algorithm for optimal variable weighting for K-means partitioning.  相似文献   

20.
In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach for solving this drawback has been to take any arbitrary criterion in order to break ties between distances, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists in grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance andWilliams’ formula which enables the implementation of the algorithm in a recursive way. The authors thank A. Arenas for discussion and helpful comments. This work was partially supported by DGES of the Spanish Government Project No. FIS2006–13321–C02–02 and by a grant of Universitat Rovira i Virgili.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号