
1.

The weighted sum of split and diameter clustering

Y. Wang, H. Yan, C. Sriskandarajah. Journal of Classification, 1996, 13(2): 231-248

In this paper, we propose a bicriterion objective function for clustering a given set of $N$ entities, which minimizes $\alpha d - (1-\alpha)s$, where $0 \le \alpha \le 1$, and $d$ and $s$ are the diameter and the split of the clustering, respectively. When $\alpha = 1$, the problem reduces to minimum diameter clustering, and when $\alpha = 0$, to maximum split clustering. We show that this objective provides an effective way to compromise between the two often conflicting criteria. While the problem is NP-hard in general, a polynomial algorithm with worst-case time complexity $O(N^2)$ is devised to solve the bipartition version. This algorithm actually gives all the Pareto optimal bipartitions with respect to diameter and split, and it can be extended to yield an efficient divisive hierarchical scheme. An extension of the approach to the objective $\alpha(d_1 + d_2) - 2(1-\alpha)s$ is also proposed, where $d_1$ and $d_2$ are the diameters of the two clusters of a bipartition. This research was supported in part by the National Science and Engineering Research Council of Canada (Grant OGP 0104900). The authors wish to thank two anonymous referees, whose detailed comments on earlier drafts improved the paper.
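As a rough illustration of the objective described in this abstract, the following Python sketch evaluates $\alpha d - (1-\alpha)s$ for a partition and finds the best bipartition by exhaustive enumeration. This is not the paper's $O(N^2)$ algorithm: the brute-force search is exponential and only suitable for very small inputs. The function names and the assumption that `D` is a symmetric dissimilarity matrix given as a list of lists are illustrative choices, not taken from the source.

```python
from itertools import combinations

def diameter(clusters, D):
    """Largest dissimilarity between two entities in the same cluster."""
    return max(
        (D[i][j] for c in clusters for i, j in combinations(c, 2)),
        default=0.0,
    )

def split(clusters, D):
    """Smallest dissimilarity between entities in different clusters."""
    return min(
        D[i][j]
        for a, b in combinations(clusters, 2)
        for i in a
        for j in b
    )

def objective(clusters, D, alpha):
    """Weighted sum alpha*d - (1 - alpha)*s, to be minimized."""
    return alpha * diameter(clusters, D) - (1 - alpha) * split(clusters, D)

def best_bipartition(D, alpha):
    """Exhaustive search over bipartitions (hypothetical helper, small N only)."""
    n = len(D)
    best = None
    # Entity 0 is fixed in the first cluster to avoid mirrored duplicates.
    for r in range(n - 1):
        for rest in combinations(range(1, n), r):
            first = (0,) + rest
            second = tuple(i for i in range(n) if i not in first)
            value = objective([first, second], D, alpha)
            if best is None or value < best[0]:
                best = (value, first, second)
    return best
```

For points on a line at positions 0, 1, 5 and 6 with absolute differences as dissimilarities, `best_bipartition(D, 0.5)` separates the two tight pairs, since that bipartition has both the smallest diameter and the largest split.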

2.

Efficient algorithms for divisive hierarchical clustering with the diameter criterion

A. Guénoche, P. Hansen, B. Jaumard. Journal of Classification, 1991, 8(1): 5-30

Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with the largest diameter and partitioning it into two clusters whose largest diameter is as small as possible. We provide two such algorithms with complexities $O(\bar{M} N^2)$ and $O(N^2 \log N)$ respectively, where $\bar{M}$ denotes the maximum number of clusters in a partition and $N$ the number of entities to be clustered. The former algorithm, an efficient implementation of an algorithm of Hubert, makes it possible to find all partitions into at most $\bar{M}$ clusters and runs in $O(N^2)$ time for fixed $\bar{M}$. Moreover, if in each partitioning the size of the largest cluster is bounded by $p$ times the number of entities in the set to be partitioned, with $1/2 \le p < 1$, it provides a complete hierarchy of partitions in $O(N^2 \log N)$ time. The latter algorithm, a refinement of an algorithm of Rao, makes it possible to build a complete hierarchy of partitions in $O(N^2 \log N)$ time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm of Benzécri are reported.

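The divisive scheme described in this abstract (pick the cluster with the largest diameter, then bipartition it so that the larger of the two resulting diameters is as small as possible) can be sketched as follows. This sketch uses exhaustive enumeration for each bipartition, so it does not reproduce the complexities of the two algorithms in the paper; the function names and the list-of-lists dissimilarity matrix `D` are assumptions made for illustration.

```python
from itertools import combinations

def diam(cluster, D):
    """Largest dissimilarity within a cluster (0 for a singleton)."""
    return max((D[i][j] for i, j in combinations(cluster, 2)), default=0)

def min_diameter_bipartition(cluster, D):
    """Exhaustively find the bipartition whose larger diameter is smallest."""
    cluster = sorted(cluster)
    anchor, others = cluster[0], cluster[1:]
    best = None
    # The anchor stays in the first part to avoid mirrored duplicates.
    for r in range(len(others)):
        for rest in combinations(others, r):
            a = (anchor,) + rest
            b = tuple(i for i in cluster if i not in a)
            score = max(diam(a, D), diam(b, D))
            if best is None or score < best[0]:
                best = (score, a, b)
    return best[1], best[2]

def divisive_hierarchy(D, max_clusters):
    """Return the successive partitions, from one cluster up to max_clusters."""
    partition = [tuple(range(len(D)))]
    history = [list(partition)]
    while len(partition) < max_clusters:
        target = max(partition, key=lambda c: diam(c, D))
        if len(target) < 2:
            break  # only singletons remain, nothing left to divide
        partition.remove(target)
        partition.extend(min_diameter_bipartition(target, D))
        history.append(sorted(partition))
    return history
```

Each entry of the returned list is one level of the hierarchy, so the last entry is the partition with the most clusters reached.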


3.

Maximum sum-of-splits clustering

P. Hansen, B. Jaumard, O. Frank. Journal of Classification, 1989, 6(1): 177-193

Consider $N$ entities to be classified, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside it. The single-linkage algorithm provides partitions into $M$ clusters for which the smallest split is maximum. We study here the average split of the clusters or, equivalently, the sum of splits. A $\Theta(N^2)$ algorithm is provided to determine maximum sum-of-splits partitions into $M$ clusters for all $M$ between $N - 1$ and 2, using the dual graph of the single-linkage dendrogram.

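The definitions in this abstract can be made concrete with a short Python sketch that computes the split of each cluster and the sum of splits of a partition. It only evaluates a given partition and does not implement the paper's algorithm based on the single-linkage dendrogram; the function names and the list-of-lists matrix `D` are illustrative assumptions.

```python
def cluster_split(cluster, D):
    """Smallest dissimilarity between an entity in the cluster and one outside it."""
    inside = set(cluster)
    outside = [j for j in range(len(D)) if j not in inside]
    return min(D[i][j] for i in inside for j in outside)

def sum_of_splits(partition, D):
    """Sum of the splits of all clusters in the partition."""
    return sum(cluster_split(c, D) for c in partition)

def smallest_split(partition, D):
    """Smallest split over the clusters (the quantity single linkage maximizes)."""
    return min(cluster_split(c, D) for c in partition)
```

Dividing `sum_of_splits` by the number of clusters gives the average split, which is why maximizing one is equivalent to maximizing the other for a fixed number of clusters.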


4.

Weight constrained maximum split clustering

P. Hansen, B. Jaumard, K. Musitu. Journal of Classification, 1990, 7(2): 217-240

Consider $N$ entities to be classified, with given weights, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity in that cluster and an entity outside it. The single-linkage algorithm provides partitions into $M$ clusters for which the smallest split is maximum. We consider the problems of finding maximum split partitions with exactly $M$ clusters and with at most $M$ clusters subject to the additional constraint that the sum of the weights of the entities in each cluster never exceeds a given bound. These two problems are shown to be NP-hard and reducible to a sequence of bin-packing problems. A $\Theta(N^2)$ algorithm for the particular case $M = N$ of the second problem is also presented. Computational experience is reported. Acknowledgments: Work of the first author was supported in part by AFOSR grants 0271 and 0066 to Rutgers University and was done in part during a visit to GERAD, Ecole Polytechnique de Montréal, whose support is gratefully acknowledged. Work of the second and third authors was supported by NSERC grant GP0036426 and by FCAR grant 89EQ4144. We are grateful to Silvano Martello and Paolo Toth for making available to us their program MTP for the bin-packing problem and to three anonymous referees for comments which helped to improve the presentation of the paper.

5.

Direct multicriteria clustering algorithms (total citations: 1; self-citations: 0; citations by others: 1)

A. Ferligoj, V. Batagelj. Journal of Classification, 1992, 9(1): 43-61

In a multicriteria clustering problem, optimization over more than one criterion is required. The problem can be treated in different ways: by reduction to a clustering problem with a single criterion obtained as a combination of the given criteria; by constrained clustering algorithms where a selected criterion is considered as the clustering criterion and all others determine the constraints; or by direct algorithms. In this paper two types of direct algorithms for solving multicriteria clustering problems are proposed: the modified relocation algorithm and the modified agglomerative algorithm. Different elaborations of these two types of algorithms are discussed and compared. Finally, two applications of the proposed algorithms are presented. This is an elaborated version of the talks presented at the First Conference of the International Federation of Classification Societies, Aachen, 1987, at the International Conference on Social Science Methodology, Dubrovnik, 1988, and at the Second Conference of the International Federation of Classification Societies, Charlottesville, 1989. This work was supported in part by the Research Council of Slovenia.

6.

Additive two-mode clustering: The error-variance approach revisited

Boris Mirkin, Phipps Arabie, Lawrence J. Hubert. Journal of Classification, 1995, 12(2): 243-263

The additive clustering approach is applied to the problem of two-mode clustering and compared with the recent error-variance approach of Eckes and Orlik (1993). Although the computational algorithms of the two approaches look very similar, additive clustering is shown to have several advantages. Specifically, two technical limitations of the error-variance approach (see Eckes and Orlik 1993, p. 71) are overcome in the framework of additive clustering. The research was supported by the Office of Naval Research under grant number N0014-93-1-0222 to Rutgers University. The authors are indebted both to Fionn Murtagh, who served as Acting Editor, and to anonymous Referees for thoughtful and constructive reviews.

7.

Espaliers: A generalization of dendrograms

Pierre Hansen, Brigitte Jaumard, Bruno Simeone. Journal of Classification, 1996, 13(1): 107-127

Dendrograms are widely used to represent graphically the clusters and partitions obtained with hierarchical clustering schemes. Espaliers are generalized dendrograms in which the length of horizontal lines is used in addition to their level in order to display the values of two characteristics of each cluster (e.g., the split and the diameter) instead of only one. An algorithm is first presented to transform a dendrogram into an espalier without rotation of any part of the former. This is done by stretching some of the horizontal lines to obtain a diagram with vertical and horizontal lines only, then cutting off by diagonal lines the parts of the horizontal lines exceeding their prescribed length. The problem of finding whether, allowing rotations, no diagonal lines are needed is solved by an $O(N^2)$ algorithm, where $N$ is the number of entities to be classified. This algorithm is then generalized to obtain espaliers with minimum width and, possibly, some diagonal lines. Work of the first and second authors has been supported by FCAR (Fonds pour la Formation de Chercheurs et l'Aide à la Recherche) grant 92EQ1048, and grant N00014-92-J-1194 from the Office of Naval Research. Work of the first author has also been supported by NSERC (Natural Sciences and Engineering Research Council of Canada) grant to École des Hautes Études Commerciales, Montréal and by NSERC grant GP0105574. Work of the second author has been supported by NSERC grant GP0036426, by FCAR grant 90NC0305, and by an NSF Professorship for Women in Science at Princeton University from September 1990 until December 1991. Work of the third author was done in part during a visit to GERAD, Montréal.

8.

Constrained clustering and Kohonen Self-Organizing Maps

Christophe Ambroise, Gérard Govaert. Journal of Classification, 1996, 13(2): 299-313

The Self-Organizing Feature Maps (SOFM; Kohonen 1984) algorithm is a well-known example of unsupervised learning in connectionism and is a clustering method closely related to the k-means. Generally the data set is available before running the algorithm and the clustering problem can be approached by an inertia criterion optimization. In this paper we consider the probabilistic approach to this problem. We propose a new algorithm based on the Expectation Maximization principle (EM; Dempster, Laird, and Rubin 1977). The new method can be viewed as a Kohonen type of EM and gives a better insight into the SOFM according to constrained clustering. We perform numerical experiments and compare our results with the standard Kohonen approach.

9.

Investigation of proportional link linkage clustering methods

William H. E. Day, Herbert Edelsbrunner. Journal of Classification, 1985, 2(1): 239-254

Proportional link linkage (PLL) clustering methods are a parametric family of monotone invariant agglomerative hierarchical clustering methods. This family includes the single, minimedian, and complete linkage clustering methods as special cases; its members are used in psychological and ecological applications. Since the literature on clustering space distortion is oriented to quantitative input data, we adapt its basic concepts to input data with only ordinal significance and analyze the space distortion properties of PLL methods. To enable PLL methods to be used when the number $n$ of objects being clustered is large, we describe an efficient PLL algorithm that operates in $O(n^2 \log n)$ time and $O(n^2)$ space. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.

10.

A preliminary study of optimal variable weighting in k-means clustering (total citations: 2; self-citations: 0; citations by others: 2)

Paul E. Green, Jonathan Kim, Frank J. Carmone. Journal of Classification, 1990, 7(2): 271-285

Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of these algorithms cross-validates extremely well. The present study applies a k-means, optimal weighting procedure to two empirical data sets and contrasts its cross-validation performance with that of unit (i.e., equal) weighting of the variables. We find that the optimal weighting procedure cross-validates better in one of the two data sets. In the second data set its comparative performance strongly depends on the approach used to find seed values for the initial k-means partitioning. The authors would like to acknowledge the support of the Citibank Fellowship from the Sol C. Snider Entrepreneurial Center at the Wharton School. The authors would like to express their appreciation to J. Douglas Carroll and Abba M. Kreiger for comments on an earlier version of the paper.