首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到14条相似文献,搜索用时 15 毫秒
1.
Large-sample results for optimization-based clustering methods   总被引:1,自引:0,他引:1  
Many common (nonhierarchical) clustering and classification methods are optimization-based methods, in the sense described by Windham (1987) in this Journal. This paper gives some large sample properties for estimates derived by such methods. Under appropriate conditions, such estimates converge with probability one to a limit, and are asymptotically normally distributed around that limiting value. The conditions are satisfied by most of the common examples of optimization-based methods. Prepared for the 2nd International Conference, International Federation of Classification Societies, Charlottesville, VA, 1989. Supported in part by summer research funds, Graduate School of Business Administration, University of Colorado at Denver.  相似文献   

2.
The more ways there are of understanding a clustering technique, the more effectively the results can be analyzed and used. I will give a general procedure, calledparameter modification, to obtain from a clustering criterion a variety of equivalent forms of the criterion. These alternative forms reveal aspects of the technique that are not necessarily apparent in the original formulation. This procedure is successful in improving the understanding of a significant number of clustering techniques.The insight obtained will be illustrated by applying parameter modification to partitioning, mixture and fuzzy clustering methods, resulting in a unified approach to the study of these methods and a general algorithm for optimizing them.The author wishes to thank Professor Doctor Hans-Hermann Bock for many stimulating discussions.  相似文献   

3.
Piecewise hierarchical clustering   总被引:1,自引:0,他引:1  
We consider two or more ultrametric distance matrices defined over different, possibly overlapping, subsets. These matrices are merged into one ultrametric matrix defined over the whole set. Necessary and sufficient conditions for uniqueness of the merging are established. when these conditions are not satisfied, consistent algorithms are given.The author thanks B. Monjardet and two anonymous referees for their helpful comments.  相似文献   

4.
Proportional link linkage (PLL) clustering methods are a parametric family of monotone invariant agglomerative hierarchical clustering methods. This family includes the single, minimedian, and complete linkage clustering methods as special cases; its members are used in psychological and ecological applications. Since the literature on clustering space distortion is oriented to quantitative input data, we adapt its basic concepts to input data with only ordinal significance and analyze the space distortion properties of PLL methods. To enable PLL methods to be used when the numbern of objects being clustered is large, we describe an efficient PLL algorithm that operates inO(n 2 logn) time andO(n 2) space.This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.  相似文献   

5.
Efficient algorithms for agglomerative hierarchical clustering methods   总被引:7,自引:4,他引:7  
Whenevern objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires 0(n 3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring 0(n 3) worst-case time, can reasonably be expected to exhibit 0(n 2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires 0(n 2 logn) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case 0(n 2) time and space.Whenevern objects are characterized byk-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points ink-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires 0(n 2) time, in the worst case, for fixedk and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.  相似文献   

6.
We review methods of qualitative factor analysis (QFA) developed by the author and his collaborators over the last decade and discuss the use of QFA methods for the additive clustering problem. The QFA method includes, first, finding a square Boolean matrix in a fixed set of Boolean matrices with simple structures to approximate a given similarity matrix, and, second, repeating this process again and again using residual similarity matrices. We present convergence properties for three versions of the method, provide cluster interpretations for results obtained from the algorithms, and give formulas for the evaluation of factor shares of the initial similarities variance.I am indebted to Professor P. Arabie and the referees for valuable comments and editing of the text.  相似文献   

7.
Ordered set theory provides efficient tools for the problems of comparison and consensus of classifications Here, an overview of results obtained by the ordinal approach is presented Latticial or semilatticial structures of the main sets of classification models are described Many results on partitions are adaptable to dendrograms; many results on n-trees hold in any median semilattice and thus have counterparts on ordered trees and Buneman (phylogenetic) trees For the comparison of classifications, the semimodularity of the ordinal structures involved yields computable least-move metrics based on weighted or unweighted elementary transformations In the unweighted case, these metrics have simple characteristic properties For the consensus of classifications, the constructive, axiomatic, and optimization approaches are considered Natural consensus rules (majoritary, oligarchic, ) have adequate ordinal formalizations A unified presentation of Arrow-like characterization results is given In the cases of n-trees, ordered trees and Buneman trees, the majority rule is a significant example where the three approaches convergeThe authors would like to thank the anonymous referees for helpful suggestions on the first draft of this paper, and W H E Day for his comments and his significant improvements of style  相似文献   

8.
9.
We investigate the consensus problem for classifications of three types: partitions, dendrograms, and n-trees For partitions or dendrograms, lattice polynomials define natural consensus functions We extend these lattice methods to n-trees, introducing a general class of consensus functions that includes the intersection consensus functions in current use These lattice consensus methods have a number of desirable mathematical properties We prove that they all satisfy the Pareto Axiom For each of the three classification types, we determine which lattice consensus functions satisfy the Betweenness AxiomAuthor partially supported by a research grant from the Faculty Research Committee, Bowling State University  相似文献   

10.
An approach to numerical classification is described, which treats the assignment of objects to types as a continuous variable, called an assignment measure. Describing a classification by an assignment measure allows one not only to determine the types of objects, but also to see relationships among the objects of the same type and among the types themselves.A classification procedure, the Assignment-Prototype algorithm, is described and evaluated. It is a numerical technique for obtaining assignment measures directly from one-mode, two-way proximity matrices.  相似文献   

11.
Two fundamental approaches to the comparison of classifications (e g, partitions on the same finite set of objects) can be distinguished One approach is based upon measures of metric dissimilarity while the other is based upon measures of similarity, or consensus These approaches are not necessarily simple complements of each other Instead, each captures different, limited views of comparison of two classifications The properties of these measures are clarified by their relationships to Day's complexity models and to association measures of numerical taxonomy The two approaches to comparison are equated with the use of separation and minimum value sensitive measures, suggesting the potential application of an intermediate sensitive measure to the problem of comparison of classifications Such a measure is a linear combination of separation sensitive and minimum value sensitive components The application of these intermediate measures is contrasted with the two extremes The intermediate measure for the comparison of classifications is applied to a problem of character weighting arising in the analysis of Australian stream basinsWe thank Bill Day, Mike Austin, Peter Minchin and two anonymous referees for many helpful comments We also thank P Arabie for useful discussion of consensus methods and character weighting  相似文献   

12.
We investigate the properties of several significance tests for distinguishing between the hypothesisH of a homogeneous population and an alternativeA involving clustering or heterogeneity, with emphasis on the case of multidimensional observationsx 1, ...,x n p . Four types of test statistics are considered: the (s-th) largest gap between observations, their mean distance (or similarity), the minimum within-cluster sum of squares resulting from a k-means algorithm, and the resulting maximum F statistic. The asymptotic distributions underH are given forn and the asymptotic power of the tests is derived for neighboring alternatives.  相似文献   

13.
There have been many comparative studies of classification methods in which real datasets are used as a gauge to assess the relative performance of the methods. Since these comparisons often yield inconclusive or limited results on how methods perform, it is often believed that a broader approach combining these studies would shed some light on this difficult question. This paper describes such an attempt: we have sampled the available literature and created a dataset of 5807 classification results. We show that one of the possible ways to analyze the resulting data is an overall assessment of the classification methods, and we present methods for that particular aim. The merits and demerits of such an approach are discussed, and conclusions are drawn which may assist future research: we argue that the current state of the literature hardly allows large-scale investigations. This work was sponsored by the MOD Corporate Research Programme, CISP, as part of a larger project on technology assessment. We would like to express our appreciation to Andrew Webb for his support throughout the entire project, and to Wojtek Krzanowski for valuable comments on a draft of this paper. We would also like to thank the anonymous referees for some very interesting comments, some of which we hope to pursue in future work.  相似文献   

14.
本文介绍了李时珍、林奈各自分类学传统及其突破,重点比较两者分类体系的异同,从中浅析异同的原因及其对后世的影响,并以此作为对李约瑟难题的一个小小的回应。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号