Found 6 similar documents.
The mixture method of clustering applied to three-way data   Total citations: 3 (self-citations: 3, citations by others: 0)
Clustering or classifying individuals into groups such that there is relative homogeneity within the groups and heterogeneity between the groups is a problem which has been considered for many years. Most available clustering techniques are applicable only to a two-way data set, where one of the modes is to be partitioned into groups on the basis of the other mode. Suppose, however, that the data set is three-way. Then what is needed is a multivariate technique which will cluster one of the modes on the basis of both of the other modes simultaneously. It is shown that by appropriate specification of the underlying model, the mixture maximum likelihood approach to clustering can be applied in the context of a three-way table. It is illustrated using a soybean data set which consists of multiattribute measurements on a number of genotypes each grown in several environments. Although the problem is set in the framework of clustering genotypes, the technique is applicable to other types of three-way data sets.
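As a rough illustration of clustering one mode of a three-way table with a mixture model, the sketch below flattens each genotype's environment-by-attribute measurements into one vector and fits a Gaussian mixture by EM. The abstract does not give the paper's model specification, so the diagonal covariances, the deterministic initialization, and the names `flatten` and `fit_mixture` are all assumptions made for this example; the toy numbers are invented and are not the soybean data.

```python
import math

def flatten(genotype):
    """Turn one genotype's environment-by-attribute table into a flat vector."""
    return [value for environment in genotype for value in environment]

def fit_mixture(X, k, n_iter=50, var_floor=1e-6):
    """EM for a k-component Gaussian mixture with diagonal covariances.

    X is a list of equal-length vectors. Returns one cluster label per row.
    Diagonal covariances are a simplifying assumption of this sketch.
    """
    n, d = len(X), len(X[0])
    # Deterministic start (an assumption): take initial means from rows
    # spread evenly between the first and the last row.
    means = [list(X[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row,
        # computed in log space for numerical stability.
        resp = []
        for x in X:
            logp = [
                math.log(weights[j]) - 0.5 * sum(
                    math.log(2 * math.pi * variances[j][t])
                    + (x[t] - means[j][t]) ** 2 / variances[j][t]
                    for t in range(d))
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard empty components
            weights[j] = nj / n
            for t in range(d):
                means[j][t] = sum(r[j] * x[t] for r, x in zip(resp, X)) / nj
                spread = sum(r[j] * (x[t] - means[j][t]) ** 2
                             for r, x in zip(resp, X)) / nj
                variances[j][t] = max(var_floor, spread)
    # Assign each row to its most probable component.
    return [max(range(k), key=lambda j: r[j]) for r in resp]

# Toy three-way data: 4 genotypes x 2 environments x 2 attributes (invented).
data = [
    [[1.0, 2.0], [1.2, 2.1]],
    [[0.9, 1.9], [1.1, 2.2]],
    [[5.0, 6.0], [5.3, 6.1]],
    [[5.1, 5.8], [5.2, 6.2]],
]
labels = fit_mixture([flatten(g) for g in data], k=2)
```

Flattening treats every environment-attribute pair as a separate feature, which is only one possible way to let both remaining modes drive the grouping of genotypes.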

The more ways there are of understanding a clustering technique, the more effectively the results can be analyzed and used. I will give a general procedure, called parameter modification, to obtain from a clustering criterion a variety of equivalent forms of the criterion. These alternative forms reveal aspects of the technique that are not necessarily apparent in the original formulation. This procedure is successful in improving the understanding of a significant number of clustering techniques. The insight obtained will be illustrated by applying parameter modification to partitioning, mixture and fuzzy clustering methods, resulting in a unified approach to the study of these methods and a general algorithm for optimizing them. The author wishes to thank Professor Doctor Hans-Hermann Bock for many stimulating discussions.

This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting proportions of viewing time across television networks for an area sample of households.

Parameters are derived of distributions of three coefficients of similarity between pairs (dyads) of operational taxonomic units for multivariate binary data (presence/absence of attributes) under statistical independence. These are applied to test independence for dyadic data. Association among attributes within operational taxonomic units is allowed. It is also permissible for the two units in the dyad to be drawn from different populations having different presence probabilities of attributes. The variance of the distribution of the similarity coefficients under statistical independence is shown to be relatively large in many empirical situations. This result implies that the practical interpretation of these coefficients requires much care. An application using the Jaccard index is given for the assessment of consensus between psychotherapists and their clients.
The distribution of similarity coefficients for binary data and associated attributes
Summary (translated from the French): The parameters of the distribution of three similarity coefficients between pairs of operational taxonomic units for multivariate binary data (presence/absence) were derived under the hypothesis of statistical independence. These parameters are used in a test of independence for dyadic data. Association among several attributes within the population of units is allowed. The two units of the dyad may also be drawn from two different populations with different probabilities of attribute presence. In many empirical situations, the variance of the similarity coefficients can be relatively large under statistical independence. Consequently, these coefficients must be interpreted with caution. An example is given for the Jaccard coefficient, which was used in a study of agreement between psychotherapists and their clients.
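For readers unfamiliar with the Jaccard index mentioned above, here is a minimal sketch that computes it for two presence/absence vectors and estimates its spread under independence by simulation. The abstract derives the distribution parameters analytically and allows association among attributes; the simulation below instead draws attributes independently, which is a simplification. The function names, the treatment of two all-absent units as similarity 0, and the example vectors are assumptions of this sketch.

```python
import random
from statistics import fmean, pvariance

def jaccard(a, b):
    """Jaccard index of two equal-length presence/absence sequences."""
    if len(a) != len(b):
        raise ValueError("both units must be scored on the same attributes")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention (an assumption): two all-absent units get similarity 0.
    return both / either if either else 0.0

def simulate_null(p_a, p_b, n_sims=2000, seed=0):
    """Mean and variance of the Jaccard index when the two units are independent.

    p_a and p_b hold per-attribute presence probabilities for each unit.
    Attributes are drawn independently here, which is a simplification.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_sims):
        a = [rng.random() < p for p in p_a]
        b = [rng.random() < p for p in p_b]
        values.append(jaccard(a, b))
    return fmean(values), pvariance(values)

# Invented example: six attributes scored for two units of a dyad.
unit_a = [1, 1, 0, 1, 0, 0]
unit_b = [1, 0, 0, 1, 1, 0]
observed = jaccard(unit_a, unit_b)  # 2 shared of 4 present in either: 0.5
null_mean, null_var = simulate_null([0.5] * 6, [0.5] * 6)
```

Comparing `observed` with the simulated mean and variance gives a feel for how much similarity arises by chance alone when only a few attributes are scored.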

We examine the problem of aggregating several partitions of a finite set into a single consensus partition. We note that the dual concepts of clustering and isolation are especially significant in this connection. The hypothesis that a consensus partition should respect unanimity with respect to either concept leads us to stress a consensus interval rather than a single partition. The extremes of this interval are characterized axiomatically. If a sufficient totality of traits has been measured, and if measurement errors are independent, then a true classifying partition can be expected to lie in the consensus interval. The structure of the partitions in the interval lends itself to partial solutions of the consensus problem. Conditional entropy may be used to quantify the uncertainty inherent in the interval as a whole.
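The conditional entropy mentioned above can be illustrated for two partitions of the same items, each given as a list of block labels. The abstract does not say exactly how the measure is applied to the consensus interval, so this sketch only shows the standard definition of H(fine | coarse) under a uniform distribution on the items; the function name and the example partitions are assumptions.

```python
from collections import Counter
from math import log2

def conditional_entropy(fine, coarse):
    """H(fine | coarse) in bits for two labelings of the same items.

    Items are weighted uniformly. The value is 0 when knowing an item's
    block in `coarse` fully determines its block in `fine`.
    """
    if len(fine) != len(coarse):
        raise ValueError("both partitions must label the same items")
    n = len(fine)
    joint = Counter(zip(fine, coarse))   # sizes of block intersections
    marginal = Counter(coarse)           # block sizes of the conditioning partition
    return -sum(count / n * log2(count / marginal[q])
                for (_, q), count in joint.items())

# Invented example: a two-block partition and the one-block partition of 4 items.
two_blocks = [0, 0, 1, 1]
one_block = [0, 0, 0, 0]
uncertainty = conditional_entropy(two_blocks, one_block)  # 1 bit remains
resolved = conditional_entropy(one_block, two_blocks)     # nothing remains
```

The asymmetry is the point: the coarser partition leaves one bit of uncertainty about the finer one, while the finer partition determines the coarser one completely.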

The standard procedure in numerical classification and identification of micro-organisms based on binary features is given a justification based on the principle of maximum entropy. This principle also strongly supports the assumption that all characteristics upon which the classification is based are equally important and the use of polythetic taxa. The relevance of the principle of maximum entropy in connection with taxonomic structures based on clustering and maximal predictivity is discussed. A result on asymptotic separateness of maximum entropy distributions has implications for minimizing identification errors. The work was partially supported by the Bank of Sweden Tercentenary Foundation, The Swedish Council for Forestry and Agricultural Research, The Carl Trygger Foundation, and the Swedish Cancer Foundation.
