首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
In this paper, dissimilarity relations are defined on triples rather than on dyads. We give a definition of a three-way distance analogous to that of the ordinary two-way distance. It is shown, as a straightforward generalization, that it is possible to define three-way ultrametric, three-way star, and three-way Euclidean distances. Special attention is paid to a model called the semi-perimeter model. We construct new methods analogous to the existing ones for ordinary distances, for example: principal coordinates analysis, the generalized Prim (1957) algorithm, hierarchical cluster analysis.  相似文献   

2.
A validation study of a variable weighting algorithm for cluster analysis   总被引:1,自引:0,他引:1  
De Soete (1986, 1988) proposed a variable weighting procedure when Euclidean distance is used as the dissimilarity measure with an ultrametric hierarchical clustering method. The algorithm produces weighted distances which approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for an applications problem for which it was not originally intended. That is, to determine whether or not the algorithm can be used to reduce the influence of variables which are irrelevant to the clustering present in the data. The simulation study examined the ability of the procedure to recover a variety of known underlying cluster structures. The results indicate that the algorithm is effective in identifying extraneous variables which do not contribute information about the true cluster structure. Weights near 0.0 were typically assigned to such extraneous variables. Furthermore, the variable weighting procedure was not adversely effected by the presence of other forms of error in the data. In general, it is recommended that the variable weighting procedure be used for applied analyses when Euclidean distance is employed with ultrametric hierarchical clustering methods.  相似文献   

3.
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.  相似文献   

4.
The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.  相似文献   

5.
Metric and Euclidean properties of dissimilarity coefficients   总被引:8,自引:8,他引:0  
We assemble here properties of certain dissimilarity coefficients and are specially concerned with their metric and Euclidean status. No attempt is made to be exhaustive as far as coefficients are concerned, but certain mathematical results that we have found useful are presented and should help establish similar properties for other coefficients. The response to different types of data is investigated, leading to guidance on the choice of an appropriate coefficient.The authors wish to thank the referees, one of whom did a magnificent job in painstakingly checking the detailed algebra and detecting several slips.  相似文献   

6.
Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements??compositions??are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the FCM algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed in the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study is illustrated using a nutritional data set known in the clustering literature.  相似文献   

7.
Assume that a dissimilarity measure between elements and subsets of the set being clustered is given. We define the transformation of the set of subsets under which each subset is transformed into the set of all elements whose dissimilarity to it is not greater than a given threshold. Then a cluster is defined as a fixed point of this transformation. Three well-known clustering strategies are considered from this point of view: hierarchical clustering, graph-theoretic methods, and conceptual clustering. For hierarchical clustering generalizations are obtained that allow for overlapping clusters and/or clusters not forming a cover. Three properties of dissimilarity are introduced which guarantee the existence of fixed points for each threshold. We develop the relation to the theory of quasi-concave set functions, to help give an additional interpretation of clusters.  相似文献   

8.
One key point in cluster analysis is to determine a similarity or dissimilarity measure between data objects. When working with time series, the concept of similarity can be established in different ways. In this paper, several non-parametric statistics originally designed to test the equality of the log-spectra of two stochastic processes are proposed as dissimilarity measures between time series data. Their behavior in time series clustering is analyzed throughout a simulation study, and compared with the performance of several model-free and model-based dissimilarity measures. Up to three different classification settings were considered: (i) to distinguish between stationary and non-stationary time series, (ii) to classify different ARMA processes and (iii) to classify several non-linear time series models. As it was expected, the performance of a particular dissimilarity metric strongly depended on the type of processes subjected to clustering. Among all the measures studied, the nonparametric distances showed the most robust behavior.  相似文献   

9.
A study of standardization of variables in cluster analysis   总被引:2,自引:2,他引:0  
A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated which varied the intercluster spacing and the scales for the variables. The data sets were examined in four different types of error environments. These involved error free data, error perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of true cluster structure as found by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. It was found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure. The result held over different error conditions, separation distances, clustering methods, and coverage levels. The traditionalz-score transformation was found to be less effective in several situations.  相似文献   

10.
The nearest neighbor interchange (nni) metric is a distance measure providing a quantitative measure of dissimilarity between two unrooted binary trees with labeled leaves. The metric has a transparent definition in terms of a simple transformation of binary trees, but its use in nontrivial problems is usually prevented by the absence of a computationally efficient algorithm. Since recent attempts to discover such an algorithm continue to be unsuccessful, we address the complementary problem of designing an approximation to the nni metric. Such an approximation should be well-defined, efficient to compute, comprehensible to users, relevant to applications, and a close fit to the nni metric; the challenge, of course, is to compromise these objectives in such a way that the final design is acceptable to users with practical and theoretical orientations. We describe an approximation algorithm that appears to satisfy adequately these objectives. The algorithm requires O(n) space to compute dissimilarity between binary trees withn labeled leaves; it requires O(n logn) time for rooted trees and O(n 2 logn) time for unrooted trees. To help the user interpret the dissimilarity measures based on this algorithm, we describe empirical distributions of dissimilarities between pairs of randomly selected trees for both rooted and unrooted cases.The Natural Sciences and Engineering Research Council of Canada partially supported this work with Grant A-4142.  相似文献   

11.
A procedure is presented which permits the analysis of factor analytic problems in which several groups exist. The analysis incorporates a hierarchical scheme of searching for factorial invariance and is an extension of Meredith's (1964) Method One procedure. By overlaying a contextual frame of reference on a traditional factor analysis solution, it is possible to use this technique to examine structural similarity and dissimilarity between groups. The procedure is exhibited in an example and in addition a comparison is made to discriminant analysis.  相似文献   

12.
A classification of presence/absence based dissimilarity coefficients   总被引:1,自引:1,他引:0  
Several desirable order properties for dissimilarity coefficients based on presence/absence of attributes are given and several popular dissimilarity coefficients are examined with respect to these properties. A characterization for rational functions with linear numerator and linear denominator satisfying all of the desirable properties is given.  相似文献   

13.
Two algorithms for pyramidal classification — a generalization of hierarchical classification — are presented that can work with incomplete dissimilarity data. These approaches — a modification of the pyramidal ascending classification algorithm and a least squares based penalty method — are described and compared using two different types of complete dissimilarity data in which randomly chosen dissimilarities are assumed missing and the non-missing ones are subjected to random error. We also consider relationships between hierarchical classification and pyramidal classification solutions when both are based on incomplete dissimilarity data.  相似文献   

14.
The paper addresses the problem of specifying differential weights for variables in the construction of a measure of dissimilarity. An assessor is required to provide subjective judgments of the pairwise dissimilarities within a training set of objects, and these dissimilarities are then modeled as a function of the recorded differences between the objects on each of the variables. The aim is to make explicit the relative importance that assessors attach to each of the variables, and thus obtain guidance on how these variables should be combined into a relevant dissimilarity matrix. The methodology is illustrated by application to some archaeological data.  相似文献   

15.
The paper presents a methodology for classifying three-way dissimilarity data, which are reconstructed by a small number of consensus classifications of the objects each defined by a sum of two order constrained distance matrices, so as to identify both a partition and an indexed hierarchy. Specifically, the dissimilarity matrices are partitioned in homogeneous classes and, within each class, a partition and an indexed hierarchy are simultaneously fitted. The model proposed is mathematically formalized as a constrained mixed-integer quadratic problem to be fitted in the least-squares sense and an alternating least-squares algorithm is proposed which is computationally efficient. Two applications of the methodology are also described together with an extensive simulation to investigate the performance of the algorithm.  相似文献   

16.
Comparing resemblance measures   总被引:4,自引:0,他引:4  
In the paper some types of equivalences over resemblance measures and some basic results about them are given. Based on induced partial orderings on the set of unordered pairs of units a dissimilarity between two resemblance measures over finite sets of units can be defined. As an example, using this dissimilarity standard association coefficients between binary vectors are compared both theoretically and computationally.Extended version of the paper presented at DISTANCIA'92, June 22–26, 1992, Rennes, France. This work was supported in part by the Ministry for Science and Technology of Slovenia, J2-6191-101-94. We would like to thank the editor and two anonymous referees for numerous remarks and suggestions that significantly improved the presentation of the material.  相似文献   

17.
NP-hard Approximation Problems in Overlapping Clustering   总被引:1,自引:1,他引:0  
Lp -norm (p < ∞). These problems also correspond to the approximation by a strongly Robinson dissimilarity or by a dissimilarity fulfilling the four-point inequality (Bandelt 1992; Diatta and Fichet 1994). The results are extended to circular strongly Robinson dissimilarities, indexed k-hierarchies (Jardine and Sibson 1971, pp. 65-71), and to proper dissimilarities satisfying the Bertrand and Janowitz (k + 2)-point inequality (Bertrand and Janowitz 1999). Unidimensional scaling (linear or circular) is reinterpreted as a clustering problem and its hardness is established, but only for the L 1 norm.  相似文献   

18.
Percept variance is shown to change the additive property of city-block distances and make city-block distances more subadditive than Euclidean distances. Failure to account for percept variance will result in the misclassification of city-block data as Euclidean. A maximum likelihood estimation procedure is proposed for the multidimensional scaling of similarity data characterized by percept variance. Monte Carlo and empirical experiments are used to evaluate the proposed approach.  相似文献   

19.
We extend previous results of Kroonenberg and de Leeuw (1980) and Kroonenberg (1983, Ch. 5) on transformations of the extended core matrix of the Tucker2 model (Kroonenberg and de Leeuw 1980). In particular, it is shown that non-singular transformations to diagonalize the core matrix will lead to PARAFAC solutions (Harshman 1970; Harshman and Lundy 1984), if such solutions exist. This note is a revised version of a paper presented at the 4th European Meeting of the Psychometric Society, Cambridge, U.K., 2–5 July 1985, and is based on the first author's master thesis.  相似文献   

20.
Efficient algorithms for agglomerative hierarchical clustering methods   总被引:11,自引:4,他引:7  
Whenevern objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires 0(n 3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring 0(n 3) worst-case time, can reasonably be expected to exhibit 0(n 2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires 0(n 2 logn) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case 0(n 2) time and space.Whenevern objects are characterized byk-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points ink-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires 0(n 2) time, in the worst case, for fixedk and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号