首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Given a set of objects and a symmetric matrix of dissimilarities between them, Unidimensional Scaling is the problem of finding a representation by locating points on a continuum. Approximating dissimilarities by the absolute value of the difference between coordinates on a line constitutes a serious computational problem. This paper presents an algorithm that implements Simulated Annealing in a new way, via a strategy based on a weighted alternating process that uses permutations and point-wise translations to locate the optimal configuration. Explicit implementation details are given for least squares loss functions and for least absolute deviations. The weighted, alternating process is shown to outperform earlier implementations of Simulated Annealing and other optimization strategies for Unidimensional Scaling in run time efficiency, in solution quality, or in both.  相似文献   

2.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.  相似文献   

3.
This paper considers the use of radial basis functions for exploratory data analysis. These are used to model a transformation from a high-dimensional observation space to a low-dimensional one. The parameters of the model are determined by optimising a loss function defined to be the stress function in multidimensional scaling. The metric for the low-dimensional space is taken to be the Minkowski metric with order parameter 1<-p<-2. A scheme based on iterative majorisation is proposed.  相似文献   

4.
The majorization method for multidimensional scaling with Kruskal's STRESS has been limited to Euclidean distances only. Here we extend the majorization algorithm to deal with Minkowski distances with 1≤p≤2 and suggest an algorithm that is partially based on majorization forp outside this range. We give some convergence proofs and extend the zero distance theorem of De Leeuw (1984) to Minkowski distances withp>1.  相似文献   

5.
In this paper we develop a version of the Jackknife which seems especially suited for Multidimensional Scaling. It deletes one stimulus at a time, and combines the resulting solutions by a least squares matching method. The results can be used for stability analysis, and for purposes of cross validation.  相似文献   

6.
The nearest neighbor interchange (nni) metric is a distance measure providing a quantitative measure of dissimilarity between two unrooted binary trees with labeled leaves. The metric has a transparent definition in terms of a simple transformation of binary trees, but its use in nontrivial problems is usually prevented by the absence of a computationally efficient algorithm. Since recent attempts to discover such an algorithm continue to be unsuccessful, we address the complementary problem of designing an approximation to the nni metric. Such an approximation should be well-defined, efficient to compute, comprehensible to users, relevant to applications, and a close fit to the nni metric; the challenge, of course, is to compromise these objectives in such a way that the final design is acceptable to users with practical and theoretical orientations. We describe an approximation algorithm that appears to satisfy adequately these objectives. The algorithm requires O(n) space to compute dissimilarity between binary trees withn labeled leaves; it requires O(n logn) time for rooted trees and O(n 2 logn) time for unrooted trees. To help the user interpret the dissimilarity measures based on this algorithm, we describe empirical distributions of dissimilarities between pairs of randomly selected trees for both rooted and unrooted cases.The Natural Sciences and Engineering Research Council of Canada partially supported this work with Grant A-4142.  相似文献   

7.
Block-Relaxation Approaches for Fitting the INDCLUS Model   总被引:1,自引:1,他引:0  
A well-known clustering model to represent I?×?I?×?J data blocks, the J frontal slices of which consist of I?×?I object by object similarity matrices, is the INDCLUS model. This model implies a grouping of the I objects into a prespecified number of overlapping clusters, with each cluster having a slice-specific positive weight. An INDCLUS model is fitted to a given data set by means of minimizing a least squares loss function. The minimization of this loss function has appeared to be a difficult problem for which several algorithmic strategies have been proposed. At present, the best available option seems to be the SYMPRES algorithm, which minimizes the loss function by means of a block-relaxation algorithm. Yet, SYMPRES is conjectured to suffer from a severe local optima problem. As a way out, based on theoretical results with respect to optimally designing block-relaxation algorithms, five alternative block-relaxation algorithms are proposed. In a simulation study it appears that the alternative algorithms with overlapping parameter subsets perform best and clearly outperform SYMPRES in terms of optimization performance and cluster recovery.  相似文献   

8.
A mathematical programming approach to fitting general graphs   总被引:1,自引:1,他引:0  
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.  相似文献   

9.
It is shown that replacement of the zero diagonal elements of the symmetric data matrix of approximate squared distances by certain other quantities in the Young-Householder algorithm will yield a least squares fit to squared distances instead of to scalar products. Iterative algorithms for obtaining these replacement diagonal elements are described and relationships with the ELEGANT algorithm (de Leeuw 1975; Takane 1977) are discussed. In large residual situations a penalty function approach, motivated by the ELEGANT algorithm, is adopted. Empirical comparisons of the algorithms are given.An early version of this paper was presented at the Multidimensional Data Analysis Workshop, Pembroke College, Cambridge, July 1985. I want to thank Jan de Leeuw and Yoshio Takane for bringing the ELEGANT algorithm to my attention and for clarifying its rationale and notation. My thanks go also to Stephen du Toit for help with the ALSCAL computations reported in Section 7.  相似文献   

10.
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.  相似文献   

11.
k  . In this procedure, a least-squares loss function in terms of discrepancies between D and M is minimized. The present paper describes the original hierarchical classes algorithm proposed by De Boeck and Rosenberg (1988), which is based on an alternating greedy heuristic, and proposes a new algorithm, based on an alternating branch-and-bound procedure. An extensive simulation study is reported in which both algorithms are evaluated and compared according to goodness-of-fit to the data and goodness-of-recovery of the underlying true structure. Furthermore, three heuristics for selecting models of different ranks for a given D are presented and compared. The simulation results show that the new algorithm yields models with slightly higher goodness-of-fit and goodness-of-recovery values.  相似文献   

12.
The theory of the tight span, a cell complex that can be associated to every metric D, offers a unifying view on existing approaches for analyzing distance data, in particular for decomposing a metric D into a sum of simpler metrics as well as for representing it by certain specific edge-weighted graphs, often referred to as realizations of D. Many of these approaches involve the explicit or implicit computation of the so-called cutpoints of (the tight span of) D, such as the algorithm for computing the “building blocks” of optimal realizations of D recently presented by A. Hertz and S. Varone. The main result of this paper is an algorithm for computing the set of these cutpoints for a metric D on a finite set with n elements in O(n3) time. As a direct consequence, this improves the run time of the aforementioned O(n6)-algorithm by Hertz and Varone by “three orders of magnitude”.  相似文献   

13.
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.  相似文献   

14.
In this paper two alternative loss criteria for the least squares Procrustes problem are studied. These alternative criteria are based on the Huber function and on the more radical biweight function, which are designed to be resistant to outliers. Using iterative majorization it is shown how a convergent reweighted least squares algorithm can be developed. In asimulation study it turns out that the proposed methods perform well over a specific range of contamination. When a uniform dilation factor is included, mixed results are obtained. The methods also yield a set of weights that can be used for diagnostic purposes.  相似文献   

15.
Let G = (V,E,w) be a graph with vertex and edge sets V and E, respectively, and w:E → R + a function which assigns a positive weight or length to each edge of G. G is called a realization of a finite metric space (M,d), with M = { 1,...,n} if and only if { 1,...,n} ⫅ V and d(i,j) is equal to the length of the shortest chain linking i and j in G ∀ i,j = 1,...,n. A realization G of (M,d), is said optimal if the sum of its weights is minimal among all the realizations of (M,d). Consider a partition of M into two nonempty subsets K and L, and let e be an edge in a realization G of (M,d); we say that e is a bridge linking K with L if e belongs to all chains in G linking a vertex of K with a vertex of L. The Metric Bridge Partition Problem is to determine if the elements of a finite metric space (M,d) can be partitioned into two nonempty subsets K and L such that all optimal realizations of (M,d) contain a bridge linking K with L. We prove in this paper that this problem is polynomially solvable. We also describe an algorithm that constructs an optimal realization of (M,d) from optimal realizations of (K,d|K) and (L,d|L).  相似文献   

16.
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups of models: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for thespatial representation of such proximity data, have been devised to accommodate two, three, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering called INDCLUS (Carroll and Arabie 1983) has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for thediscrete network representation of such proximity data, have only thus far been devised to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various(discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths.We initially present an introductory overview focussing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy-data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications.  相似文献   

17.
In this paper, we consider several generalizations of the popular Ward’s method for agglomerative hierarchical clustering. Our work was motivated by clustering software, such as the R function hclust, which accepts a distance matrix as input and applies Ward’s definition of inter-cluster distance to produce a clustering. The standard version of Ward’s method uses squared Euclidean distance to form the distance matrix. We explore the effect on the clustering of using other definitions of distance, such as the Minkowski distance.  相似文献   

18.
A modified CANDECOMP algorithm is presented for fitting the metric version of the Extended INDSCAL model to three-way proximity data. The Extended INDSCAL model assumes, in addition to the common dimensions, a unique dimension for each object. The modified CANDECOMP algorithm fits the Extended INDSCAL model in a dimension-wise fashion and ensures that the subject weights for the common and the unique dimensions are nonnegative. A Monte Carlo study is reported to illustrate that the method is fairly insensitive to the choice of the initial parameter estimates. A second Monte Carlo study shows that the method is able to recover an underlying Extended INDSCAL structure if present in the data. Finally, the method is applied for illustrative purposes to some empirical data on pain relievers. In the final section, some other possible uses of the new method are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijik Onderzoek”.  相似文献   

19.
中国古代历法中的三次内插法   总被引:2,自引:1,他引:2  
该文根据《天文大成管窥辑要》中的史料,发现边冈在其《崇玄历》(892年)中创立的晷影公式-中国历法史上第一例三次函数,是通过令影差变化与自变量平方的比值为某个等差数列而构造出来的,与过去认为的三次内插法无关;王恂、郭守敬在《授时历》(1280年)创立的平立定三差算法,则是通过对插值函数的降阶,将问题转化为一般的二次内插公式的构造,前者可能受到了边冈立方相减相乘算法的启发,后者则与刘焯的二次插值算法  相似文献   

20.
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity analysis framework.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号