Found 20 similar articles (search time: 500 ms)
It is shown that replacement of the zero diagonal elements of the symmetric data matrix of approximate squared distances by certain other quantities in the Young-Householder algorithm will yield a least squares fit to squared distances instead of to scalar products. Iterative algorithms for obtaining these replacement diagonal elements are described and relationships with the ELEGANT algorithm (de Leeuw 1975; Takane 1977) are discussed. In large residual situations a penalty function approach, motivated by the ELEGANT algorithm, is adopted. Empirical comparisons of the algorithms are given. An early version of this paper was presented at the Multidimensional Data Analysis Workshop, Pembroke College, Cambridge, July 1985. I want to thank Jan de Leeuw and Yoshio Takane for bringing the ELEGANT algorithm to my attention and for clarifying its rationale and notation. My thanks go also to Stephen du Toit for help with the ALSCAL computations reported in Section 7.
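As a point of reference for the Young-Householder step named in the abstract above, here is a minimal sketch of classical scaling from a matrix of squared distances: double-centre, then keep the leading eigenvectors. It does not implement the paper's diagonal-replacement iterations or the ELEGANT algorithm; the function name `classical_scaling` and the toy data are assumptions made for illustration.

```python
import numpy as np

def classical_scaling(sq_dist, n_dims):
    """Classical (Young-Householder / Torgerson) scaling of squared distances."""
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centring turns squared distances into scalar products.
    scalar_products = -0.5 * centering @ sq_dist @ centering
    eigvals, eigvecs = np.linalg.eigh(scalar_products)
    # eigh returns eigenvalues in ascending order; keep the n_dims largest.
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy data (hypothetical): the corners of a 3-by-4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
coords = classical_scaling(sq_dist, n_dims=2)
recovered = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
```

With error-free data the recovered squared distances match the input; the configuration itself is determined only up to rotation, reflection, and translation.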

The majorization method for multidimensional scaling with Kruskal's STRESS has been limited to Euclidean distances only. Here we extend the majorization algorithm to deal with Minkowski distances with 1 ≤ p ≤ 2 and suggest an algorithm that is partially based on majorization for p outside this range. We give some convergence proofs and extend the zero distance theorem of De Leeuw (1984) to Minkowski distances with p > 1.
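To make the objects in the abstract above concrete, the following sketch computes Minkowski-p distances for a configuration and an unnormalized sum-of-squares loss against given dissimilarities. It is not the majorization algorithm itself, and the normalization used in Kruskal's STRESS is deliberately omitted; the function names and the toy configuration are assumptions.

```python
import numpy as np

def minkowski_distances(config, p):
    """Pairwise Minkowski-p distances between the rows of `config`."""
    diffs = np.abs(config[:, None, :] - config[None, :, :])
    return (diffs ** p).sum(axis=-1) ** (1.0 / p)

def raw_stress(dissimilarities, config, p):
    """Sum of squared residuals over pairs i < j (no normalization)."""
    dist = minkowski_distances(config, p)
    upper = np.triu_indices(config.shape[0], k=1)
    return ((dissimilarities[upper] - dist[upper]) ** 2).sum()

# Hypothetical three-point configuration in the plane.
config = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]])
city_block = minkowski_distances(config, p=1)  # p = 1
euclidean = minkowski_distances(config, p=2)   # p = 2
loss_at_truth = raw_stress(euclidean, config, p=2)
```

The same configuration gives a distance of 7 between the first two points for p = 1 and 5 for p = 2, which is the kind of dependence on p that the extended algorithm has to handle.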

Five different methods for obtaining a rational initial estimate of the stimulus space in the INDSCAL model were compared using the SINDSCAL program for fitting INDSCAL. The effects of the number of stimuli, the number of subjects, the dimensionality, and the amount of error on the quality and efficiency of the final SINDSCAL solution were investigated in a Monte Carlo study. We found that the quality of the final solution was not affected by the choice of the initialization method, suggesting that SINDSCAL finds a global optimum regardless of the initialization method used. The most efficient procedures were the methods proposed by de Leeuw and Pruzansky (1978) and by Flury and Gautschi (1986) for the simultaneous diagonalization of several positive definite symmetric matrices, and a method based on linearly constraining the stimulus space using the CANDELINC approach developed by Carroll, Pruzansky, and Kruskal (1980). Geert De Soete is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek. The authors gratefully acknowledge the helpful comments and suggestions of the reviewers.

The analysis of a three-way data set using three-mode principal components analysis yields component matrices for all three modes of the data, and a three-way array called the core, which relates the components for the different modes to each other. To exploit rotational freedom in the model, one may rotate the core array (over all three modes) to an optimally simple form, for instance by three-mode orthomax rotation. However, such a rotation of the core may inadvertently detract from the simplicity of the component matrices. One remedy is to rotate the core only over those modes in which no simple solution for the component matrices is desired or available, but this approach may in turn reduce the simplicity of the core to an unacceptable extent. In the present paper, a general approach is developed, in which a criterion is optimized that not only takes into account the simplicity of the core, but also, to any desired degree, the simplicity of the component matrices. This method (in contrast to methods for either core or component matrix rotation) can be used to find solutions in which the core and the component matrices are all reasonably simple.
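The rotational freedom mentioned in the abstract above can be checked numerically. The sketch below builds a three-mode (Tucker3-type) reconstruction from a core and three component matrices, rotates the first-mode components by an orthonormal matrix, and counter-rotates the core; the reconstructed array is unchanged. The array sizes, random data, and function name are assumptions, and no simplicity criterion is optimized here.

```python
import numpy as np

def tucker3_reconstruct(core, a, b, c):
    """Rebuild the data array from a core and three component matrices."""
    return np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)

rng = np.random.default_rng(0)
core = rng.normal(size=(2, 3, 2))
a = rng.normal(size=(5, 2))
b = rng.normal(size=(4, 3))
c = rng.normal(size=(6, 2))

# A random orthonormal rotation for the first mode (Q factor of a QR step).
rotation, _ = np.linalg.qr(rng.normal(size=(2, 2)))

# Rotate the components and counter-rotate the core over the same mode.
a_rotated = a @ rotation
core_rotated = np.einsum("pqr,ps->sqr", core, rotation)

original = tucker3_reconstruct(core, a, b, c)
after_rotation = tucker3_reconstruct(core_rotated, a_rotated, b, c)
```

Because the fit is identical before and after, any rotation criterion is free to trade simplicity of the core against simplicity of the component matrices.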

A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2) allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature including Carroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.

It is common practice to perform a principal component analysis (PCA) on a correlation matrix to represent graphically the relations among numerous variables. In such a situation, the variables may be considered as points on the unit hypersphere of a Euclidean space, and PCA provides a sort of best fit of these points within a subspace. Taking into account their particular position, this paper suggests representing the variables on an optimal three-dimensional unit sphere.
Résumé (translated from French) It is standard practice to use principal component analysis to represent a correlation matrix graphically. In such a situation, the variables may be regarded as points on the unit hypersphere of a Euclidean space, and principal component analysis yields a good approximation of these points by means of a Euclidean subspace. Taking this geometric situation into account, the present article suggests representing the variables on an optimal three-dimensional sphere.

Two classes of element-wise transformations are proved to preserve the positive semidefinite nature of coefficient matrices. The correctness of a conjecture by Gower and Legendre on the positive semidefinite nature of a certain coefficient matrix is proved. It is shown that the matrix of monotonicity coefficients proposed by Bentler is positive semidefinite for data without ties.

When a dissimilarity matrix cannot be represented in a Euclidean space, it is possible to make it Euclidean by means of suitable transformations of the original dissimilarity values. In this paper we discuss some interesting properties of a class of transformations based on adding a specific squared Euclidean distance to the initial dissimilarity. An erratum to this article is available.
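One familiar transformation of the kind described above adds the same constant to every off-diagonal squared dissimilarity, which amounts to adding the squared distances of a regular simplex. The sketch below picks the smallest such constant that removes the negative eigenvalues of the double-centred matrix. Whether this particular case matches the class studied in the paper is an assumption; the function names and data are hypothetical.

```python
import numpy as np

def gram_from_squared(sq_diss):
    """Double-centred scalar-product matrix of squared dissimilarities."""
    n = sq_diss.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * centering @ sq_diss @ centering

def add_simplex_constant(sq_diss):
    """Add the smallest off-diagonal constant that makes the matrix Euclidean."""
    min_eig = np.linalg.eigvalsh(gram_from_squared(sq_diss)).min()
    shift = max(0.0, -min_eig)
    n = sq_diss.shape[0]
    # 2 * shift * (1 - identity) holds the squared distances of a regular simplex.
    return sq_diss + 2.0 * shift * (np.ones((n, n)) - np.eye(n))

# Dissimilarities that violate the triangle inequality (1 + 1 < 3).
diss = np.array([[0.0, 1.0, 3.0],
                 [1.0, 0.0, 1.0],
                 [3.0, 1.0, 0.0]])
before = np.linalg.eigvalsh(gram_from_squared(diss ** 2)).min()
fixed = add_simplex_constant(diss ** 2)
after = np.linalg.eigvalsh(gram_from_squared(fixed)).min()
```

The added term shifts every eigenvalue on the centred subspace by `shift`, so the most negative eigenvalue moves to zero and the transformed dissimilarities become embeddable.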

The dependence on history of both present and future dynamics of life is a common intuition in biology and in the humanities. Historicity will be understood in terms of changes of the space of possibilities (or of “phase space”) as well as by the role of diversity in life’s structural stability and of rare events in history formation. We hint at a rigorous analysis of “path dependence” in terms of invariants and invariance preserving transformations, as it may be found also in physics, while departing from the physico-mathematical analyses. The idea is that the (relative or historicized) invariant traces of the past under organismal or ecosystemic transformations contribute to the understanding (or the “theoretical determination”) of present and future states of affairs. This yields a peculiar form of unpredictability (or randomness) in biology, at the core of novelty formation: the changes of observables and pertinent parameters may depend also on past events. In particular, in relation to the properties of synchronic measurement in physics, the relevance of diachronic measurement in biology is highlighted. This analysis may a fortiori apply to cognitive and historical human dynamics, while allowing the investigation of some general properties of historicity in biology.

In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, and it shows favorable results compared to other classical criteria.
Résumé (translated from French) We propose an entropy criterion for assessing the number of classes in a partition, based on a mixture model of probability distributions. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Monte Carlo simulations illustrate its merits relative to more classical criteria.
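The relation between the likelihood and the classification likelihood can be illustrated numerically. The sketch below assumes the posterior-weighted form of the classification log-likelihood, in which case the mixture log-likelihood equals the classification log-likelihood plus the entropy of the posterior membership probabilities. The two-component Gaussian parameters and the data are made up, and the criterion itself (how the entropy is normalized and compared across numbers of clusters) is not reproduced.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical data and mixture parameters (not estimated here).
x = np.array([-2.1, -1.7, -0.3, 0.4, 1.9, 2.5])
weights = np.array([0.4, 0.6])
means = np.array([-2.0, 2.0])
sds = np.array([1.0, 1.0])

# weighted[i, k] = p_k * f_k(x_i)
weighted = weights * gaussian_pdf(x[:, None], means, sds)
posterior = weighted / weighted.sum(axis=1, keepdims=True)

log_lik = np.log(weighted.sum(axis=1)).sum()
class_log_lik = (posterior * np.log(weighted)).sum()
entropy = -(posterior * np.log(posterior)).sum()
```

The entropy term is zero when every observation is assigned to one component with certainty and grows as the components overlap, which is what makes it usable as a measure of how well separated the clusters are.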

Maximum sum-of-splits clustering   Total citations: 1 (self-citations: 1; citations by others: 0)
Consider N entities to be classified, and a matrix of dissimilarities between pairs of them. The split of a cluster is the smallest dissimilarity between an entity of this cluster and an entity outside it. The single-linkage algorithm provides partitions into M clusters for which the smallest split is maximum. We study here the average split of the clusters or, equivalently, the sum of splits. An algorithm of order N^2 is provided to determine maximum sum-of-splits partitions into M clusters for all M between N - 1 and 2, using the dual graph of the single-linkage dendrogram.
Résumé (translated from French) Consider N objects to be classified and a matrix of dissimilarities between pairs of these objects. The split of a class is the smallest dissimilarity between an object of this class and an object outside it. The single-linkage algorithm provides partitions into M classes whose smallest split is maximum. We study the average split of the classes or, equivalently, the sum of splits. We propose an algorithm of order N^2 to determine partitions into M classes with maximum sum of splits, for M ranging from N - 1 to 2, based on the dual graph of the dendrogram of the single-linkage method.
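The definition of a split above translates directly into code. The sketch below evaluates the split of each cluster and the sum of splits for a given partition; it does not search for the maximum sum-of-splits partition or use the dual graph of the dendrogram. The function names and the small dissimilarity matrix are assumptions.

```python
import numpy as np

def cluster_split(diss, labels, cluster):
    """Smallest dissimilarity between a member of `cluster` and a non-member."""
    inside = labels == cluster
    return diss[np.ix_(inside, ~inside)].min()

def sum_of_splits(diss, labels):
    """Sum of the splits of all clusters (needs at least two clusters)."""
    return sum(cluster_split(diss, labels, c) for c in np.unique(labels))

# Hypothetical dissimilarities between four entities.
diss = np.array([[0.0, 1.0, 5.0, 6.0],
                 [1.0, 0.0, 4.0, 7.0],
                 [5.0, 4.0, 0.0, 2.0],
                 [6.0, 7.0, 2.0, 0.0]])
labels = np.array([0, 0, 1, 1])
total = sum_of_splits(diss, labels)
```

For this two-cluster partition both clusters have split 4, the smallest dissimilarity across the cut, so the sum of splits is 8.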

This paper presents a general approach for fitting the ADCLUS (Shepard and Arabie 1979; Arabie, Carroll, DeSarbo, and Wind 1981), INDCLUS (Carroll and Arabie 1983), and potentially a special case of the GENNCLUS (DeSarbo 1982) models. The proposed approach, based largely on a separability property observed for the least squares loss function being optimized, offers increased efficiency and other advantages over existing approaches like MAPCLUS (Arabie and Carroll 1980) for fitting the ADCLUS model, and the INDCLUS method for fitting the INDCLUS model. The new procedure (called SINDCLUS) is applied to three sets of empirical data to demonstrate the effectiveness of the SINDCLUS methodology. Finally, some potentially useful extensions are discussed.
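For orientation, the ADCLUS model represents the similarity between two objects as the sum of the weights of the clusters they share, plus an additive constant. The sketch below covers only the easy conditional step: with the (possibly overlapping) cluster memberships held fixed, the weights and constant follow from ordinary least squares. This is not the SINDCLUS procedure or the separability argument of the paper; the function name, memberships, and weights are assumptions.

```python
import numpy as np

def adclus_weights(similarities, membership):
    """Least-squares cluster weights and additive constant for fixed clusters."""
    n, n_clusters = membership.shape
    rows, cols = np.triu_indices(n, k=1)
    # Column k flags the pairs (i, j) whose members both belong to cluster k.
    design = membership[rows] * membership[cols]
    design = np.column_stack([design, np.ones(len(rows))])
    solution, *_ = np.linalg.lstsq(design, similarities[rows, cols], rcond=None)
    return solution[:n_clusters], solution[n_clusters]

# Hypothetical overlapping clusters {0, 1, 2} and {2, 3, 4}.
membership = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
true_weights, true_constant = np.array([0.7, 0.4]), 0.1
similarities = membership @ np.diag(true_weights) @ membership.T + true_constant
weights, constant = adclus_weights(similarities, membership)
```

Only the off-diagonal pairs enter the fit, so the diagonal of the similarity matrix is ignored. The hard part of the problem, choosing the binary memberships, is left out here.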

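Independent Innovation in Science and Technology and the Enhancement of the Core Competitiveness of Chinese Enterprises   Total citations: 4 (self-citations: 0; citations by others: 4)
In the present era, independent innovation in science is the foundation and forerunner of independent innovation in technology, while independent technological innovation provides independent scientific innovation with advanced material and technical means and a powerful demand-driven impetus. At the same time, independent innovation in science and technology is the key to enhancing and maintaining the core competitiveness of Chinese enterprises. It is therefore of marked theoretical significance and practical value to grasp the modern meaning of the concept of independent innovation in science and technology, to analyse in depth the structural model of enterprise core competitiveness, to reveal the mechanism by which independent innovation in science and technology acts on enterprise core competitiveness, and to explore strategies for China to raise and maintain the core competitiveness of its enterprises.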

Divisive hierarchical clustering algorithms with the diameter criterion proceed by recursively selecting the cluster with the largest diameter and partitioning it into two clusters whose largest diameter is as small as possible. We provide two such algorithms. The first has complexity of order N^2 times the maximum number of clusters in a partition, and the second has complexity O(N^2 log N), where N denotes the number of entities to be clustered. The former algorithm, an efficient implementation of an algorithm of Hubert, finds all partitions into at most that maximum number of clusters and is in O(N^2) when that maximum is fixed. Moreover, if in each partitioning the size of the largest cluster is bounded by p times the number of entities in the set to be partitioned, with 1/2 ≤ p < 1, it provides a complete hierarchy of partitions in O(N^2 log N) time. The latter algorithm, a refinement of an algorithm of Rao, builds a complete hierarchy of partitions in O(N^2 log N) time without any restriction. Comparative computational experiments with both algorithms and with an agglomerative hierarchical algorithm of Benzécri are reported.
Résumé (translated from French) Divisive hierarchical classification algorithms using the diameter criterion recursively select the class with the largest diameter and partition it into two classes whose largest diameter is as small as possible. We propose two such algorithms, the first with complexity of order N^2 times the maximum number of classes in a partition and the second with complexity O(N^2 log N), where N denotes the number of objects to be classified. The first algorithm, an implementation of an algorithm of Hubert, constructs partitions with at most that maximum number of classes and is in O(N^2) when that maximum is fixed. Moreover, if in each bipartition the number of objects in the largest class is bounded by p times the number of objects in the set to be partitioned, where 1/2 ≤ p < 1, this algorithm constructs a complete hierarchy of partitions in O(N^2 log N) time. The second algorithm, a refinement of an algorithm of Rao, constructs a complete hierarchy of partitions in O(N^2 log N) time without any restriction. Comparative computational results are also presented for the two algorithms and for the agglomerative hierarchical classification algorithm of Benzécri.
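The basic divisive step described above, splitting a cluster into two parts whose larger diameter is as small as possible, can be written as an exhaustive search for very small clusters. The sketch below is exponential in the cluster size and is meant only to pin down the definitions; it is not either of the efficient algorithms reported in the abstract. The function names and data are assumptions.

```python
from itertools import combinations

import numpy as np

def diameter(diss, members):
    """Largest dissimilarity between two entities of a cluster (0 for singletons)."""
    members = list(members)
    if len(members) < 2:
        return 0.0
    return max(diss[i, j] for i, j in combinations(members, 2))

def best_bipartition(diss, members):
    """Exhaustively find the split of `members` minimising the larger diameter."""
    members = list(members)
    first, rest = members[0], members[1:]
    best = None
    # Keep `first` in the left part so that each bipartition is tried once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = [first, *extra]
            right = [m for m in rest if m not in extra]
            worst = max(diameter(diss, left), diameter(diss, right))
            if best is None or worst < best[0]:
                best = (worst, left, right)
    return best

# Hypothetical dissimilarities between four entities.
diss = np.array([[0.0, 1.0, 5.0, 6.0],
                 [1.0, 0.0, 4.0, 7.0],
                 [5.0, 4.0, 0.0, 2.0],
                 [6.0, 7.0, 2.0, 0.0]])
worst, left, right = best_bipartition(diss, range(4))
```

Here the best split separates entities 0 and 1 from entities 2 and 3, leaving diameters 1 and 2. A full divisive procedure would repeat this step on whichever cluster currently has the largest diameter.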

We study the application of simulated annealing and tabu search to the solution of the clique partitioning problem. We illustrate the effectiveness of these techniques by computational results associated not only with randomly generated problems, but also with real-life problems arising from applications concerning the optimal aggregation of binary relations into an equivalence relation. The need for these approaches is emphasized by the example of a special class of instances of the clique partitioning problem for which the most commonly used heuristics perform arbitrarily badly, while tabu search systematically obtains the optimal solution.
Résumé (translated from French) In this article we study the application of simulated annealing and of the tabu search method to the problem of partitioning graphs into cliques. We illustrate the effectiveness of these techniques with numerical results for both randomly generated problems and real problems concerning the aggregation of binary relations into an equivalence relation. The value of these approaches is brought out by a class of problems on which the best-known heuristics perform arbitrarily badly, whereas the tabu search method systematically obtains optimal solutions.

A mathematical programming approach to fitting general graphs   Total citations: 1 (self-citations: 1; citations by others: 0)
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.
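The network distance used in the abstract above, the minimum path length between two nodes of a connected graph, can be computed with the Floyd-Warshall recursion. The sketch below evaluates these distances for a given set of weighted links; it does not fit links or link lengths to proximity data. The function name and the example graph are assumptions.

```python
import numpy as np

def network_distances(n_nodes, links):
    """All-pairs minimum path lengths for an undirected weighted graph."""
    dist = np.full((n_nodes, n_nodes), np.inf)
    np.fill_diagonal(dist, 0.0)
    for i, j, length in links:
        dist[i, j] = dist[j, i] = min(dist[i, j], length)
    # Floyd-Warshall: allow each node in turn as an intermediate stop.
    for k in range(n_nodes):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

# Hypothetical network with four nodes and four links (i, j, length).
links = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 1.0), (0, 3, 10.0)]
dist = network_distances(4, links)
```

In this example the direct link between nodes 0 and 3 has length 10, but the path through nodes 1 and 2 has length 6, so the network distance between them is 6.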

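Wan Dan (万丹), Journal of Dialectics of Nature (《自然辩证法通讯》), 2012, (3): 118-124, 128
In the 1980s, "paradigm", originally the core concept of Kuhn's philosophy, disappeared. This seemed to mark a turn in Kuhn's philosophy, and even a turn in the historicist philosophy of science. In fact, the notion of "paradigm" persists in Kuhn's philosophy under the name of "kind terms"; we can trace the course of this evolution in full and thereby reveal why it occurred.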

Classifications are generally pictured in the form of hierarchical trees, also called dendrograms. A dendrogram is the graphical representation of an ultrametric (=cophenetic) matrix; so dendrograms can be compared to one another by comparing their cophenetic matrices. Three methods used in testing the correlation between matrices corresponding to dendrograms are evaluated. The three permutational procedures make use of different aspects of the information to compare dendrograms: the Mantel procedure permutes label positions only; the binary tree methods randomize the topology as well; the double-permutation procedure is based on all the information included in a dendrogram, that is: topology, label positions, and cluster heights. Theoretical and empirical investigations of these methods are carried out to evaluate their relative performance. Simulations show that the Mantel test is too conservative when applied to the comparison of dendrograms; the methods of binary tree comparisons do slightly better; only the double-permutation test provides unbiased type I error.
Résumé (translated from French) The trees used to illustrate groupings are generally drawn as hierarchical classifications, or dendrograms. A dendrogram graphically represents the information contained in the ultrametric (=cophenetic) matrix corresponding to the classification, so dendrograms can be compared through the corresponding ultrametric matrices. We compare three methods for assessing the statistical significance of the correlation coefficient measured between two ultrametric matrices. These three permutation tests take different aspects into account when comparing dendrograms: the Mantel test permutes the leaves of the tree; the methods for binary trees permute the leaves and the topology; and the double-permutation procedure permutes the leaves, the topology, and the fusion levels of the dendrograms being compared. The relative efficiency of the three methods is evaluated empirically and theoretically. Our results suggest that the double-permutation test should be preferred for comparing dendrograms: the Mantel test proves too conservative, while the methods for binary trees are not always adequate.
This work was supported by NSERC grant no. A7738 to Pierre Legendre and by an NSERC scholarship to F.-J. Lapointe.
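Of the three procedures compared above, the Mantel test is the simplest to write down: correlate the off-diagonal entries of two matrices, then rebuild the reference distribution by permuting the labels of one matrix. The sketch below is a generic version with a one-sided p-value. It uses ordinary Euclidean distance matrices as stand-ins for cophenetic matrices, and it implements neither the binary-tree methods nor the double-permutation test; the function name, permutation count, and data are assumptions.

```python
import numpy as np

def mantel_test(mat_a, mat_b, n_perm=999, seed=0):
    """Correlation between two distance matrices with a label-permutation test."""
    n = mat_a.shape[0]
    upper = np.triu_indices(n, k=1)
    observed = np.corrcoef(mat_a[upper], mat_b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of one matrix together (relabel the objects).
        permuted = mat_b[np.ix_(perm, perm)]
        r = np.corrcoef(mat_a[upper], permuted[upper])[0, 1]
        if r >= observed - 1e-12:
            at_least_as_large += 1
    # One-sided p-value; the observed labelling counts as one permutation.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical data: compare a distance matrix with itself.
rng = np.random.default_rng(1)
points = rng.normal(size=(8, 2))
dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
r_obs, p_value = mantel_test(dist, dist)
```

Only label positions are permuted here, which is exactly the limitation the abstract points to when the matrices come from dendrograms.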

The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
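The link to Gaussian kernels can be checked on a small example. The sketch below applies the transformation phi(D) = 1 - exp(-lam * D) to squared Euclidean distances D and verifies that the result is again a matrix of squared Euclidean distances, by checking that its double-centred scalar-product matrix has no negative eigenvalues. The rate `lam`, the random points, and the function name are assumptions, and this is a single member of the class rather than the general construction.

```python
import numpy as np

def min_centered_eigenvalue(sq_dist):
    """Smallest eigenvalue of the double-centred scalar-product matrix."""
    n = sq_dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return np.linalg.eigvalsh(-0.5 * centering @ sq_dist @ centering).min()

rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)

# Transformation tied to the Gaussian kernel: phi(D) = 1 - exp(-lam * D).
lam = 0.5
transformed = 1.0 - np.exp(-lam * sq_dist)
smallest = min_centered_eigenvalue(transformed)
```

The check works because exp(-lam * D) is a Gaussian kernel matrix, which is positive semidefinite, and 2 * (1 - kernel) is the squared distance between points in the kernel's feature space.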

Parameters are derived of distributions of three coefficients of similarity between pairs (dyads) of operational taxonomic units for multivariate binary data (presence/absence of attributes) under statistical independence. These are applied to test independence for dyadic data. Association among attributes within operational taxonomic units is allowed. It is also permissible for the two units in the dyad to be drawn from different populations having different presence probabilities of attributes. The variance of the distribution of the similarity coefficients under statistical independence is shown to be relatively large in many empirical situations. This result implies that the practical interpretation of these coefficients requires much care. An application using the Jaccard index is given for the assessment of consensus between psychotherapists and their clients.
The distribution of similarity coefficients for binary data and associated attributes
Résumé (translated from French) The parameters of the distribution of three similarity coefficients between pairs of operational taxonomic units for multivariate binary (presence/absence) data were derived under the hypothesis of statistical independence. These parameters are used in a test of independence for dyadic data. Association among several attributes is allowed within the population of units. The two units of the dyad may also be drawn from two different populations with different probabilities for the presence of the attributes. In many empirical situations, the variance of the similarity coefficients can be relatively large under statistical independence. Consequently, these coefficients must be interpreted with caution. An example is given for the Jaccard coefficient, which was used in a study of the agreement between psychotherapists and their clients.
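For reference, the Jaccard index used in the application above is the number of attributes present in both units divided by the number present in at least one of them. The sketch below computes it for a single dyad; it does not derive the distribution parameters or the independence test discussed in the abstract. The function name and the two binary profiles are assumptions.

```python
import numpy as np

def jaccard(unit_a, unit_b):
    """Jaccard similarity of two binary attribute vectors."""
    a = np.asarray(unit_a, dtype=bool)
    b = np.asarray(unit_b, dtype=bool)
    either = np.logical_or(a, b).sum()
    if either == 0:
        return float("nan")  # undefined when neither unit has any attribute
    return np.logical_and(a, b).sum() / either

# Hypothetical presence/absence profiles for one therapist-client dyad.
therapist = [1, 1, 0, 1, 0, 0]
client = [1, 0, 0, 1, 1, 0]
similarity = jaccard(therapist, client)
```

Two attributes are shared and four are present in at least one profile, so the index is 0.5. Joint absences do not enter the coefficient at all.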
