Found 20 similar documents.
1.
Carroll and Chang have derived the symmetric CANDECOMP model from the INDSCAL model, to fit symmetric matrices of approximate
scalar products in the least squares sense. Typically, the CANDECOMP algorithm is used to estimate the parameters. In the
present paper it is shown that negative weights may occur with CANDECOMP. This phenomenon can be suppressed by updating the
weights by the Nonnegative Least Squares Algorithm. A potential drawback of the resulting procedure is that it may produce
two different versions of the stimulus space matrix. To obviate this possibility, a symmetry preserving algorithm is offered,
which can be monitored to produce non-negative weights as well.
This work was partially supported by the Royal Netherlands Academy of Arts and Sciences.
2.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way
compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions
across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus
coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework
with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting
proportions of viewing time across television networks for an area sample of households.
3.
The representation of three-way proximity data by single and multiple tree structure models
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups of models: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for the spatial representation of such proximity data, have been devised to accommodate two, three, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering called INDCLUS (Carroll and Arabie 1983) has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for the discrete network representation of such proximity data, have only thus far been devised to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various (discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths. We initially present an introductory overview focussing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications.
4.
GENFOLD2: A set of models and algorithms for the general UnFOLDing analysis of preference/dominance data
A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models, termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2), allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, and metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of unfolding-type preference models mentioned in the literature, including Carroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.
5.
John T. Daws 《Journal of Classification》1996,13(1):57-80
Free-sorting data are obtained when subjects are given a set of objects and are asked to divide them into subsets. Such data are usually reduced by counting, for each pair of objects, how many subjects placed both of them into the same subset. The present study examines the utility of a group of additional statistics: the co-occurrences of sets of three objects. Because there are dependencies among the pair and triple co-occurrences, adjusted triple similarity statistics are developed. Multidimensional scaling and cluster analysis (which usually use pair similarities as their input data) can be modified to operate on three-way similarities to create representations of the set of objects. Such methods are applied to a set of empirical sorting data: Rosenberg and Kim's (1975) fifteen kinship terms.
The author thanks Phipps Arabie, Lawrence Hubert, Lawrence Jones, Ed Shoben, and Stanley Wasserman for their considerable contributions to this paper.
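The counting step described in this abstract can be illustrated with a minimal Python sketch. It only tallies raw pair and triple co-occurrences. The adjusted triple similarity statistics developed in the paper are not reproduced here, and the function name `cooccurrence_counts` and the toy sortings are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sortings, k):
    """Count, for every set of k objects, how many subjects put all k in one subset."""
    counts = Counter()
    for partition in sortings:        # one partition per subject
        for subset in partition:      # each subset holds object labels
            # Sorting gives every group a canonical key, e.g. ("aunt", "mother").
            for group in combinations(sorted(subset), k):
                counts[group] += 1
    return counts


# Hypothetical toy data: three subjects sorting four kinship terms.
sortings = [
    [{"mother", "father"}, {"aunt", "uncle"}],
    [{"mother", "father", "aunt"}, {"uncle"}],
    [{"mother", "aunt"}, {"father", "uncle"}],
]

pairs = cooccurrence_counts(sortings, 2)    # usual pair similarities
triples = cooccurrence_counts(sortings, 3)  # raw three-way co-occurrences
```

A triple can only co-occur when all three of its pairs do, which is the kind of dependency between pair and triple counts that the abstract says motivates the adjusted statistics.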
6.
In this paper two alternative loss criteria for the least squares Procrustes problem are studied. These alternative criteria
are based on the Huber function and on the more radical biweight function, which are designed to be resistant to outliers.
Using iterative majorization it is shown how a convergent reweighted least squares algorithm can be developed. In a simulation
study it turns out that the proposed methods perform well over a specific range of contamination. When a uniform dilation
factor is included, mixed results are obtained. The methods also yield a set of weights that can be used for diagnostic purposes.
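As a rough illustration of the reweighted least squares idea in this abstract, the Python sketch below fits a rotation between two planar point sets with Huber-type weights. It is a simplified stand-in under stated assumptions, not the paper's algorithm: it handles only a two-dimensional rotation (no general orthogonal matrix, no dilation), and the tuning constant `c`, the function names, and the toy data are hypothetical.

```python
import math


def rotate(point, theta):
    """Rotate a 2-D point counter-clockwise by theta radians."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def weighted_rotation(X, Y, weights):
    """Closed-form weighted least squares rotation angle mapping X onto Y."""
    cross = sum(w * (x[0] * y[1] - x[1] * y[0]) for x, y, w in zip(X, Y, weights))
    dot = sum(w * (x[0] * y[0] + x[1] * y[1]) for x, y, w in zip(X, Y, weights))
    return math.atan2(cross, dot)


def huber_rotation(X, Y, c=1.0, iterations=50):
    """Iteratively reweighted least squares with Huber weights min(1, c / residual)."""
    weights = [1.0] * len(X)
    theta = weighted_rotation(X, Y, weights)  # ordinary least squares start
    for _ in range(iterations):
        residuals = []
        for x, y in zip(X, Y):
            rx = rotate(x, theta)
            residuals.append(math.hypot(rx[0] - y[0], rx[1] - y[1]))
        # Points with large residuals are downweighted; the rest keep weight 1.
        weights = [1.0 if r <= c else c / r for r in residuals]
        theta = weighted_rotation(X, Y, weights)
    return theta, weights


# Hypothetical data: Y is X rotated by 0.5 rad, with one point replaced by an outlier.
X = [(1, 0), (0, 1), (-1, 0), (0, -1), (2, 1), (-1, 2), (-2, -1), (1, -2)]
Y = [rotate(x, 0.5) for x in X]
Y[4] = (-5.0, 5.0)

ls_theta = weighted_rotation(X, Y, [1.0] * len(X))
robust_theta, final_weights = huber_rotation(X, Y)
```

The returned `final_weights` play the diagnostic role mentioned in the abstract: in this toy example the contaminated point ends up with the smallest weight.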
7.
Jacqueline J. Meulman 《Journal of Classification》1996,13(2):249-266
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in low-dimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables; and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points of view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a Jackknife analysis.
8.
A modified CANDECOMP algorithm is presented for fitting the metric version of the Extended INDSCAL model to three-way proximity
data. The Extended INDSCAL model assumes, in addition to the common dimensions, a unique dimension for each object. The modified
CANDECOMP algorithm fits the Extended INDSCAL model in a dimension-wise fashion and ensures that the subject weights for the
common and the unique dimensions are nonnegative. A Monte Carlo study is reported to illustrate that the method is fairly
insensitive to the choice of the initial parameter estimates. A second Monte Carlo study shows that the method is able to
recover an underlying Extended INDSCAL structure if present in the data. Finally, the method is applied for illustrative purposes
to some empirical data on pain relievers. In the final section, some other possible uses of the new method are discussed.
Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek”.
9.
Bruno Falissard 《Journal of Classification》1996,13(2):267-280
It is common practice to perform a principal component analysis (PCA) on a correlation matrix to represent graphically the relations among numerous variables. In such a situation, the variables may be considered as points on the unit hypersphere of a Euclidean space, and PCA provides a sort of best fit of these points within a subspace. Taking into account their particular position, this paper suggests representing the variables on an optimal three-dimensional unit sphere.
10.
W. J. Krzanowski 《Journal of Classification》1994,11(2):195-207
A low-dimensional representation of multivariate data is often sought when the individuals belong to a set of a priori groups and the objective is to highlight between-group variation relative to that within groups. If all the data are continuous then this objective can be achieved by means of canonical variate analysis, but no corresponding technique exists when the data are categorical or mixed continuous and categorical. On the other hand, if there is no a priori grouping of the individuals, then ordination of any form of data can be achieved by use of metric scaling (principal coordinate analysis). In this paper we consider a simple extension of the latter approach to incorporate grouped data, and discuss to what extent this method can be viewed as a generalization of canonical variate analysis. Some illustrative examples are also provided.
11.
Andrew R. Webb 《Journal of Classification》1997,14(2):249-267
This paper considers the use of radial basis functions for exploratory data analysis. These are used to model a transformation
from a high-dimensional observation space to a low-dimensional one. The parameters of the model are determined by optimising
a loss function defined to be the stress function in multidimensional scaling. The metric for the low-dimensional space is
taken to be the Minkowski metric with order parameter 1 ≤ p ≤ 2. A scheme based on iterative majorisation is proposed.
12.
A sequential fitting procedure for linear data analysis models
Boris G. Mirkin 《Journal of Classification》1990,7(2):167-195
A particular factor analysis model with parameter constraints is generalized to include classification problems definable within a framework of fitting linear models. The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks. SEFIT methods attempt to explain the variability in the initial data (commonly defined by a sum of squares) through an additive decomposition attributable to the various terms in the model. New methods are developed for both traditional and fuzzy clustering that have useful theoretic and computational properties (principal cluster analysis, additive clustering, and so on). Connections to several known classification strategies are also stated.
The author is grateful to P. Arabie and L. J. Hubert for editorial assistance and reviewing going well beyond traditional levels.
13.
Dendrograms are widely used to represent graphically the clusters and partitions obtained with hierarchical clustering schemes. Espaliers are generalized dendrograms in which the length of horizontal lines is used in addition to their level in order to display the values of two characteristics of each cluster (e.g., the split and the diameter) instead of only one. An algorithm is first presented to transform a dendrogram into an espalier without rotation of any part of the former. This is done by stretching some of the horizontal lines to obtain a diagram with vertical and horizontal lines only, then cutting off by diagonal lines the parts of the horizontal lines exceeding their prescribed length. The problem of finding if, allowing rotations, no diagonal lines are needed is solved by an O(N²) algorithm, where N is the number of entities to be classified. This algorithm is then generalized to obtain espaliers with minimum width and, possibly, some diagonal lines.
Work of the first and second authors has been supported by FCAR (Fonds pour la Formation de Chercheurs et l'Aide à la Recherche) grant 92EQ1048, and grant N00014-92-J-1194 from the Office of Naval Research. Work of the first author has also been supported by NSERC (Natural Sciences and Engineering Research Council of Canada) grant to École des Hautes Études Commerciales, Montréal and by NSERC grant GP0105574. Work of the second author has been supported by NSERC grant GP0036426, by FCAR grant 90NC0305, and by an NSF Professorship for Women in Science at Princeton University from September 1990 until December 1991. Work of the third author was done in part during a visit to GERAD, Montréal.
14.
Geert De Soete 《Journal of Classification》1984,1(1):235-242
The least squares algorithm for fitting ultrametric trees to proximity data originally proposed by Carroll and Pruzansky and further elaborated by De Soete is extended to handle missing data. A Monte Carlo evaluation reveals that the algorithm is capable of recovering an ultrametric tree underlying an incomplete set of error-perturbed dissimilarities quite well.
Geert De Soete is Aangesteld Navorser of the Belgian National Fonds voor Wetenschappelijk Onderzoek.
15.
In this paper we develop a version of the Jackknife which seems especially suited for Multidimensional Scaling. It deletes one stimulus at a time, and combines the resulting solutions by a least squares matching method. The results can be used for stability analysis, and for purposes of cross validation.
16.
Two algorithms for fitting directed graphs to nonsymmetric proximity data are compared. The first approach, termed MAPNET,
is a direct extension of a mathematical programming procedure for fitting undirected graphs to symmetric proximity data presented
by Klauer and Carroll (1989). For a user-specified number of links, the algorithm seeks to provide the connected network that
gives the least-squares approximation of the proximity data with the specified number of links, allowing for linear transformations
of the data. The mathematical programming approach is compared to the NETSCAL method for fitting directed graphs (Hutchinson
1989), using the Monte Carlo methods and data sets employed by Hutchinson.
17.
We present an approach, independent of the common gradient-based necessary conditions for obtaining a (locally) optimal solution,
to multidimensional scaling using the city-block distance function, and implementable in either a metric or nonmetric context.
The difficulties encountered in relying on a gradient-based strategy are first reviewed: the general weakness in indicating
a good solution that is implied by the satisfaction of the necessary condition of a zero gradient, and the possibility of
actual nonconvergence of the associated optimization strategy. To avoid the dependence on gradients for guiding the optimization
technique, an alternative iterative procedure is proposed that incorporates (a) combinatorial optimization to construct good
object orders along the chosen number of dimensions and (b) nonnegative least-squares to re-estimate the coordinates for the
objects based on the object orders. The re-estimated coordinates are used to improve upon the given object orders, which may
in turn lead to better coordinates, and so on until convergence of the entire process occurs to a (locally) optimal solution.
The approach is illustrated through several data sets on the perception of similarity of rectangles and compared to the results
obtained with a gradient-based method.
18.
Jan de Leeuw 《Journal of Classification》1988,5(2):163-180
In this paper we study the convergence properties of an important class of multidimensional scaling algorithms. We unify and extend earlier qualitative results on convergence, which tell us when the algorithms are convergent. In order to prove global convergence results we use the majorization method. We also derive, for the first time, some quantitative convergence theorems, which give information about the speed of convergence. It turns out that in almost all cases convergence is linear, with a convergence rate close to unity. This has the practical consequence that convergence will usually be very slow, and this makes techniques to speed up convergence very important. It is pointed out that step-size techniques will generally not succeed in producing marked improvements in this respect.
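The majorization approach mentioned in this abstract can be sketched for the simplest case, metric scaling with unit weights and raw stress, where each iteration applies the Guttman transform. This Python sketch is an illustration under those assumptions rather than the paper's own formulation. The function names, the target configuration, and the starting configuration are hypothetical.

```python
import math


def distances(X):
    """Euclidean distance matrix of a configuration given as a list of points."""
    n = len(X)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def stress(delta, X):
    """Raw stress: sum over pairs of squared (dissimilarity - distance)."""
    d = distances(X)
    n = len(X)
    return sum((delta[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(delta, X):
    """One majorization update X+ = (1/n) B(X) X for unit weights."""
    n, p = len(X), len(X[0])
    d = distances(X)
    new_X = []
    for i in range(n):
        row = [0.0] * p
        for j in range(n):
            if i != j and d[i][j] > 0:  # coincident points contribute nothing
                ratio = delta[i][j] / d[i][j]
                for k in range(p):
                    row[k] += ratio * (X[i][k] - X[j][k])
        new_X.append([v / n for v in row])
    return new_X


# Hypothetical example: dissimilarities taken from the corners of a unit square.
target = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
delta = distances(target)

X = [[0.0, 0.0], [1.0, 0.2], [0.9, 1.3], [-0.1, 0.8]]  # arbitrary start
stresses = [stress(delta, X)]
for _ in range(30):
    X = guttman_step(delta, X)
    stresses.append(stress(delta, X))
```

The list `stresses` never increases from one iteration to the next, which is the monotone descent that majorization guarantees. Ratios of successive decreases in `stresses` give a rough view of the linear convergence rate discussed in the abstract.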
19.
Spectral analysis of phylogenetic data
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences,
the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which
counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation
called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for
unobserved changes in the data, independently from the choice of phylogenetic tree. For any given phylogenetic tree with edge
weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic
tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality
selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches
the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard
conjugation to allow a comparison with the original sequence spectrum.
A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding
spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance
and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa
and perhaps for up to 30 taxa.
The development presented here is self contained, although some mathematical proofs available elsewhere have been omitted.
The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data
are new.
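The fast Hadamard transform mentioned in this abstract can be sketched in a few lines of Python. The butterfly routine below is the standard transform for the Sylvester-ordered Hadamard matrix. The conjugation itself is written in the form γ = H⁻¹ ln(H s), which is an assumption about the two-state case, not something stated in the abstract. The function names and the toy spectrum `s` are hypothetical.

```python
import math


def fast_hadamard(values):
    """Multiply a length-2^k vector by the Sylvester Hadamard matrix in O(n log n)."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = out[i], out[i + half]
                out[i], out[i + half] = a + b, a - b  # butterfly step
        half *= 2
    return out


def inverse_hadamard(values):
    """The inverse transform is the same butterfly scaled by 1/n."""
    n = len(values)
    return [v / n for v in fast_hadamard(values)]


def conjugate_spectrum(sequence_spectrum):
    """Assumed form of the conjugation: gamma = H^-1 ln(H s)."""
    rho = [math.log(r) for r in fast_hadamard(sequence_spectrum)]
    return inverse_hadamard(rho)


def inferred_sequence_spectrum(gamma):
    """Assumed inverse conjugation: s = H^-1 exp(H gamma)."""
    return inverse_hadamard([math.exp(r) for r in fast_hadamard(gamma)])


# Hypothetical sequence spectrum: relative frequencies of four bipartition patterns.
s = [0.7, 0.1, 0.1, 0.1]
gamma = conjugate_spectrum(s)
recovered = inferred_sequence_spectrum(gamma)
```

The logarithm requires every entry of H s to be positive, which holds for this toy spectrum but is not checked by the sketch.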
20.
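As China enters the era of high-speed rail, high-speed railways have both changed the way people travel and stimulated economic development along their routes. On the one hand, people enjoy the unprecedented sensation of flying close to the ground that high-speed rail offers; on the other hand, high-speed rail accidents have drawn public attention to the dangers that accompany the rapid development of high-speed rail technology. Viewed from multiple dimensions, China's high-speed rail has given rise to multiple ways of thinking about it. As with the understanding of any new high technology, the "double-edged sword" paradox brought about by the development of high-speed rail technology still shapes people's thinking, and it will directly affect how people evaluate high-speed rail and how it is built in the future. This paper analyzes and summarizes the various ways people think about high-speed rail and, based on a summary of its current state of development, forecasts the future development of China's high-speed rail.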