Similar literature
 20 similar articles found; search took 31 ms
1.
2.
Several methods have recently been introduced for investigating relations between three interpoint proximity matrices A, B, and C, each of which furnishes a different type of distance between the same objects. Smouse, Long, and Sokal (1986) investigate the partial correlation between A and B conditional on C. Dow and Cheverud (1985) ask whether corr(A, C) equals corr(B, C). Manly (1986) investigates regression-like models for predicting one matrix as a function of others. We have investigated rejection rates of these methods when their null hypotheses are true but the data are spatially autocorrelated (SA). That is, A and B are distance matrices from independent realizations of the same SA generating process, and C is a matrix of geographic connections. SA causes all the models to be liberal because the hypothesis of equally likely row/column permutations, invoked by all these methods, is untrue when data are SA. Consequently, we cannot unreservedly recommend the use of any of these methods with SA data. However, if SA is weak, the Smouse-Long-Sokal method, used with a conservative critical value, is unlikely to reject falsely.
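
The row/column permutation hypothesis mentioned in this abstract can be illustrated with a minimal sketch of a simple two-matrix permutation test in the Mantel style. This is not one of the three methods compared in the abstract; it only shows the shared idea of permuting the rows and columns of one distance matrix together and recomputing the correlation. The function names (`upper_triangle`, `pearson`, `mantel_test`) and the defaults `n_perm=999` and `seed=0` are assumptions made for illustration.

```python
import random
from math import sqrt


def upper_triangle(m):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(a, b, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(a)
    b_flat = upper_triangle(b)
    observed = pearson(upper_triangle(a), b_flat)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)
        # Relabel the objects: rows and columns of `a` are permuted together.
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), b_flat) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (n_perm + 1)
```

The p-value treats every relabeling of the objects as equally likely under the null hypothesis. That is exactly the assumption the abstract reports as untrue for spatially autocorrelated data, so a small p-value from such a test should be read with that caveat in mind.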

3.
The analysis of a three-way data set using three-mode principal components analysis yields component matrices for all three modes of the data, and a three-way array called the core, which relates the components for the different modes to each other. To exploit rotational freedom in the model, one may rotate the core array (over all three modes) to an optimally simple form, for instance by three-mode orthomax rotation. However, such a rotation of the core may inadvertently detract from the simplicity of the component matrices. One remedy is to rotate the core only over those modes in which no simple solution for the component matrices is desired or available, but this approach may in turn reduce the simplicity of the core to an unacceptable extent. In the present paper, a general approach is developed, in which a criterion is optimized that not only takes into account the simplicity of the core, but also, to any desired degree, the simplicity of the component matrices. This method (in contrast to methods for either core or component matrix rotation) can be used to find solutions in which the core and the component matrices are all reasonably simple.

4.
A natural extension of classical metric multidimensional scaling is proposed. The result is a new formulation of nonmetric multidimensional scaling in which the strain criterion is minimized subject to order constraints on the disparity variables. Innovative features of the new formulation include: the parametrization of the p-dimensional distance matrices by the positive semidefinite matrices of rank ≤ p; optimization of the (squared) disparity variables, rather than the configuration coordinate variables; and a new nondegeneracy constraint, which restricts the set of (squared) disparities rather than the set of distances. Solutions are obtained using an easily implemented gradient projection method for numerical optimization. The method is applied to two published data sets.

5.
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.

6.
The use of Candecomp to fit scalar products in the context of Indscal is based on the assumption that, due to the symmetry of the data matrices involved, two component matrices will become equal when Candecomp converges. Bennani Dosse and Ten Berge (2008) have shown that, in the single component case, the assumption can only be violated at saddle points in the case of Gramian matrices. This paper again considers Candecomp applied to symmetric matrices, but with an orthonormality constraint on the components. This constrained version of Candecomp, when applied to symmetric matrices, has long been known under the acronym Indort. When the data matrices are positive definite, or have become positive semidefinite due to double centering, and the saliences are nonnegative (by chance or by constraint), the component matrices resulting from Indort are shown to be equal. Because Indort is also free from so-called degeneracy problems, it is a highly attractive alternative to Candecomp in the present context. We also consider a well-known successive approach to the orthogonally constrained Indscal problem and we compare, from simulated and real data sets, its results with those given by the simultaneous (Indort) approach.

7.
Graphical displays which show inter-sample distances are important for the interpretation and presentation of multivariate data. Except when the displays are two-dimensional, however, they are often difficult to visualize as a whole. A device, based on multidimensional unfolding, is described for presenting some intrinsically high-dimensional displays in fewer, usually two, dimensions. This goal is achieved by representing each sample by a pair of points, say R_i and r_i, so that a theoretical distance between the i-th and j-th samples is represented twice, once by the distance between R_i and r_j and once by the distance between R_j and r_i. Self-distances between R_i and r_i need not be zero. The mathematical conditions for unfolding to exhibit symmetry are established. Algorithms for finding approximate fits, not constrained to be symmetric, are discussed and some examples are given.

8.
Lattice theory is used to develop techniques for classifying groups of subjects on the basis of their recall strategies or multiple recall strategies within individual subjects. Using the ordered tree algorithm to represent sets of recall orders, it is shown how both trees and single recall strings can be represented as points within a nonsemimodular, graded lattice. Distances within the lattice structure are used to construct a dissimilarity measure, S, which can then be used to partition the individual recall strings. The measure S between strings is compared to Kendall's tau in three empirical tests, examining differences between individual subjects, differences between groups of subjects, and differences within a subject. It was shown that only S could recover the original differences. Differences between comparing chunks versus comparing orders are discussed. The author would like to thank Henry Rueter, Judith Olson, John Jonides, and James Jaccard for many inspiring comments during several stages of this project, two anonymous reviewers for several important insights, and Malhee Lee for her assistance with data collection. This work was supported by NIMH Grant MH 39912. Portions of this work were presented at the annual meeting of the Classification Society in St. John's, Newfoundland, July 1985, and the annual meeting of the Society for Mathematical Psychology in Boston, MA, August 1986.

9.
n-Way Metrics     
We study a family of n-way metrics that generalize the usual two-way metric. The n-way metrics are totally symmetric maps from $E^n$ into $\mathbb{R}_{\geqslant 0}$. The three-way metrics introduced by Joly and Le Calvé (1995) and Heiser and Bennani (1997) and the n-way metrics studied in Deza and Rosenberg (2000) belong to this family. It is shown how the n-way metrics and n-way distance measures are related to (n − 1)-way metrics and (n − 1)-way distance measures, respectively.

10.
Canonical Variate Analysis (CVA) is one of the most useful of multivariate methods. It is concerned with separating between and within group variation among N samples from K populations with respect to p measured variables. Mahalanobis distance between the K group means can be represented as points in a (K − 1)-dimensional space and approximated in a smaller space, with the variables shown as calibrated biplot axes. Within group variation may also be shown, together with circular confidence regions and other convex prediction regions, which may be used to discriminate new samples. This type of representation extends to what we term Analysis of Distance (AoD), whenever a Euclidean inter-sample distance is defined. Although the N × N distance matrix of the samples, which may be large, is required, eigenvalue calculations are needed only for the much smaller K × K matrix of distances between group centroids. All the ancillary information that is attached to a CVA analysis is available in an AoD analysis. We outline the theory and the R programs we developed to implement AoD by presenting two examples.

11.
ADditive CLUStering (ADCLUS) is a tool for overlapping clustering of two-way proximity matrices (objects × objects). In Simple Additive Fuzzy Clustering (SAFC), a variant of ADCLUS is introduced that provides a fuzzy partition of the objects; that is, the objects belong to the clusters with so-called membership degrees ranging from zero (complete non-membership) to one (complete membership). INDCLUS (INdividual Differences CLUStering) is a generalization of ADCLUS for handling three-way proximity arrays (objects × objects × subjects). Here, we propose a fuzzified alternative to INDCLUS capable of offering a fuzzy partition of the objects by generalizing the idea behind SAFC to a three-way context. This new model is called Fuzzy INdividual Differences CLUStering (FINDCLUS). An algorithm is provided for fitting the FINDCLUS model to the data. Finally, the results of a simulation experiment and some applications to synthetic and real data are discussed.

12.
Efficient algorithms for agglomerative hierarchical clustering methods   Cited by: 11 (self-citations: 4; by others: 7)
Whenever n objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires O(n^3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring O(n^3) worst-case time, can reasonably be expected to exhibit O(n^2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires O(n^2 log n) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case O(n^2) time and space. Whenever n objects are characterized by k-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points in k-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires O(n^2) time, in the worst case, for fixed k and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.
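
The nearest neighbor chain idea mentioned in this abstract can be sketched as follows. This is a minimal illustration for complete linkage only, not the authors' algorithm: the function name `nn_chain_complete_linkage`, the dictionary-based storage, and the cluster numbering (new clusters receive ids n, n+1, ...) are assumptions. The chain is extended from each cluster to its nearest neighbor until two clusters are mutual nearest neighbors, and that pair is then merged.

```python
def nn_chain_complete_linkage(dist):
    """Cluster objects from a symmetric dissimilarity matrix.

    Returns a list of merges (cluster_a, cluster_b, height), in the order
    the chain discovers them (not necessarily sorted by height).
    """
    n = len(dist)
    # d[i][j] holds the current dissimilarity between active clusters i and j.
    d = {i: {j: dist[i][j] for j in range(n) if j != i} for i in range(n)}
    active = set(range(n))
    merges = []
    chain = []
    next_id = n
    while len(active) > 1:
        if not chain:
            chain.append(min(active))  # arbitrary starting cluster
        while True:
            top = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            # Nearest neighbor of `top`; ties are broken in favor of `prev`
            # so that the chain always terminates.
            nearest = min(d[top], key=lambda c: (d[top][c], c != prev))
            if nearest == prev:
                break  # `top` and `prev` are mutual nearest neighbors
            chain.append(nearest)
        a, b = chain.pop(), chain.pop()
        height = d[a][b]
        new = next_id
        next_id += 1
        d[new] = {}
        for c in active - {a, b}:
            # Complete linkage update: the farthest pair defines the distance.
            value = max(d[a][c], d[b][c])
            d[new][c] = value
            d[c][new] = value
            del d[c][a], d[c][b]
        del d[a], d[b]
        active -= {a, b}
        active.add(new)
        merges.append((a, b, height))
    return merges
```

Sorting the returned merges by height gives the usual dendrogram order. The chain can be kept after each merge because complete linkage does not bring a merged cluster closer to any other cluster than its two parts were, which is one way to read the "reasonable space distortion properties" the abstract refers to.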

13.
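Minimum edit distance is a method for comparing how similar different symbol strings in a language are. It counts the deletion, insertion, and substitution operations needed to transform one string into another, and the computation is described by a dynamic programming algorithm. In terminology research, minimum edit distance can be used to quantify the features of terms. In computational linguistics, it can be used to detect potential spelling errors and correct misspellings. In speech recognition, it can be used to compute the word error rate. In machine translation, it can be used for word alignment in bilingual corpora.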
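
The dynamic programming computation described in this abstract can be sketched as below. The sketch assumes unit cost for each deletion, insertion, and substitution (the abstract does not state the costs), and the names `edit_distance` and `word_error_rate` are illustrative.

```python
def edit_distance(source, target):
    """Minimum number of deletions, insertions, and substitutions
    needed to turn `source` into `target` (unit cost per operation)."""
    # prev[j] is the distance between the processed prefix of `source`
    # and the first j symbols of `target`.
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        curr = [i]  # turning i symbols into the empty string takes i deletions
        for j, t in enumerate(target, 1):
            substitution = prev[j - 1] + (0 if s == t else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance over words, divided by the reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because `edit_distance` works on any sequence, the same function serves character-level comparison (spelling correction) and word-level comparison (word error rate in speech recognition).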

14.
The set of k points that optimally represent a distribution in terms of mean squared error have been called principal points (Flury 1990). Principal points are a special case of self-consistent points. Any given set of k distinct points in R^p induces a partition of R^p into Voronoi regions or domains of attraction according to minimal distance. A set of k points is called self-consistent for a distribution if each point equals the conditional mean of the distribution over its respective Voronoi region. For symmetric multivariate distributions, sets of self-consistent points typically form symmetric patterns. This paper investigates the optimality of different symmetric patterns of self-consistent points for symmetric multivariate distributions and in particular for the bivariate normal distribution. These results are applied to the problem of estimating principal points.
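
The self-consistency condition in this abstract can be illustrated on a sample rather than a distribution. The sketch below is an assumption-laden simplification: it works in one dimension, replaces the distribution by an empirical sample, and uses the familiar alternating iteration (assign each observation to its nearest point, then replace each point by the mean of its region). The function name `self_consistent_points` is hypothetical, and the result is only guaranteed to be self-consistent for the sample, not to be the principal points.

```python
def self_consistent_points(sample, points, n_iter=100):
    """Iterate until each point equals the mean of its Voronoi region."""
    points = list(points)
    for _ in range(n_iter):
        # Voronoi regions: each observation goes to its nearest point.
        regions = [[] for _ in points]
        for x in sample:
            nearest = min(range(len(points)), key=lambda k: abs(x - points[k]))
            regions[nearest].append(x)
        # Replace each point by the mean of its region (keep it if empty).
        updated = [sum(r) / len(r) if r else p for r, p in zip(regions, points)]
        if updated == points:
            break  # fixed point: the self-consistency condition holds
        points = updated
    return points
```

Different starting points can converge to different self-consistent sets, which mirrors the abstract's point that several symmetric patterns may be self-consistent while only one of them minimizes mean squared error.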

15.
The character and OTU stability of classifications based on UPGMA clustering and maximum parsimony (MP) trees were compared for 5 datasets (families of angiosperms, families of orthopteroid insects, species of the fish genus Ictalurus, genera of the salamander family Salamandridae, and genera of the frog family Myobatrachidae). Stability was investigated by taking different sized random subsamples of OTUs or characters, computing UPGMA clusters and an MP tree, and then comparing the resulting trees with those based on the entire dataset. Agreement was measured by two consensus indices, that of Colless, computed from strict consensus trees, and Stinebrickner's 0.5-consensus index. Tests of character stability generally showed a monotone decrease in agreement with the standard as smaller sets of characters are considered. The relative success of the two methods depended upon the dataset. Tests of OTU stability showed a monotone decrease in agreement for UPGMA as smaller sets of OTUs are considered. But for MP, agreement decreased and then increased again on the same scale. The apparent superiority of UPGMA relative to MP with respect to OTU stability depended upon the dataset. Considerations other than stability, such as computer efficiency or accuracy, will also determine the method of choice for classifications.

16.
In this paper we discuss two approaches to the axiomatization of scientific theories in the context of the so-called semantic approach, according to which (roughly) a theory can be seen as a class of models. The two approaches are associated respectively with Suppes' and with da Costa and Chuaqui's works. We argue that theories can be developed both in a way more akin to the usual mathematical practice (Suppes), in an informal set theoretical environment, writing the set theoretical predicate in the language of set theory itself or, more rigorously (da Costa and Chuaqui), by employing formal languages that help us in writing the postulates to define a class of structures. Both approaches are called internal, for we work within a mathematical framework, here taken to be first-order ZFC. We contrast these approaches with an external one, here discussed briefly. We argue that each one has its strong and weak points, whose discussion is relevant for the philosophical foundations of science.

17.
Trees, and particularly binary trees, appear frequently in the classification literature. When studying the properties of the procedures that fit trees to sets of data, direct analysis can be too difficult, and Monte Carlo simulations may be necessary, requiring the implementation of algorithms for the generation of certain families of trees at random. In the present paper we use the properties of Prüfer's enumeration of the set of completely labeled trees to obtain algorithms for the generation of completely labeled, as well as terminally labeled t-ary (and in particular binary) trees at random, i.e., with uniform distribution. Actually, these algorithms are general in that they can be used to generate random trees from any family that can be characterized in terms of the node degrees. The algorithms presented here are as fast as (in the case of terminally labeled trees) or faster than (in the case of completely labeled trees) any other existing procedure, and the memory requirements are minimal. Another advantage over existing algorithms is that there is no need to store pre-calculated tables.
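
The role of Prüfer's enumeration can be illustrated for the simplest family in this abstract, completely labeled trees. Every sequence of length n − 2 over n labels corresponds to exactly one labeled tree on n nodes, so decoding a uniformly drawn sequence yields a uniformly distributed tree. The sketch below uses the standard textbook decoding, not the authors' algorithms, and does not cover the terminally labeled or t-ary cases; the names `prufer_to_edges` and `random_labeled_tree` are illustrative.

```python
import heapq
import random


def prufer_to_edges(seq):
    """Decode a Prüfer sequence over labels 0..n-1 into the n-1 tree edges."""
    n = len(seq) + 2
    # A node's degree is one more than its number of appearances in `seq`.
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest remaining leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)  # `v` has become a leaf
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def random_labeled_tree(n, rng=random):
    """Draw a completely labeled tree on n >= 2 nodes uniformly at random."""
    if n < 2:
        raise ValueError("a tree needs at least two nodes")
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_edges(seq)
```

No table of precomputed counts is needed, which matches the memory advantage the abstract claims. Restricting which sequences may be drawn is one way to target families defined by node degrees, since a node's degree is fixed by how often its label appears in the sequence.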

18.
Proportional link linkage (PLL) clustering methods are a parametric family of monotone invariant agglomerative hierarchical clustering methods. This family includes the single, minimedian, and complete linkage clustering methods as special cases; its members are used in psychological and ecological applications. Since the literature on clustering space distortion is oriented to quantitative input data, we adapt its basic concepts to input data with only ordinal significance and analyze the space distortion properties of PLL methods. To enable PLL methods to be used when the number n of objects being clustered is large, we describe an efficient PLL algorithm that operates in O(n^2 log n) time and O(n^2) space. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada and by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung.

19.
Generation of Random Clusters with Specified Degree of Separation   Cited by: 1 (self-citations: 1; by others: 0)
We propose a random cluster generation algorithm that has the following desired features: (1) the population degree of separation between clusters and the nearest neighboring clusters can be set to a specified value, based on a separation index; (2) no constraint is imposed on the isolation among clusters in each dimension; (3) the covariance matrices correspond to different shapes, diameters and orientations; (4) the full cluster structures generally could not be detected simply from pair-wise scatterplots of variables; (5) noisy variables and outliers can be imposed to make the cluster structures harder to recover. This algorithm is an improvement on the method used in Milligan (1985).

20.
Models for the representation of proximity data (similarities/dissimilarities) can be categorized into one of three groups of models: continuous spatial models, discrete nonspatial models, and hybrid models (which combine aspects of both spatial and discrete models). Multidimensional scaling models and associated methods, used for the spatial representation of such proximity data, have been devised to accommodate two, three, and higher-way arrays. At least one model/method for overlapping (but generally non-hierarchical) clustering called INDCLUS (Carroll and Arabie 1983) has been devised for the case of three-way arrays of proximity data. Tree-fitting methods, used for the discrete network representation of such proximity data, have only thus far been devised to handle two-way arrays. This paper develops a new methodology called INDTREES (for INdividual Differences in TREE Structures) for fitting various (discrete) tree structures to three-way proximity data. This individual differences generalization is one in which different individuals, for example, are assumed to base their judgments on the same family of trees, but are allowed to have different node heights and/or branch lengths. We initially present an introductory overview focussing on existing two-way models. The INDTREES model and algorithm are then described in detail. Monte Carlo results for the INDTREES fitting of four different three-way data sets are presented. In the application, a single ultrametric tree is fitted to three-way proximity data derived from intention-to-buy data for various brands of over-the-counter pain relievers for relieving three common types of maladies. Finally, we briefly describe how the INDTREES procedure can be extended to accommodate hybrid modelling, as well as to handle other types of applications.
