Similar articles
 Found 20 similar articles (search time: 514 ms)
1.
In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on using the α-transformation to transform the data and then classifying the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
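
As a rough illustration of the two limiting cases described in this abstract, the sketch below implements one common form of the α-transformation. The formula used, z = (D·u − 1)/α with u = x^α / Σ x^α, is an assumption recalled from the α-transformation literature rather than something stated in the abstract, and the final multiplication by a Helmert sub-matrix is omitted for brevity. The function names `clr` and `alpha_transform` are hypothetical.

```python
import math


def clr(x):
    """Centered log-ratio transform; assumes every part of x is strictly positive."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alpha_transform(x, alpha):
    """Assumed form of the alpha-transformation (Helmert rotation omitted).

    alpha = 1 gives a linear map of the raw proportions (the Euclidean view);
    alpha = 0 is defined as the limiting case, the centered log-ratio transform.
    """
    if alpha == 0:
        return clr(x)
    d = len(x)
    powered = [v ** alpha for v in x]
    total = sum(powered)
    # u = powered / total is the power-transformed composition; rescale and centre it.
    return [(d * p / total - 1.0) / alpha for p in powered]


if __name__ == "__main__":
    composition = [0.2, 0.3, 0.5]
    for a in (1.0, 0.5, 0.0):
        print(a, alpha_transform(composition, a))
```

In a classification setting, α would presumably be chosen by cross-validation over a grid of values between 0 and 1, with the transformed vectors passed to the classifier of choice.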

2.
A latent class vector model for preference ratings   Total citations: 1 (self-citations: 1, citations by others: 1)
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and some possible extensions are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”

3.
We consider correspondence analysis (CA) and taxicab correspondence analysis (TCA) of relational datasets that can mathematically be described as weighted loopless graphs. Such data appear in particular in network analysis. We present CA and TCA as relaxation methods for the graph partitioning problem. Examples of real datasets are provided.

4.
Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements (compositions) are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the FCM algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed on the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study is illustrated using a nutritional data set known in the clustering literature.
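
The scale-invariance principle mentioned in this abstract can be made concrete with a small sketch. It compares the ordinary Euclidean distance with the Aitchison distance, taken here as the Euclidean distance between centered log-ratio (clr) vectors. The choice of this particular dissimilarity is an illustrative assumption and not necessarily the one the paper recommends; the function names are hypothetical.

```python
import math


def clr(x):
    """Centered log-ratio transform; assumes strictly positive parts."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed vectors."""
    return euclidean_distance(clr(x), clr(y))


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0]
    rescaled = [10.0, 20.0, 30.0]  # same relative sizes, different total
    print("Euclidean:", euclidean_distance(raw, rescaled))
    print("Aitchison:", aitchison_distance(raw, rescaled))
```

Because the two vectors carry the same relative information, a scale-invariant dissimilarity should treat them as identical, which the clr-based distance does up to rounding error while the raw Euclidean distance does not.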

5.
A comparison between two distance-based discriminant principles   Total citations: 1 (self-citations: 1, citations by others: 0)
A distance-based classification procedure suggested by Matusita (1956) has long been available as an alternative to the usual Bayes decision rule. Unsatisfactory features of both approaches when applied to multinomial data led Goldstein and Dillon (1978) to propose a new distance-based principle for classification. We subject the Goldstein/Dillon principle to some theoretical scrutiny by deriving the population classification rules appropriate not only to multinomial data but also to multivariate normal and mixed multinomial/multinormal data. These rules demonstrate equivalence of the Goldstein/Dillon and Matusita approaches for the first two data types, and similar equivalence is conjectured (but not explicitly obtained) for the mixed data case. Implications for sample-based rules are noted.

6.
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s homogeneity analysis framework.

7.
A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models, termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2), allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature including Carroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.

8.
In this paper we propose the concept of structural similarity as a relaxation of blockmodeling in social network analysis. Most previous approaches attempt to relax the constraints on partitions, for instance, that of being a structural or regular equivalence to being approximately structural or regular, respectively. In contrast, our approach is to relax the partitions themselves: structural similarities yield similarity values instead of equivalence or non-equivalence of actors, while strictly obeying the requirement made for exact regular equivalences. Structural similarities are based on a vector space interpretation and yield efficient spectral methods that, in a more restrictive manner, have been successfully applied to difficult combinatorial problems such as graph coloring. While traditional blockmodeling approaches have to rely on local search heuristics, our framework yields algorithms that are provably optimal for specific data-generation models. Furthermore, the stability of structural similarities can be well characterized making them suitable for the analysis of noisy or dynamically changing network data.

9.
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
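
The remark in this abstract that normalizations act as differential weighting of the variables can be sketched in a few lines. The example below uses fixed inverse-variance weights, which reproduce the distance used in principal component analysis of standardized data. It does not implement the majorization algorithm that the paper uses to estimate the weights, and the function names and the small data matrix are hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """Distance sqrt(sum_j w_j * (x_j - y_j)^2) with nonnegative weights w_j."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def inverse_variance_weights(rows):
    """Weights 1 / s_j^2, with s_j^2 the sample variance of column j."""
    n = len(rows)
    weights = []
    for column in zip(*rows):
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / (n - 1)
        weights.append(1.0 / variance)
    return weights


if __name__ == "__main__":
    # Hypothetical cases-by-variables matrix with very different column scales.
    data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
    w = inverse_variance_weights(data)
    print("weights:", w)
    print("distance between first two rows:", weighted_euclidean(data[0], data[1], w))
```

With these weights the second column, whose variance is 100 times larger, no longer dominates the distance; estimating the weights from a target dissimilarity instead of fixing them is the step the abstract attributes to the majorization algorithm.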

10.
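Cultural resources are a core element of a nation's cultural strength and the foundation and source of its culture and cultural industries. China's ongoing digitization of tangible and intangible cultural resources of all kinds offers an unprecedented opportunity, and the necessary conditions, to use advanced technologies such as big data analytics to understand Chinese culture more fully and to explore and exploit it in greater depth. Drawing on big data analytics and related techniques, this paper proposes an overall approach, a technical framework, and policy measures for strengthening the management of cultural resources in China.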

11.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting proportions of viewing time across television networks for an area sample of households.

12.
The Kohonen self-organizing map method: An assessment   Total citations: 1 (self-citations: 0, citations by others: 1)
The “self-organizing map” method, due to Kohonen, is a well-known neural network method. It is closely related to cluster analysis (partitioning) and other methods of data analysis. In this article, we explore some of these close relationships. A number of properties of the technique are discussed. Comparisons with various methods of data analysis (principal components analysis, k-means clustering, and others) are presented. This work has been partially supported for M. Hernández-Pajares by the DGCICIT of Spain under grant No. PB90-0478 and by a CESCA-1993 computer-time grant. Fionn Murtagh is affiliated to the Astrophysics Division, Space Science Department, European Space Agency.

13.
Probabilistic D-Clustering   Total citations: 1 (self-citations: 1, citations by others: 0)
We present a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the distance from (the center of) the cluster in question. This assumption is our working principle. The method is a generalization, to several centers, of the Weiszfeld method for solving the Fermat–Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations stop when the centers stop moving.
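
The iteration described in this abstract can be sketched as follows, using Euclidean distances. The membership rule (probability inversely proportional to distance) comes from the abstract. The specific center weights p²/d are an assumption recalled from the published method and are not stated here, and the guard `eps` against zero distances, the tolerance, and all function names are hypothetical choices made for this sketch.

```python
import math


def memberships(point, centers, eps=1e-12):
    """Cluster probabilities inversely proportional to distance from each center."""
    inverse = [1.0 / max(math.dist(point, c), eps) for c in centers]
    total = sum(inverse)
    return [v / total for v in inverse]


def update_centers(points, centers, eps=1e-12):
    """One iteration: each new center is a convex combination of the data points.

    The weight of a point for cluster k is assumed to be p_k^2 / d_k.
    """
    dim = len(points[0])
    new_centers = []
    for k, center in enumerate(centers):
        weight_sum = 0.0
        accumulator = [0.0] * dim
        for p in points:
            d = max(math.dist(p, center), eps)
            prob = memberships(p, centers, eps)[k]
            w = prob * prob / d
            weight_sum += w
            for j in range(dim):
                accumulator[j] += w * p[j]
        new_centers.append([a / weight_sum for a in accumulator])
    return new_centers


def d_clustering(points, centers, tol=1e-8, max_iter=500):
    """Repeat the update until the centers stop moving (within tol)."""
    for _ in range(max_iter):
        updated = update_centers(points, centers)
        shift = max(math.dist(a, b) for a, b in zip(updated, centers))
        centers = updated
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(d_clustering(data, [(2.0, 2.0), (9.0, 9.0)]))
```

As with the single-center Weiszfeld iteration, a center that starts exactly on a data point receives an extremely large weight from that point and barely moves, so initial centers are best chosen away from the data.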

14.
Rotation in Correspondence Analysis   Total citations: 1 (self-citations: 1, citations by others: 0)
In correspondence analysis, rows and columns of a nonnegative data matrix are depicted as points in a plot that is usually two-dimensional. Although such a two-dimensional plot often provides a reasonable approximation, the situation can occur that an approximation of higher dimensionality is required. This is especially the case when the data matrix is large. In such instances it may become difficult to interpret the solution. Similar to what is done in principal component analysis and factor analysis, the correspondence analysis solution can be rotated to increase the interpretability. However, due to the various scaling options encountered in correspondence analysis, there are several alternative options for rotating the solutions. In this paper we consider two options for rotation in correspondence analysis. An example is provided so that the benefits of rotation become apparent.

15.
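Through a study of several ancient Korean calendrical works, this paper examines how ancient Korean scholars assimilated the Shoushi li (《授时历》, Season-Granting Calendar). The findings are as follows. The Shoushi li jiefa licheng (《授时历捷法立成》) is a set of licheng (立成, ready-reckoner) tables computed independently from the Shoushi li by the Goryeo astronomer Kang Bo (姜保), and it is more convenient to use than the tables of the Shoushi li licheng (《授时历立成》) itself. Although the Chiljeongsan naepyeon (《七政算·内篇》) takes basic constants such as the "ying shu" (应数) from the Shoushi li, its algorithms and layout mainly follow the Datong li tonggui (《大统历通轨》). The tables of half day and night lengths for the four seasons and of sunrise times in that work were derived by Joseon (Yi dynasty) astronomers from the Shoushi li method "bu jiufu suozai louke shu" (步九服所在漏刻术); this algorithm agrees with the formulas of spherical astronomy, which guarantees the accuracy of the computed results. The Gyosik chubobeop (《交食推步法》) correctly derives the general formulas behind the Shoushi li tables for "yingsuo" and "xunji" (盈缩、迅疾立成表), showing that early Joseon astronomers had mastered the zhaocha method (招差术, finite-difference interpolation) and the principles underlying the Shoushi li, and had attained a thorough command of this calendar.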

16.
O(n^4), where n is the number of objects. We describe the application of the MVR method to two data models: the weighted least-squares (WLS) model (V is diagonal), where the MVR method can be reduced to an O(n^3) time complexity; and a model arising from the study of biological sequences, which involves a complex non-diagonal V matrix that is estimated from the dissimilarity matrix Δ. For both models, we provide simulation results that show a significant error reduction in the reconstruction of T, relative to classical agglomerative algorithms.

17.
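Wang Wei (王巍). Studies in Dialectics of Nature (《自然辩证法研究》), 2004, 20(2): 34-38, 56
This paper briefly reviews the debate among the traditional theories of truth (the correspondence, coherence, and pragmatic theories) and introduces the modern theoretical framework: Frege's "transparency thesis", Tarski's "truth schema", and Quine's "disquotationalism". Horwich's minimalism upholds the equivalence schema of truth and restricts the concept of truth to a semantic function, from which it concludes that truth is metaphysically trivial. The author questions minimalism's choice of propositions as truth bearers and argues that the concept of truth is metaphysically neutral rather than metaphysically trivial. The paper also attempts to go beyond the modern framework of truth, proposing that a complete theory of truth should encompass both "zhen" (真, the true) and "li" (理, reason or principle).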

18.
Spectral analysis of phylogenetic data   Total citations: 12 (self-citations: 0, citations by others: 12)
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently from the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum. A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa and perhaps for up to 30 taxa. The development presented here is self-contained, although some mathematical proofs available elsewhere have been omitted. The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data are new.

19.
A method is presented for the graphic display of proximity matrices as a complement to the common data analysis techniques of hierarchical clustering. The procedure involves the use of computer generated shaded matrices based on unclassed choropleth mapping in conjunction with a strategy for matrix reorganization. The latter incorporates a combination of techniques for seriation and the ordering of binary trees. Partial support for this research was provided by NIJ Grant #82-IJ-CX-0019 and NSF Grant #SES82-06067. The authors wish to acknowledge the assistance of Professors L.J. Hubert, R.G. Golledge, and W.R. Tobler.

20.
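Translating scientific and technical terminology conveys scientific and technical information and also transmits cultural elements. Both the mechanism by which a term's concept is formed and its linguistic expression are constrained and shaped by cultural cognition. With the goal of conveying specialized concepts equivalently, term translation should rest on a deep understanding of the principles of cultural cognition and of cultural correspondence; through comparative analysis, the paper explores practical methods and techniques for translating terms.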
