Found 20 similar documents; search took 514 ms
1.
In compositional data analysis, an observation is a vector containing nonnegative values, only the relative sizes of which are considered to be of interest. Without loss of generality, a compositional vector can be taken to be a vector of proportions that sum to one. Data of this type arise in many areas including geology, archaeology, biology, economics and political science. In this paper we investigate methods for classification of compositional data. Our approach centers on the idea of using the α-transformation to transform the data and then to classify the transformed data via regularized discriminant analysis and the k-nearest neighbors algorithm. Using the α-transformation generalizes two rival approaches in compositional data analysis, one (when α = 1) that treats the data as though they were Euclidean, ignoring the compositional constraint, and another (when α = 0) that employs Aitchison’s centered log-ratio transformation. A numerical study with several real datasets shows that whether using α = 1 or α = 0 gives better classification performance depends on the dataset, and moreover that using an intermediate value of α can sometimes give better performance than using either 1 or 0.
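The abstract above does not state the formula for the α-transformation. The sketch below assumes the power-transformation form commonly given for it, z = (D·u − 1)/α with u = x^α / Σ x^α, and omits the Helmert-matrix projection that some formulations apply afterwards. The function names `clr` and `alpha_transform` are illustrative, not taken from the paper. Under this assumed form, α = 1 gives a linear map of the raw proportions and α → 0 gives the centered log-ratio, matching the two limiting cases the abstract describes.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform; requires strictly positive parts."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


def alpha_transform(x, alpha):
    """Assumed form of the alpha-transformation (no Helmert projection).

    alpha = 1 gives D*x - 1, a linear map of the proportions;
    alpha = 0 is defined as the limit, which is the centered log-ratio.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    if alpha == 0:
        return clr(x)
    powered = x ** alpha
    u = powered / powered.sum(axis=-1, keepdims=True)  # power-transformed composition
    return (d * u - 1.0) / alpha


composition = np.array([0.2, 0.3, 0.5])
for a in (1.0, 0.5, 0.0):
    print(a, alpha_transform(composition, a))
```

The transformed vectors could then be passed to any standard classifier, with α chosen by cross-validation; that selection step is not shown here.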
2.
A latent class vector model for preference ratings Total citations: 1, self-citations: 1, other citations: 1
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as
input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and
constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The
distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that
are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm
for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes
and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and
some possible extensions are discussed.
Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”
3.
We consider correspondence analysis (CA) and taxicab correspondence analysis (TCA) of relational datasets that can mathematically be described as weighted loopless graphs. Such data appear in particular in network analysis. We present CA and TCA as relaxation methods for the graph partitioning problem. Examples of real datasets are provided.
4.
Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data Total citations: 1, self-citations: 0, other citations: 1
Javier Palarea-Albaladejo Josep Antoni Martín-Fernández Jesús A. Soto 《Journal of Classification》2012,29(2):144-169
Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements (compositions) are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the FCM algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed on the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study is illustrated using a nutritional data set known in the clustering literature.
5.
A comparison between two distance-based discriminant principles Total citations: 1, self-citations: 1, other citations: 0
W. J. Krzanowski 《Journal of Classification》1987,4(1):73-84
A distance-based classification procedure suggested by Matusita (1956) has long been available as an alternative to the usual Bayes decision rule. Unsatisfactory features of both approaches when applied to multinomial data led Goldstein and Dillon (1978) to propose a new distance-based principle for classification. We subject the Goldstein/Dillon principle to some theoretical scrutiny by deriving the population classification rules appropriate not only to multinomial data but also to multivariate normal and mixed multinomial/multinormal data. These rules demonstrate equivalence of the Goldstein/Dillon and Matusita approaches for the first two data types, and similar equivalence is conjectured (but not explicitly obtained) for the mixed data case. Implications for sample-based rules are noted.
6.
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function
over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values
is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple
correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative
PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized
version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are
promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity analysis framework.
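The abstract above says its method builds on an iterative PCA algorithm in which missing values are imputed during the estimation of the axes. The following is a minimal sketch of that underlying idea for a numeric matrix only. It is not the regularized iterative MCA algorithm of the paper and does not use the missMDA API. The function name `iterative_pca_impute`, the mean initialization, and the stopping rule are assumptions made for illustration, and every column is assumed to have at least one observed value.

```python
import numpy as np


def iterative_pca_impute(data, rank, tol=1e-8, max_iter=1000):
    """Fill NaN entries by alternating a rank-`rank` PCA fit and re-imputation."""
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    # Start from the column means of the observed entries.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(max_iter):
        mean = x.mean(axis=0)
        u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
        # Low-rank reconstruction of the centered matrix, shifted back by the means.
        fitted = (u[:, :rank] * s[:rank]) @ vt[:rank] + mean
        change = np.sum((fitted[missing] - x[missing]) ** 2)
        x[missing] = fitted[missing]  # observed entries are never overwritten
        if change < tol:
            break
    return x


table = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, np.nan],
])
print(iterative_pca_impute(table, rank=1))
```

This plain version is the kind of procedure for which the abstract points out an overfitting problem; the regularization it proposes in response is not reproduced here.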
7.
GENFOLD2: A set of models and algorithms for the general UnFOLDing analysis of preference/dominance data Total citations: 3, self-citations: 3, other citations: 0
A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2) allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature including Carroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.
8.
In this paper we propose the concept of structural similarity as a relaxation of blockmodeling in social network analysis. Most previous approaches attempt to relax the constraints on partitions, for instance, that of being a structural or regular equivalence to being approximately structural or regular, respectively. In contrast, our approach is to relax the partitions themselves: structural similarities yield similarity values instead of equivalence or non-equivalence of actors, while strictly obeying the requirement made for exact regular equivalences. Structural similarities are based on a vector space interpretation and yield efficient spectral methods that, in a more restrictive manner, have been successfully applied to difficult combinatorial problems such as graph coloring. While traditional blockmodeling approaches have to rely on local search heuristics, our framework yields algorithms that are provably optimal for specific data-generation models. Furthermore, the stability of structural similarities can be well characterized making them suitable for the analysis of noisy or dynamically changing network data.
9.
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
10.
11.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way
compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions
across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus
coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework
with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting
proportions of viewing time across television networks for an area sample of households.
12.
The Kohonen self-organizing map method: An assessment Total citations: 1, self-citations: 0, other citations: 1
The “self-organizing map” method, due to Kohonen, is a well-known neural network method. It is closely related to cluster
analysis (partitioning) and other methods of data analysis. In this article, we explore some of these close relationships.
A number of properties of the technique are discussed. Comparisons with various methods of data analysis (principal components
analysis, k-means clustering, and others) are presented.
This work has been partially supported for M. Hernández-Pajares by the DGCICIT of Spain under grant No. PB90-0478 and by a
CESCA-1993 computer-time grant. Fionn Murtagh is affiliated to the Astrophysics Division, Space Science Department, European
Space Agency.
13.
Probabilistic D-Clustering Total citations: 1, self-citations: 1, other citations: 0
We present a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of
data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the
distance from (the center of) the cluster in question. This assumption is our working principle.
The method is a generalization, to several centers, of the Weiszfeld method for solving the Fermat–Weber location problem.
At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points,
and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations
stop when the centers stop moving.
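The iteration described in the abstract above can be sketched as follows, using Euclidean distances. The membership rule (probability inversely proportional to distance) comes from the abstract. The specific center weights p²/d are an assumption, since the abstract only says the weights are determined by that principle. All function names and the small `eps` guard against zero distances are illustrative.

```python
import numpy as np


def membership_probabilities(points, centers, eps=1e-12):
    """Cluster probabilities inversely proportional to distance from each center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diffs = points[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diffs, axis=2) + eps  # eps avoids division by zero
    inv = 1.0 / dist
    return inv / inv.sum(axis=1, keepdims=True), dist


def update_centers(points, probs, dist):
    """New centers as convex combinations of the points (assumed weights p^2 / d)."""
    weights = probs ** 2 / dist  # shape (n_points, n_clusters)
    return (weights.T @ points) / weights.sum(axis=0)[:, None]


def d_clustering(points, centers, tol=1e-8, max_iter=500):
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        probs, dist = membership_probabilities(points, centers)
        new_centers = update_centers(points, probs, dist)
        moved = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if moved < tol:  # stop when the centers stop moving
            break
    probs, _ = membership_probabilities(points, centers)
    return centers, probs


data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers, probs = d_clustering(data, [[1.0, 0.0], [9.0, 0.0]])
print(centers)
print(probs)
```

Swapping in Mahalanobis or another distance would only change how `dist` is computed; the rest of the loop stays the same.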
14.
Rotation in Correspondence Analysis Total citations: 1, self-citations: 1, other citations: 0
In correspondence analysis rows and columns of a nonnegative data matrix are
depicted as points in a, usually, two-dimensional plot. Although such a two-dimensional
plot often provides a reasonable approximation, the situation can occur that an approximation
of higher dimensionality is required. This is especially the case when the data
matrix is large. In such instances it may become difficult to interpret the solution. Similar
to what is done in principal component analysis and factor analysis the correspondence
analysis solution can be rotated to increase the interpretability. However, due to the various
scaling options encountered in correspondence analysis, there are several alternative
options for rotating the solutions. In this paper we consider two options for rotation in
correspondence analysis. An example is provided so that the benefits of rotation become
apparent.
15.
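Through a study of several ancient Korean calendrical works, this paper examines how ancient Korean scholars assimilated the Shoushi li (《授时历》). The findings are as follows. The 《授时历捷法立成》 is a set of ready-reckoning (licheng) tables computed independently from the Shoushi li by the Goryeo astronomer 姜保, and it is more convenient to use than the tables of the 《授时历立成》 itself. The 《七政算·内篇》 takes its basic constants, such as the "应数", from the Shoushi li, but in algorithm and layout it is modeled mainly on the 《大统历通轨》. The tables of seasonal half day and night lengths and of sunrise times in that book were derived by Joseon astronomers from the Shoushi li's "步九服所在漏刻术"; this algorithm agrees with the formulas of spherical astronomy, which guarantees the precision of the computed results. The 《交食推步法》 already correctly derives the general formulas for the Shoushi li's 盈缩 and 迅疾 ready-reckoning tables, showing that Korean astronomers of the early Joseon dynasty had mastered the 招差 interpolation method and the principles on which the Shoushi li was constructed, and had achieved a thorough command of this calendar.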
16.
Olivier Gascuel 《Journal of Classification》2000,17(1):67-99
O(n^4), where n is the number of objects. We describe the application of the MVR method to two data models: the weighted least-squares (WLS)
model (V is diagonal), where the MVR method can be reduced to an O(n^3) time complexity; a model arising from the study of biological sequences, which involves a complex non-diagonal V matrix
that is estimated from the dissimilarity matrix Δ. For both models, we provide simulation results that show a significant
error reduction in the reconstruction of T, relative to classical agglomerative algorithms.
17.
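This paper briefly reviews the debate among the traditional theories of truth (the correspondence, coherence, and pragmatic theories) and introduces the modern theoretical framework: Frege's "transparency thesis", Tarski's "truth schema", and Quine's "disquotationalism". Horwich's minimalism upholds the equivalence schema for truth and restricts the concept of truth to its semantic function, thereby claiming that truth is metaphysically trivial. The author questions minimalism's treatment of propositions as truth-bearers and argues that the concept of truth is metaphysically neutral rather than metaphysically trivial. The paper also attempts to go beyond the modern framework of truth, proposing that a complete theory of truth should include both "truth" (真) and "principle" (理).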
18.
Spectral analysis of phylogenetic data Total citations: 12, self-citations: 0, other citations: 12
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences,
the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which
counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation
called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for
unobserved changes in the data, independently from the choice of phylogenetic tree. For any given phylogenetic tree with edge
weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic
tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality
selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches
the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard
conjugation to allow a comparison with the original sequence spectrum.
A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding
spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance
and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa
and perhaps for up to 30 taxa.
The development presented here is self contained, although some mathematical proofs available elsewhere have been omitted.
The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data
are new.
19.
A method is presented for the graphic display of proximity matrices as a complement to the common data analysis techniques of hierarchical clustering. The procedure involves the use of computer generated shaded matrices based on unclassed choropleth mapping in conjunction with a strategy for matrix reorganization. The latter incorporates a combination of techniques for seriation and the ordering of binary trees. Partial support for this research was provided by NIJ Grant #82-IJ-CX-0019 and NSF Grant #SES82-06067. The authors wish to acknowledge the assistance of Professors L.J. Hubert, R.G. Golledge, and W.R. Tobler.
20.
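Translating scientific and technical terms is both a communication of scientific and technical information and a transmission of cultural elements. The mechanism by which a term's concept is formed and the linguistic expression of the term are both constrained and influenced by cultural cognition. Starting from the goal of conveying specialized concepts equivalently, term translation should be understood in depth from the perspectives of cultural-cognition principles and cultural correspondence, and practical methods and techniques for term translation should be explored through contrastive analysis.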