Similar Articles

20 similar articles found (search time: 31 ms).
1.
The analysis of a three-way data set using three-mode principal components analysis yields component matrices for all three modes of the data, and a three-way array called the core, which relates the components for the different modes to each other. To exploit rotational freedom in the model, one may rotate the core array (over all three modes) to an optimally simple form, for instance by three-mode orthomax rotation. However, such a rotation of the core may inadvertently detract from the simplicity of the component matrices. One remedy is to rotate the core only over those modes in which no simple solution for the component matrices is desired or available, but this approach may in turn reduce the simplicity of the core to an unacceptable extent. In the present paper, a general approach is developed, in which a criterion is optimized that not only takes into account the simplicity of the core, but also, to any desired degree, the simplicity of the component matrices. This method (in contrast to methods for either core or component matrix rotation) can be used to find solutions in which the core and the component matrices are all reasonably simple.
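The rotation idea can be illustrated on a single loading matrix. Below is a minimal varimax (orthomax with γ = 1) sketch in Python with NumPy; the paper's method additionally rotates the three-mode core and weighs core and component simplicity jointly, which this sketch does not attempt. The function name `varimax` is illustrative.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Rotate the loading matrix A toward varimax simplicity.

    Uses the standard SVD-based ascent iteration; returns the rotated
    loadings and the orthogonal rotation matrix R.
    """
    p, k = A.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # gradient-style update: SVD of A' (L^3 - L diag(mean of L^2 per column))
        U, s, Vt = np.linalg.svd(A.T @ (L**3 - L * (L**2).mean(axis=0)))
        R = U @ Vt
        if s.sum() - crit_old < tol:
            break
        crit_old = s.sum()
    return A @ R, R
```

Applied to a loading matrix that is a rotated perfect simple structure, the iteration recovers an orthogonal rotation that restores one large and one near-zero loading per row.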

2.
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
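The weight-estimation step can be sketched simply: the weighted squared distance d²(i, j) = Σ_k w_k (x_ik − x_jk)² is linear in the weights w, so fitting them to target dissimilarities is a linear least-squares problem. The sketch below (Python/NumPy, illustrative function name) uses plain least squares; the paper additionally constrains the weights to be nonnegative and estimates them by majorization.

```python
import numpy as np

def fit_variable_weights(X, D):
    """Fit variable weights w so that the weighted squared Euclidean
    distances sum_k w_k (x_ik - x_jk)^2 between rows of X approximate
    the squared target dissimilarities in D.
    """
    n, _ = X.shape
    rows, targets = [], []
    for i in range(n):
        for j in range(i + 1, n):
            rows.append((X[i] - X[j]) ** 2)   # coefficients of w for pair (i, j)
            targets.append(D[i, j] ** 2)
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return w
```

When the target dissimilarities really are weighted Euclidean distances, the true weights are recovered exactly.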

3.
We consider correspondence analysis (CA) and taxicab correspondence analysis (TCA) of relational datasets that can mathematically be described as weighted loopless graphs. Such data appear in particular in network analysis. We present CA and TCA as relaxation methods for the graph partitioning problem. Examples of real datasets are provided.  相似文献   

4.
A common approach to dealing with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms in which an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulated and real data sets. Results compare favorably with those of other methods, such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi's homogeneity analysis framework.
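The EM-type iterative-imputation idea can be sketched for plain PCA: alternately fit a low-rank model on the completed matrix and refill the missing cells with the fitted values. This is a minimal NumPy sketch of that loop, not the paper's regularized MCA algorithm (which additionally shrinks singular values to avoid overfitting); the function name is illustrative.

```python
import numpy as np

def iterative_pca_impute(X, n_components=1, n_iter=200, tol=1e-8):
    """EM-style imputation: fit a rank-k PCA model on the completed
    matrix, refill the missing cells with the model values, repeat."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    Xc = X.copy()
    col_means = np.nanmean(X, axis=0)
    Xc[mask] = col_means[np.where(mask)[1]]      # start from column means
    prev = np.inf
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        Xc[mask] = low_rank[mask]                # impute from the current model
        if abs(s[:n_components].sum() - prev) < tol:
            break
        prev = s[:n_components].sum()
    return Xc
```

On exactly rank-1 data with one cell removed, the loop converges to the value consistent with the low-rank structure while leaving observed entries untouched.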

5.
Multiple-choice items on tests and Likert items on surveys are ubiquitous in educational, social, and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010), which used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information about the response options, allowing an empirical examination of assumptions regarding their ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and the flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to the data using SAS.

6.
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to "spectral mapping", a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology. The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.
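The computational core of weighted log-ratio analysis can be sketched in a few lines: take logarithms of the table, double-center with the row and column masses as weights (which makes the analysis depend only on ratios), and decompose the weighted matrix by SVD. This NumPy sketch follows that recipe under the assumption that the masses are the usual CA margins; it is an illustration, not the authors' exact implementation.

```python
import numpy as np

def weighted_logratio(N):
    """Weighted log-ratio analysis (spectral-map style) sketch."""
    P = N / N.sum()
    r = P.sum(axis=1)                 # row masses
    c = P.sum(axis=0)                 # column masses
    L = np.log(P)
    # weighted double-centering: only ratios within rows/columns survive
    L = L - (L * c).sum(axis=1, keepdims=True)
    L = L - (r[:, None] * L).sum(axis=0, keepdims=True)
    S = np.sqrt(r)[:, None] * L * np.sqrt(c)[None, :]
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U / np.sqrt(r)[:, None] * s     # principal row coordinates
    cols = Vt.T / np.sqrt(c)[:, None]      # standard column coordinates
    return rows, cols, s
```

Because of the weighted centering, the mass-weighted mean of the row coordinates is zero, and the solution is invariant to an overall rescaling of the table.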

7.
一般认为,符合论与融贯论是相互竞争、无法共存的理论。对这种通常的理解,本文试图从真理定义与真理标准的区分出发来进行质疑。本文要论证,在定义与标准的区分下,符合论提供定义,而融贯论提供标准。为此,本文将讨论对符合论之核心概念事实的各种批评,以及融贯论者使融贯论成为真理定义的各种努力。文章最后要展现一种新的理解两者之关系的可能性。  相似文献   

8.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results of principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate of the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with that of other methods. More specifically, we study how these methods behave when order changes of components and sign reversals of component loadings occur, as in the case of near-equal eigenvalues, or of data having almost as many counterindicative as indicative items. The simulations show that the other proposed methods either run into serious problems or cannot adequately assess the accuracy in the presence of missing data. When these situations do not occur, however, all methods provide adequate estimates of the PCA loadings.
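The alignment step can be illustrated with plain orthogonal Procrustes rotation: rotate each imputed data set's loading matrix toward a common target and average. The NumPy sketch below does one rotation-to-centroid pass plus one refinement; full Generalized Procrustes analysis iterates until the centroid stabilizes, so this is a simplified stand-in with illustrative names.

```python
import numpy as np

def rotate_onto(A, T):
    """Orthogonal R minimizing ||A R - T||_F (closed-form Procrustes)."""
    U, _, Vt = np.linalg.svd(A.T @ T)
    return A @ (U @ Vt)

def procrustes_align(loadings_list):
    """Align loading matrices from multiply imputed data sets and return
    their centroid as the combined loading estimate."""
    target = loadings_list[0]
    aligned = [target] + [rotate_onto(A, target) for A in loadings_list[1:]]
    centroid = np.mean(aligned, axis=0)
    # one refinement pass: re-align everything to the centroid
    aligned = [rotate_onto(A, centroid) for A in loadings_list]
    return np.mean(aligned, axis=0), aligned
```

Rotations and sign reversals of the component axes, the very indeterminacies the paper worries about, are undone exactly in the noise-free case.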

9.
The launch of SSE 50 ETF options has enriched the market's hedging mechanisms and extended the study of financial-derivatives terminology. This paper presents a computation-based study, grounded in trading data, of the option-trading terms in-the-money option, at-the-money option, out-of-the-money option, and the Fibonacci option sequence. Tracking seven years of option trading data in real time, we built an options-trading database and studied option trading using methods from computational terminology. We find that: (1) the Fibonacci option sequence is a very important option-trading term, and its golden-section point is applicable to measuring fluctuations in option trading; (2) SSE 50 ETF option contracts raise standardization issues for option-trading terminology, and their trading is not an isolated market behavior but is influenced by fluctuations of related indices such as the Singapore A50 index; (3) the SSE 50 ETF option trading system is open: the Singapore A50 index has a leading effect on China's SSE 50 index; the SSE 50 index, as the tracking target of the SSE 50 ETF, determines the direction of the SSE 50 ETF; and the SSE 50 ETF, as the underlying of the standardized option contracts, affects trading in the individual contracts; (4) the core term of SSE 50 ETF option trading is leveraged volatility; the computed value of Delta reflects the high-leverage property, and a widening of its value range biases the golden-section point of the Fibonacci option sequence. We conclude that research on SSE 50 ETF option trading is data-driven; that statistics-based computational-linguistics methods can assign values to option terms and effectively assist option trading; and that the computational-terminology study of options will promote the development of data-driven terminology research.
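The Delta the abstract discusses can be illustrated with the textbook Black-Scholes delta of a European call, N(d1). This is a generic sketch, not the authors' computation; the parameter values in the usage note are made up for illustration.

```python
import math

def bs_call_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call option: N(d1).

    S: underlying price, K: strike, T: time to expiry in years,
    r: risk-free rate, sigma: volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2)))  # standard normal CDF
```

Delta lies between 0 and 1 for a call, approaching 1 deep in the money and 0 deep out of the money, which is what makes it a natural leverage indicator for in-/at-/out-of-the-money contracts.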

10.
In two-class discriminant problems, objects are allocated to one of the two classes by means of threshold rules based on discriminant functions. In this paper we propose to examine the quality of a discriminant function g in terms of its performance curve. This curve is the plot of the two misclassification probabilities as the threshold t assumes various real values. The role of such performance curves in evaluating and ordering discriminant functions and solving discriminant problems is presented. In particular, it is shown that: (i) the convexity of such a curve is a sufficient condition for optimal use of the information contained in the data reduced by g, and (ii) a g with a non-convex performance curve should be corrected by an explicitly obtained transformation.
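An empirical version of the performance curve is easy to compute: for each threshold t, estimate the two misclassification probabilities from discriminant scores of the two classes. A minimal NumPy sketch (illustrative names; the paper works with the population probabilities, not these empirical estimates):

```python
import numpy as np

def performance_curve(scores0, scores1, thresholds):
    """Empirical performance curve of a discriminant function g.

    Returns P(g(x) > t | class 0) and P(g(x) <= t | class 1)
    for each threshold t, assuming class 1 scores tend to be larger.
    """
    e0 = np.array([(scores0 > t).mean() for t in thresholds])   # class-0 errors
    e1 = np.array([(scores1 <= t).mean() for t in thresholds])  # class-1 errors
    return e0, e1
```

For well-separated classes, some threshold drives both error rates to zero, while extreme thresholds sacrifice one class entirely.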

11.
We discuss the use of orthogonal wavelet transforms in preprocessing multivariate data for subsequent analysis, e.g., by clustering or dimensionality reduction. Wavelet transforms allow us to introduce multiresolution approximation, and multiscale nonparametric regression or smoothing, in a natural and integrated way into the data analysis. As explained in the first part of the paper, this approach is of greatest interest for multivariate data analysis when we use (i) data sets with ordered variables, e.g., time series, and (ii) object dimensionalities that are not too small, e.g., 16 and upwards. In the second part of the paper, a different type of wavelet decomposition is used. Applications illustrate the power of this new perspective on data analysis.
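The simplest orthogonal wavelet transform is the Haar transform; one decomposition level applied to each variable-ordered row of a data matrix is the kind of multiscale preprocessing at issue. A minimal NumPy sketch (the paper also considers other wavelet families):

```python
import numpy as np

def haar_step(x):
    """One level of the orthogonal Haar wavelet transform.

    Returns smooth (approximation) and detail coefficients for an
    even-length signal; orthogonality preserves the signal's energy.
    """
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return s, d
```

Because the transform is orthogonal, Euclidean distances and energies are preserved, so clustering the transformed (or thresholded) coefficients is a faithful surrogate for clustering the raw signals.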

12.
We propose functional cluster analysis (FCA) for multidimensional functional data sets, utilizing orthonormalized Gaussian basis functions. An essential point in FCA is the use of orthonormal bases, i.e., bases whose matrix of pairwise product integrals is the identity. We construct orthonormalized Gaussian basis functions using the Cholesky decomposition and derive a property of the Cholesky decomposition with respect to Gram-Schmidt orthonormalization. The advantages of the functional clustering are that it can be applied to data observed at different time points for each subject, and that the functional structure behind the data can be captured by removing the measurement errors. Numerical experiments are conducted to investigate the effectiveness of the proposed method, as compared to conventional discrete cluster analysis. The proposed method is applied to three-dimensional (3D) protein structural data that determine the 3D arrangement of amino acids in individual proteins.
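The orthonormalization step can be sketched numerically: form the Gram matrix W of the basis functions, take its Cholesky factor W = LL', and transform the basis by L⁻¹, after which the pairwise product integrals form the identity. A NumPy sketch on a sampling grid (illustrative names; the paper derives this for Gaussian bases analytically):

```python
import numpy as np

def orthonormalize_basis(Phi, dx):
    """Orthonormalize basis functions sampled on a grid.

    Phi: (k, m) array, one basis function per row; dx: grid spacing.
    With Gram matrix W = Phi Phi' dx = L L', the rows of L^{-1} Phi
    have pairwise product integrals equal to the identity.
    """
    W = Phi @ Phi.T * dx                       # numerical Gram matrix
    Linv = np.linalg.inv(np.linalg.cholesky(W))
    return Linv @ Phi
```

Applied to three overlapping Gaussian bumps, the transformed basis is orthonormal up to quadrature error.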

13.
Russian is one of the working languages of the United Nations and the official language of Russia and several other countries. With the advance of the Belt and Road Initiative and the acceleration of globalization, Russian text data have become an important source of information for organizational management decisions, and Russian text mining has accordingly become an important method of decision support. However, research on Russian text-mining methods is still far from mature; in particular, the performance of its key foundation, word extraction from Russian text, remains low, which hampers the accuracy of Russian text modeling. Therefore, ...

14.
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.

15.
Graphical representation of nonsymmetric relationship data has usually proceeded via separate displays for the symmetric and the skew-symmetric parts of a data matrix. DEDICOM avoids splitting the data into symmetric and skew-symmetric parts, but lacks a graphical representation of the results. Chino's GIPSCAL combines features of both models, but may have a poor goodness of fit compared to DEDICOM. We simplify and generalize Chino's method in such a way that it fits the data better. We develop an alternating least squares algorithm for the resulting method, called Generalized GIPSCAL, and adjust it to handle GIPSCAL as well. In addition, we show that Generalized GIPSCAL is a constrained variant of DEDICOM and derive necessary and sufficient conditions for the equivalence of the two models. Because these conditions are rather mild, we expect that in many practical cases DEDICOM and Generalized GIPSCAL are (nearly) equivalent, and hence that the graphical representation from Generalized GIPSCAL can be used to display the DEDICOM results graphically. Such a representation is given for an illustration. Finally, we show Generalized GIPSCAL to be a generalization of another method for joint representation of the symmetric and skew-symmetric parts of a data matrix. This research has been made possible by a fellowship from the Royal Netherlands Academy of Arts and Sciences to the first author, and by research grant number A6394 to the second author from the Natural Sciences and Engineering Research Council of Canada. The authors are obliged to Jos ten Berge and Naohito Chino for stimulating comments.

16.
On Similarity Indices and Correction for Chance Agreement
Similarity indices can be used to compare partitions (clusterings) of a data set. Many such indices have been introduced in the literature over the years. We show that, of the 28 indices we were able to track, 22 are distinct. Even though their values differ for the same pairs of clusterings, after correcting for agreement attributable to chance alone their values become similar, and some of them even become equivalent. Consequently, the choice of index for comparing different clusterings becomes less important.
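The best-known instance of this correction is the adjusted Rand index: the raw Rand index measures pairwise agreement between two partitions, and the adjustment subtracts the agreement expected under chance. A stdlib-only sketch computed from the contingency table of the two partitions (illustrative function name):

```python
from math import comb
from collections import Counter

def rand_indices(labels_a, labels_b):
    """Rand index and its chance-corrected version (ARI) for two
    partitions given as parallel label sequences."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))            # contingency table
    sum_cells = sum(comb(v, 2) for v in cells.values()) # pairs together in both
    sum_rows = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels_b).values())
    total = comb(n, 2)
    rand = (total + 2 * sum_cells - sum_rows - sum_cols) / total
    expected = sum_rows * sum_cols / total               # chance agreement
    ari = (sum_cells - expected) / (0.5 * (sum_rows + sum_cols) - expected)
    return rand, ari
```

Both indices equal 1 for identical partitions, and the ARI is invariant to relabeling the clusters.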

17.
In this paper, we present empirical and theoretical results on classification trees for randomized response data. We considered a dichotomous sensitive response variable with the true status intentionally misclassified by the respondents using rules prescribed by a randomized response method. We assumed that classification trees are grown using the Pearson chi-square test as a splitting criterion, and that the randomized response data are analyzed using classification trees as if they were not perturbed. We proved that classification trees analyzing observed randomized response data and estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. This is illustrated using two real data sets.
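The splitting criterion assumed in the paper is the Pearson chi-square statistic of the 2×2 table formed by a candidate binary split against the dichotomous response. A minimal NumPy sketch of that statistic (illustrative names; real tree software adds stopping rules and multi-way handling on top of this):

```python
import numpy as np

def chi_square_split(x_binary, y_binary):
    """Pearson chi-square statistic for the 2x2 table of a binary
    split variable against a dichotomous response."""
    obs = np.zeros((2, 2))
    for xv, yv in zip(x_binary, y_binary):
        obs[int(xv), int(yv)] += 1
    # expected counts under independence of split and response
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    return ((obs - expected) ** 2 / expected).sum()
```

The statistic is 0 when the split is independent of the response and reaches n for a perfectly predictive binary split, so ranking candidate variables by it picks the most informative split.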

18.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that requires only the specification of an item-by-attribute association matrix, and classifies by minimizing a distance measure between the observed responses and the ideal response that a given attribute profile would imply under the item-by-attribute association matrix. This procedure requires no statistical parameter estimation and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
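The procedure can be sketched directly: enumerate all attribute profiles, build each profile's ideal response pattern from the item-by-attribute matrix Q, and pick the profile whose ideal pattern is closest in Hamming distance to the observed responses. The sketch below assumes a conjunctive (DINA-style) ideal-response rule, one of several rules such a procedure could use; names are illustrative.

```python
import numpy as np
from itertools import product

def classify_profile(responses, Q):
    """Nonparametric cognitive diagnosis: nearest ideal response pattern.

    Q: (items x attributes) 0/1 association matrix.
    Returns the best attribute profile and its Hamming distance.
    """
    n_items, n_attr = Q.shape
    responses = np.asarray(responses)
    best, best_dist = None, None
    for profile in product([0, 1], repeat=n_attr):
        a = np.array(profile)
        # conjunctive rule: correct iff every required attribute is mastered
        ideal = np.array([int(all(a[k] for k in range(n_attr) if Q[j, k]))
                          for j in range(n_items)])
        dist = int((ideal != responses).sum())
        if best_dist is None or dist < best_dist:
            best, best_dist = profile, dist
    return best, best_dist
```

No parameters are estimated, so the classifier works even for a single examinee, exactly as the abstract notes.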

19.
Against the background of traditional Chinese medicine (TCM) going global, the international dissemination of TCM is accelerating. TCM translation plays a crucial role in bringing TCM culture to the world, and an electronic (online) dictionary that allows fast, accurate lookup of TCM words and terms would give TCM translators a convenient tool and further promote the international dissemination of TCM. This article explores the development of a Chinese-English electronic dictionary of TCM built around hierarchical-correspondence technology, in the hope of contributing to the compilation of TCM electronic dictionaries and to the development of TCM translation.

20.
L2-norm: (1) dynamic programming; (2) an iterative quadratic assignment improvement heuristic; (3) the Guttman update strategy as modified by Pliner's technique of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and Tse. The methods are all implemented through (freely downloadable) MATLAB m-files; their use is illustrated on a common data set carried throughout. For the computationally intensive dynamic programming formulation, which can provide a globally optimal solution, several possible computational improvements are discussed and evaluated using (a) a transformation of a given m-function into C code with the MATLAB Compiler and compilation of the latter; (b) rewriting an m-function and a mandatory MATLAB gateway directly in Fortran and compiling them into a MATLAB-callable file; (c) comparisons of the acceleration of raw m-files under the most recent release, MATLAB Version 6.5 (against the absence of such acceleration under the previous Version 6.1). Finally, and in contrast to the combinatorial optimization task of identifying a best unidimensional scaling for a given proximity matrix, an approach is given to the confirmatory fitting of a given unidimensional scaling based only on a fixed object ordering, and to nonmetric unidimensional scaling that incorporates an additional optimal monotonic transformation of the proximities.
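The confirmatory variant mentioned at the end is computationally simple: once the object ordering is fixed, |x_i − x_j| becomes the linear expression x_j − x_i for every ordered pair, so the least-squares coordinates follow from a single linear solve; it is the search over orderings that makes the unrestricted problem combinatorial. A NumPy sketch with illustrative names (the paper's implementations are MATLAB m-files):

```python
import numpy as np

def confirmatory_uds(D, order):
    """Confirmatory unidimensional scaling for a fixed object ordering.

    Fits coordinates x minimizing sum over ordered pairs of
    (D[i, j] - (x_j - x_i))^2, centered for identifiability.
    """
    n = len(order)
    rows, targets = [], []
    for a in range(n):
        for b in range(a + 1, n):
            i, j = order[a], order[b]
            row = np.zeros(n)
            row[j], row[i] = 1.0, -1.0   # x_j - x_i under the given ordering
            rows.append(row)
            targets.append(D[i, j])
    rows.append(np.ones(n))              # center the solution: sum x = 0
    targets.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return x
```

When the proximities are exactly the distances of some unidimensional configuration in the stated order, the fit recovers that configuration up to translation.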


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号