Similar documents
20 similar documents found.
1.
Rotation in Correspondence Analysis
In correspondence analysis the rows and columns of a nonnegative data matrix are depicted as points in a (usually two-dimensional) plot. Although such a two-dimensional plot often provides a reasonable approximation, situations occur in which an approximation of higher dimensionality is required, especially when the data matrix is large. In such instances it may become difficult to interpret the solution. As in principal component analysis and factor analysis, the correspondence analysis solution can be rotated to increase interpretability. However, because of the various scaling options encountered in correspondence analysis, there are several alternative options for rotating the solutions. In this paper we consider two options for rotation in correspondence analysis. An example is provided so that the benefits of rotation become apparent.
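As a concrete illustration of the rotation step, here is a minimal numpy sketch of Kaiser's varimax applied to a matrix of coordinates. The paper's criteria depend on the CA scaling options, so this shows only the generic orthogonal rotation; the toy `coords` matrix is hypothetical.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a coordinate/loading matrix A (p x k)."""
    p, k = A.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = A @ R
        # SVD-based update of the rotation (standard varimax iteration)
        U, s, Vt = np.linalg.svd(A.T @ (L**3 - L * (L**2).sum(axis=0) / p))
        R = U @ Vt
        if s.sum() - crit < tol:
            break
        crit = s.sum()
    return A @ R, R

coords = np.random.default_rng(0).normal(size=(12, 3))  # toy CA coordinates
rotated, R = varimax(coords)
```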

2.
Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement of the imputation procedure proposed by Lavori, Dawson, and Shera (1995), here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the "propensity score step" of ABP, randomly selects a donor from the nonrespondent's neighbourhood, which includes cases with response patterns similar to that of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likert-type scales from a real data set shows that ABPN (a) performed better than the ABP imputation, and (b) can be considered a serious competitor of other procedures used in this context.
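A hedged sketch of the two-step idea, with all data and names synthetic: the published ABPN works within propensity-score classes and draws donors via an approximate Bayesian bootstrap, which this simplification collapses into a plain nearest-neighbour draw in propensity score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # fully observed covariates
y = np.clip(np.round(X @ [1.0, -0.5, 0.2] + rng.normal(size=200)) + 3, 1, 5)
miss = rng.random(200) < 0.2                           # nonresponse indicator (MAR)
y_inc = y.copy()
y_inc[miss] = np.nan

# ABP step: propensity of nonresponse estimated from the covariates
ps = LogisticRegression().fit(X, miss).predict_proba(X)[:, 1]

# "N" step (simplified): draw a donor at random among the k respondents
# closest to the nonrespondent in propensity score
k = 5
donors = np.flatnonzero(~miss)
nn = NearestNeighbors(n_neighbors=k).fit(ps[donors, None])
for i in np.flatnonzero(miss):
    _, idx = nn.kneighbors([[ps[i]]])
    y_inc[i] = y_inc[donors[rng.choice(idx.ravel())]]  # hot-deck imputation
```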

3.
The analysis of a three-way data set using three-mode principal components analysis yields component matrices for all three modes of the data, and a three-way array called the core, which relates the components for the different modes to each other. To exploit rotational freedom in the model, one may rotate the core array (over all three modes) to an optimally simple form, for instance by three-mode orthomax rotation. However, such a rotation of the core may inadvertently detract from the simplicity of the component matrices. One remedy is to rotate the core only over those modes in which no simple solution for the component matrices is desired or available, but this approach may in turn reduce the simplicity of the core to an unacceptable extent. In the present paper, a general approach is developed, in which a criterion is optimized that not only takes into account the simplicity of the core, but also, to any desired degree, the simplicity of the component matrices. This method (in contrast to methods for either core or component matrix rotation) can be used to find solutions in which the core and the component matrices are all reasonably simple.
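The trade-off can be made concrete with the standard orthomax simplicity function; the sketch below assumes a weighted-sum compromise between core and component simplicity in the spirit of the criterion described above, without reproducing the paper's exact weighting or rotation algorithm.

```python
import numpy as np

def orthomax(A, gamma=1.0):
    """Orthomax simplicity of a matrix A; gamma=1 gives varimax, gamma=0 quartimax."""
    B = A ** 2
    p = A.shape[0]
    return np.sum(B ** 2) - (gamma / p) * np.sum(B.sum(axis=0) ** 2)

def joint_simplicity(core_unfoldings, component_matrices, w_core=1.0, w_comp=1.0):
    """Assumed weighted compromise between core and component simplicity."""
    return (w_core * sum(orthomax(G) for G in core_unfoldings)
            + w_comp * sum(orthomax(C) for C in component_matrices))
```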

4.
A common approach to dealing with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms in which an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. Results are promising compared with other methods, such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi's homogeneity analysis framework.
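A minimal numeric sketch of the impute-reconstruct loop with singular-value shrinkage, assuming a plain PCA setting; the actual iterative MCA runs this loop on the indicator matrix under the chi-square metric, which is omitted here.

```python
import numpy as np

def regularized_iterative_pca_impute(X, ncp=2, shrink=0.0, max_iter=500, tol=1e-9):
    """Fill missing entries of a numeric matrix X by iterating a rank-ncp SVD
    reconstruction; shrink > 0 soft-thresholds the singular values, the kind
    of regularization used to fight overfitting."""
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)    # start from column means
    for _ in range(max_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        s_reg = np.maximum(s[:ncp] - shrink, 0.0)    # shrunken singular values
        Xhat = mu + (U[:, :ncp] * s_reg) @ Vt[:ncp]
        change = np.sum((Xhat[mask] - Xf[mask]) ** 2)
        Xf[mask] = Xhat[mask]                        # update imputations only
        if change < tol:
            break
    return Xf
```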

5.
Unfolding creates configurations from preference information. In this paper, it is argued that not all preference information needs to be collected and that good solutions are still obtained, even when more than half of the data is missing. Simulation studies are conducted to compare missing data treatments, sources of missing data, and designs for the specification of missing data. Guidelines are provided and used in actual practice.
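The central point, that uncollected preference cells can simply drop out of the loss, fits in a few lines. A sketch of a raw stress function for an unfolding configuration, with hypothetical array names:

```python
import numpy as np

def stress_observed(Delta, X, Y, observed):
    """Raw unfolding stress between row points X (n x k) and column points
    Y (m x k), summed only over cells where observed[i, j] is True."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # n x m distances
    return np.sum(((Delta - D) ** 2)[observed])
```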

6.
In this paper two alternative loss criteria for the least squares Procrustes problem are studied. These alternative criteria are based on the Huber function and on the more radical biweight function, which are designed to be resistant to outliers. Using iterative majorization it is shown how a convergent reweighted least squares algorithm can be developed. In a simulation study it turns out that the proposed methods perform well over a specific range of contamination. When a uniform dilation factor is included, mixed results are obtained. The methods also yield a set of weights that can be used for diagnostic purposes.
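A plain IRLS rendering of the Huber variant, assuming standardization by a robust scale; the paper derives the convergent update via iterative majorization and also covers the biweight criterion and a dilation factor, which are not reproduced here.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weights: 1 for small standardized residuals, c/|r| beyond the cutoff."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def robust_procrustes(X, Y, n_iter=50):
    """Rotation T for X towards Y under a Huber loss on row residuals,
    via iteratively reweighted orthogonal Procrustes fits."""
    w = np.ones(X.shape[0])
    for _ in range(n_iter):
        # weighted orthogonal Procrustes: T maximizes tr(T' X' W Y)
        U, _, Vt = np.linalg.svd(X.T @ (w[:, None] * Y))
        T = U @ Vt
        resid = np.linalg.norm(X @ T - Y, axis=1)
        scale = np.median(resid) / 0.6745 + 1e-12    # robust residual scale
        w = huber_weights(resid / scale)             # weights also serve as diagnostics
    return T, w
```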

7.
The framework of this paper is statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets, each with some missing information. Assuming a missing-at-random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the statistical learning theory of Vapnik. It considers both an incremental variable imputation method, to improve computational efficiency, and boosted trees, to gain in prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as the irreducible error) is reached, thus minimizing the generalization (or prediction) error of imputation. Moreover, the method is distribution-free: it holds independently of the underlying probability law generating the missing data values. Performance is analyzed through simulation case studies and real-world applications.
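A toy sketch of the boosted-trees ingredient, with synthetic data: impute one variable from the others using gradient-boosted regression trees. The paper's incremental variable ordering and its learning-theoretic risk analysis are beyond this fragment.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.3, size=500)
miss = rng.random(500) < 0.15        # MAR-style missingness in the target column

# train a boosted tree on complete cases, then predict the missing entries
model = GradientBoostingRegressor().fit(X[~miss], y[~miss])
y_imputed = y.copy()
y_imputed[miss] = model.predict(X[miss])
```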

8.
A Thurstonian model for ranks is introduced in which rank-induced dependencies are specified through correlation coefficients among ranked objects that are determined by a vector of rank-induced parameters. The ranking model can be expressed in terms of univariate normal distribution functions, thus simplifying a previously computationally intensive problem. A theorem is proven that shows that the specification given in the paper for the dependencies is the only way that this simplification can be achieved under the process assumptions of the model. The model depends on certain conditional probabilities that arise from item orders considered by subjects as they make ranking decisions. Examples involving a complete set of ranks and a set with missing values are used to illustrate recovery of the objects’ scale values and the rank dependency parameters. Application of the model to ranks for gift items presented singly or as composite items is also discussed.
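For orientation only, the classical Thurstonian building block that reduces a pairwise ranking probability to one univariate normal CDF; the paper's rank-induced dependency parameters are not reproduced here.

```python
from scipy.stats import norm

def p_i_ranked_before_j(mu_i, mu_j, sigma_i=1.0, sigma_j=1.0, rho=0.0):
    """P(U_i > U_j) for latent normal utilities: U_i - U_j is normal with
    mean mu_i - mu_j and the standard deviation computed below."""
    sd = (sigma_i**2 + sigma_j**2 - 2 * rho * sigma_i * sigma_j) ** 0.5
    return norm.cdf((mu_i - mu_j) / sd)
```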

9.
Graphical representation of nonsymmetric relationships data has usually proceeded via separate displays for the symmetric and the skew-symmetric parts of a data matrix. DEDICOM avoids splitting the data into symmetric and skew-symmetric parts, but lacks a graphical representation of the results. Chino's GIPSCAL combines features of both models, but may have a poor goodness-of-fit compared to DEDICOM. We simplify and generalize Chino's method in such a way that it fits the data better. We develop an alternating least squares algorithm for the resulting method, called Generalized GIPSCAL, and adjust it to handle GIPSCAL as well. In addition, we show that Generalized GIPSCAL is a constrained variant of DEDICOM and derive necessary and sufficient conditions for equivalence of the two models. Because these conditions are rather mild, we expect that in many practical cases DEDICOM and Generalized GIPSCAL are (nearly) equivalent, and hence that the graphical representation from Generalized GIPSCAL can be used to display the DEDICOM results graphically. Such a representation is given for an illustration. Finally, we show Generalized GIPSCAL to be a generalization of another method for joint representation of the symmetric and skew-symmetric parts of a data matrix. This research has been made possible by a fellowship from the Royal Netherlands Academy of Arts and Sciences to the first author, and by research grant number A6394 to the second author, from the Natural Sciences and Engineering Research Council of Canada. The authors are obliged to Jos ten Berge and Naohito Chino for stimulating comments.
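For orientation, the decomposition that the separate-displays tradition starts from and that DEDICOM avoids imposing:

```python
import numpy as np

A = np.array([[0., 5., 1.],
              [2., 0., 4.],
              [3., 1., 0.]])    # a nonsymmetric relationships matrix
S = (A + A.T) / 2               # symmetric part
K = (A - A.T) / 2               # skew-symmetric part
assert np.allclose(A, S + K)    # the two parts reassemble the data exactly
```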

10.
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social, and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010), which used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables, along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information about the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to the data using SAS.

11.
A validation study of a variable weighting algorithm for cluster analysis
De Soete (1986, 1988) proposed a variable weighting procedure for the case in which Euclidean distance is used as the dissimilarity measure with an ultrametric hierarchical clustering method. The algorithm produces weighted distances that approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for an application for which it was not originally intended, namely to determine whether the algorithm can be used to reduce the influence of variables that are irrelevant to the clustering present in the data. The simulation study examined the ability of the procedure to recover a variety of known underlying cluster structures. The results indicate that the algorithm is effective in identifying extraneous variables that do not contribute information about the true cluster structure; weights near 0.0 were typically assigned to such extraneous variables. Furthermore, the variable weighting procedure was not adversely affected by the presence of other forms of error in the data. In general, it is recommended that the variable weighting procedure be used in applied analyses when Euclidean distance is employed with ultrametric hierarchical clustering methods.
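The effect being validated is easy to demonstrate: a weight near 0.0 removes an extraneous variable from the weighted Euclidean distances. In this sketch the weights are set by hand purely for illustration; De Soete's procedure estimates them by a least squares fit to an ultrametric.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
centers = np.repeat([[0.0, 0.0], [5.0, 5.0]], 25, axis=0)     # two true clusters
X = np.hstack([centers + rng.normal(scale=0.3, size=centers.shape),
               rng.normal(scale=3.0, size=(50, 1))])          # plus a noise variable

w = np.array([1.0, 1.0, 0.0])       # near-zero weight on the extraneous variable
d_weighted = pdist(X * np.sqrt(w))  # weighted Euclidean distances, noise removed
d_plain = pdist(X)                  # unweighted distances blurred by the noise
```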

12.
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.
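A sketch of the last step of the second method, classical (Torgerson) metric scaling of a between-populations distance matrix D; the CPC-based distance function itself is not reproduced here.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical metric scaling: embed a distance matrix D in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```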

13.
The objective of this paper is to develop the maximum likelihood approach for analyzing a finite mixture of structural equation models with data that are missing at random. A Monte Carlo EM algorithm is proposed for obtaining the maximum likelihood estimates. A well-known statistic in model comparison, namely the Bayesian Information Criterion (BIC), is used for model comparison. In the presence of missing data, the computation of the observed-data likelihood function value involved in the BIC is not straightforward. A procedure based on path sampling is developed to compute this function value. It is shown by means of simulation studies that ignoring the observations with missing entries gives less accurate ML estimates. An illustrative real example is also presented.
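The comparison statistic is routine once the observed-data log-likelihood is available; the paper's contribution is computing that log-likelihood under missing data via path sampling, which the helper below simply takes as an input.

```python
import numpy as np

def bic(observed_data_loglik, n_free_params, n_obs):
    """Bayesian Information Criterion; smaller values favor the model."""
    return -2.0 * observed_data_loglik + n_free_params * np.log(n_obs)
```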

14.
The aim of this paper is to analyze two scaling extensions of the Orthogonal Procrustes Problem (OPP), called the pre-scaling and the post-scaling approaches. We also discuss some problems related to these extensions and propose two new algorithms to find optimal solutions. These algorithms, which are based on the majorization principle, are shown to be monotonically convergent, and their performance is examined.
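For orientation, the classical OPP with a single dilation factor, which has a closed-form solution; the pre- and post-scaling extensions studied in the paper involve scaling matrices and require the majorization-based algorithms the abstract refers to.

```python
import numpy as np

def procrustes_with_dilation(X, Y):
    """Minimize ||s * X @ Q - Y||_F over orthogonal Q and scalar s."""
    U, sv, Vt = np.linalg.svd(X.T @ Y)
    Q = U @ Vt                         # optimal rotation
    s = sv.sum() / np.sum(X ** 2)      # optimal dilation given Q
    return s, Q
```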

15.
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology. The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.
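A hedged numpy sketch of weighted log-ratio analysis as commonly formulated (log-transform, double-centering with row and column masses, weighted SVD); the function name and the exact normalization are assumptions, not the authors' code.

```python
import numpy as np

def weighted_log_ratio_analysis(N, k=2):
    """Weighted log-ratio analysis (spectral mapping) of a positive matrix N."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)            # row and column masses
    L = np.log(N)
    # double-centering of the log data with respect to the masses
    Z = L - (r @ L)[None, :] - (L @ c)[:, None] + r @ L @ c
    S = np.sqrt(r)[:, None] * Z * np.sqrt(c)[None, :]
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]   # principal row coordinates
    cols = Vt[:k].T * sv[:k] / np.sqrt(c)[:, None]   # principal column coordinates
    return rows, cols
```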

16.
In many statistical applications the data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because of discrete-time observation, such data are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve this type of problem. In order to predict the class membership (a multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Baseline-category logit formulations are considered, with estimation based on basis expansions of the sample curves of the functional predictor and of the parameters. Functional principal component analysis is used to obtain an accurate estimation of the functional parameters and to classify sample curves into the categories of the response variable. The good performance of the proposed methodology is demonstrated in an experimental study with simulated and real data.
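A compact stand-in for the pipeline, with synthetic curves: principal component scores of densely sampled curves (a rough substitute for basis-expansion FPCA) feed a multinomial, baseline-category logit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)                      # common observation grid
n = 300
labels = rng.integers(0, 3, size=n)                # three-category response
curves = (np.sin(2 * np.pi * t * (labels[:, None] + 1))   # class-dependent shape
          + rng.normal(scale=0.2, size=(n, t.size)))

scores = PCA(n_components=4).fit_transform(curves)           # FPCA-like scores
clf = LogisticRegression(max_iter=1000).fit(scores, labels)  # multinomial logit
print("in-sample accuracy:", clf.score(scores, labels))
```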

17.
An EM algorithm for fitting mixtures of autoregressions of low order is constructed and the properties of the estimators are explored on simulated and real datasets. The mixture model incorporates a component with an improper density, which is intended for outliers. The model is proposed as an alternative to the search for the order of a single-component autoregression. The methods can be adapted to other patterns of dependence in panel data. An application to the monthly records of income of the outlets of a retail company is presented.
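The improper outlier component amounts to entering a small constant density into the E-step. A one-function sketch, where `resid` stands for one-step-ahead AR residuals and `pi`, `sigma`, and `c` are the component weight, scale, and improper constant; all names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def e_step_with_improper_component(resid, pi, sigma, c=1e-3):
    """Posterior probabilities for a normal AR-residual component versus an
    outlier component with improper constant density c."""
    dens = np.column_stack([pi * norm.pdf(resid, scale=sigma),
                            (1.0 - pi) * np.full_like(resid, c)])
    return dens / dens.sum(axis=1, keepdims=True)
```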

18.
In supervised learning, an important issue usually not taken into account by classical methods is that a class represented in the test set may not have been encountered earlier in the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which can detect several unobserved groups of points and can adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation, and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex real-world problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis.
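The intuition, flagging test points that no known class explains, fits in a few lines. AMDA itself re-estimates a mixture on the new data with EM and selects the number of classes by model selection criteria; the sketch below is only a crude density-threshold version on synthetic data.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
train = [rng.normal([0, 0], 0.5, (100, 2)), rng.normal([4, 0], 0.5, (100, 2))]
test = np.vstack([rng.normal([0, 0], 0.5, (30, 2)),
                  rng.normal([2, 4], 0.5, (30, 2))])   # second block: unseen class

# density of each known class (Gaussian fit) at the test points
dens = np.column_stack([
    multivariate_normal(g.mean(axis=0), np.cov(g.T)).pdf(test) for g in train])
suspected_new_class = dens.max(axis=1) < 1e-3          # low under every known class
```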

19.
Mokken scale analysis uses an automated bottom-up stepwise item selection procedure that suffers from two problems. First, items satisfy the scaling conditions when they are selected during the procedure, but they may fail to do so after the scale has been completed. Second, the procedure is approximate and thus may not produce the optimal item partitioning. This study investigates a variation on Mokken's item selection procedure that alleviates the first problem, and proposes a genetic algorithm that alleviates both problems. The genetic algorithm is an approximation to checking all possible partitionings. A simulation study shows that the genetic algorithm leads to better scaling results than the other two procedures.
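A bare-bones genetic algorithm over binary item-selection vectors, to make the search idea concrete. The fitness function is a toy placeholder; a faithful version would score a candidate partitioning by Mokken scalability (H) coefficients and the scaling conditions.

```python
import numpy as np

rng = np.random.default_rng(6)

def fitness(select):
    """Toy placeholder; replace with a Mokken scalability-based criterion."""
    return -abs(int(select.sum()) - 5)

def genetic_search(n_items=10, pop=30, gens=50, p_mut=0.05):
    P = rng.integers(0, 2, (pop, n_items))
    for _ in range(gens):
        f = np.array([fitness(p) for p in P])
        parents = P[np.argsort(f)[-(pop // 2):]]          # keep the fitter half
        cuts = rng.integers(1, n_items, pop // 2)
        kids = np.array([np.concatenate([a[:c], b[c:]])   # one-point crossover
                         for a, b, c in zip(parents,
                                            np.roll(parents, 1, axis=0), cuts)])
        kids ^= (rng.random(kids.shape) < p_mut).astype(kids.dtype)  # mutation
        P = np.vstack([parents, kids])
    return P[np.argmax([fitness(p) for p in P])]
```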

20.
Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first yields the conditional maximum likelihood estimates, which can be used when the length of the time series is substantial; simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields the exact maximum likelihood estimates. A procedure for assessing variability in the obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and to correctly assess time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is considered in detail. The results are very promising, as the proposed approach overcomes the limitations of other methods developed so far.
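BIC-based selection of the number of mixture components, illustrated on a plain Gaussian mixture as a stand-in for the regression-ARMA mixture; the time-series structure and the Kalman-filter likelihood are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 0.5, 200)]).reshape(-1, 1)

# fit mixtures with 1..5 components and pick the one with the lowest BIC
bics = {g: GaussianMixture(n_components=g, random_state=0).fit(x).bic(x)
        for g in range(1, 6)}
print("BIC-optimal number of components:", min(bics, key=bics.get))
```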
