Retrieved 20 similar articles (search time: 31 ms)
1.
Henk A.L. Kiers, Journal of Classification, 1998, 15(2): 245-263
The analysis of a three-way data set using three-mode principal components analysis yields component matrices for all three
modes of the data, and a three-way array called the core, which relates the components for the different modes to each other.
To exploit rotational freedom in the model, one may rotate the core array (over all three modes) to an optimally simple form,
for instance by three-mode orthomax rotation. However, such a rotation of the core may inadvertently detract from the simplicity
of the component matrices. One remedy is to rotate the core only over those modes in which no simple solution for the component
matrices is desired or available, but this approach may in turn reduce the simplicity of the core to an unacceptable extent.
In the present paper, a general approach is developed, in which a criterion is optimized that not only takes into account
the simplicity of the core, but also, to any desired degree, the simplicity of the component matrices. This method (in contrast
to methods for either core or component matrix rotation) can be used to find solutions in which the core and the component
matrices are all reasonably simple.
2.
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, our approach leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots, we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
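The weight-estimation step can be sketched as follows. This illustrative version replaces the paper's majorization algorithm with a generic bounded least-squares optimizer; the function name and toy data are ours:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def fit_variable_weights(X, d_target):
    """Estimate nonnegative variable weights w so that the weighted
    Euclidean distances between rows of X approximate the condensed
    dissimilarity vector d_target (same pair layout as scipy's pdist)."""
    n, p = X.shape
    # squared per-variable differences for every pair of rows
    diff2 = np.array([(X[i] - X[j]) ** 2
                      for i in range(n) for j in range(i + 1, n)])

    def stress(w):
        return np.sum((np.sqrt(diff2 @ w) - d_target) ** 2)

    res = minimize(stress, x0=np.ones(p), bounds=[(0, None)] * p)
    return res.x

# toy check: distances generated with known weights should be recovered
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
true_w = np.array([4.0, 1.0, 0.25])
D = pdist(X * np.sqrt(true_w))        # target dissimilarities

w = fit_variable_weights(X, D)
# biplot coordinates: SVD of the centered, weight-scaled matrix
Xw = (X - X.mean(axis=0)) * np.sqrt(w)
U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
row_points, var_axes = U[:, :2] * s[:2], Vt[:2].T
```

Once the weights are fixed, the display step is the classical SVD path the abstract describes.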
3.
We consider correspondence analysis (CA) and taxicab correspondence analysis (TCA) of relational datasets that can mathematically be described as weighted loopless graphs. Such data appear in particular in network analysis. We present CA and TCA as relaxation methods for the graph partitioning problem. Examples of real datasets are provided.
4.
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function
over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values
is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple
correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative
PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized
version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real dataset. Results are
promising compared with other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi's Homogeneity analysis framework.
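The EM-type iterate-and-impute idea can be sketched in a few lines. This minimal version uses plain PCA on continuous data rather than the paper's MCA setting; names and toy data are illustrative:

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """EM-type imputation: alternate between a rank-k SVD fit on the
    completed matrix and replacement of the missing cells by the fit."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Xh = X.copy()
    col_means = np.nanmean(X, axis=0)
    Xh[miss] = np.take(col_means, np.where(miss)[1])   # initial imputation
    for _ in range(max_iter):
        mu = Xh.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xh - mu, full_matrices=False)
        fit = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        if np.max(np.abs(Xh[miss] - fit[miss])) < tol:
            break
        Xh[miss] = fit[miss]                           # re-impute missing cells
    return Xh

# exactly rank-2 data with a few holes: the completion recovers the truth
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 6))
X = A.copy()
X[3, 1] = X[7, 4] = X[15, 0] = np.nan
completed = iterative_pca_impute(X, n_components=2)
```

The overfitting problem the abstract points out arises precisely because the missing cells are fitted ever more closely by this loop; the paper's regularized variant shrinks the singular values to counter it.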
5.
Carolyn J. Anderson, Journal of Classification, 2013, 30(2): 276-303
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social, and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010), which used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information about the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to the data using SAS.
6.
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional
equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional
coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices
to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform
principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence.
We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied
to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate
method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship
between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling.
The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data
in archeology.
The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish
Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional
relevant literature to our attention, significantly improved our article.
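The weighted log-ratio pipeline described in this abstract (log transform, margin-weighted double centering, weighted SVD) might be sketched as below; the margin-weight and coordinate conventions are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def weighted_logratio(N):
    """Weighted log-ratio (spectral map) sketch: log-transform the
    positive table N, double-center it with row/column margin weights,
    and take a weighted SVD. Returns row and column principal
    coordinates plus the singular values."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)          # margin weights
    L = np.log(N)
    # weighted double-centering: remove weighted row and column means
    Z = L - (L @ c)[:, None] - (r @ L)[None, :] + r @ L @ c
    S = np.sqrt(r)[:, None] * Z * np.sqrt(c)[None, :]
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]           # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]        # column principal coordinates
    return F, G, sv

N = np.array([[12., 3., 5.],
              [ 2., 9., 4.],
              [ 6., 6., 6.],
              [ 1., 2., 8.]])
F, G, sv = weighted_logratio(N)
```

Because the double centering removes any per-row or per-column multiplicative effect of the raw data (it becomes an additive constant after the log), only the ratios of the data values matter, which is the subcompositional-coherence property discussed above.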
7.
8.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, to date little attention has been paid to methods for combining the results from principal component analyses applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose, whose centroid solution can be used as a final estimate for the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with that of other methods. More specifically, it is studied how these methods behave when order changes of components and sign reversals of component loadings occur, such as in the case of near-equal eigenvalues, or data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or are not able to adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates for the PCA loadings.
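The centroid idea can be sketched as follows: rotate each loading matrix toward the current centroid and iterate. This rotation-only variant and all names here are our simplification of Generalized Procrustes analysis:

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal matrix R minimizing ||A @ R - B||_F (reflections are
    allowed, so sign reversals of components are handled too)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def gpa_centroid(loadings, max_iter=100, tol=1e-10):
    """Generalized Procrustes (rotation-only): repeatedly rotate every
    loading matrix toward the current centroid, then update the centroid."""
    centroid = np.mean(loadings, axis=0)
    for _ in range(max_iter):
        rotated = [L @ procrustes_rotation(L, centroid) for L in loadings]
        new_centroid = np.mean(rotated, axis=0)
        if np.linalg.norm(new_centroid - centroid) < tol:
            centroid = new_centroid
            break
        centroid = new_centroid
    return centroid, rotated

def rot(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# three "imputed-data" loading matrices: one common structure, rotated
L0 = np.array([[0.9, 0.1], [0.8, -0.2], [0.1, 0.9], [-0.2, 0.8], [0.7, 0.6]])
loadings = [L0 @ rot(a) for a in (0.0, 0.7, -1.2)]
centroid, rotated = gpa_centroid(loadings)
```

After alignment, the spread of the rotated loadings around the centroid is what the convex hulls in the paper visualize.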
9.
The launch of SSE 50ETF options enriched the market's hedging mechanisms and broadened the field of financial-derivatives terminology research. This paper presents a computational, transaction-data-based study of the option-trading terms in-the-money option, at-the-money option, out-of-the-money option, and the Fibonacci option sequence. By tracking option transaction data in real time over seven years, we built an options-trading database and applied computational terminology methods to the study of option trading. We find: (1) the Fibonacci option sequence is a very important option-trading term, and its golden-ratio point is applicable to measuring fluctuations in option trading; (2) SSE 50ETF option contracts raise standardization issues for option-trading terminology; their trading is not an isolated market behavior but is also influenced by fluctuations of related indices such as the Singapore A50 index; (3) the SSE 50ETF option-trading system is open: the Singapore A50 index has a leading effect on China's SSE 50 index; the SSE 50 index, as the tracking target of the SSE 50ETF, determines the direction of the SSE 50ETF; and the SSE 50ETF, as the underlying of the standardized option contracts, affects trading in the individual contracts; (4) the core term of SSE 50ETF option trading is leveraged volatility; the computed value of Delta reflects the high-leverage property, and an enlarged value range causes deviations at the golden-ratio point of the Fibonacci option sequence. We conclude that research on SSE 50ETF option trading is data-driven research; statistics-based computational-linguistics methods can assign values to option terms and effectively assist option trading; and computational terminology research on options will promote the development of data-driven terminology studies.
10.
Wieslaw Szczesny, Journal of Classification, 1991, 8(2): 201-215
In two-class discriminant problems, objects are allocated to one of the two classes by means of threshold rules based on discriminant
functions. In this paper we propose to examine the quality of a discriminant function g in terms of its performance curve. This curve is the plot of the two misclassification probabilities as the threshold t assumes various real values. The role of such performance curves in evaluating and ordering discriminant functions and solving
discriminant problems is presented. In particular, it is shown that: (i) the convexity of such a curve is a sufficient condition
for optimal use of the information contained in the data reduced by g, and (ii) a g with a non-convex performance curve should be corrected by an explicitly obtained transformation.
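The performance curve is straightforward to estimate from scored samples of each class; here is a sketch (the direction of the threshold rule is our assumption, and the normal scores are toy data):

```python
import numpy as np

def performance_curve(scores0, scores1, thresholds):
    """Empirical performance curve of a discriminant function g: for each
    threshold t of the rule 'allocate to class 1 when g(x) > t', return
    P(g > t | class 0) and P(g <= t | class 1)."""
    e0 = np.array([(scores0 > t).mean() for t in thresholds])   # class-0 errors
    e1 = np.array([(scores1 <= t).mean() for t in thresholds])  # class-1 errors
    return e0, e1

rng = np.random.default_rng(2)
g0 = rng.normal(0.0, 1.0, size=500)   # scores of class-0 objects
g1 = rng.normal(2.0, 1.0, size=500)   # scores of class-1 objects
ts = np.linspace(-3.0, 5.0, 81)
err0, err1 = performance_curve(g0, g1, ts)
```

Plotting err0 against err1 gives the curve whose convexity the paper studies; it is the same construction as an ROC curve up to a relabeling of the axes.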
11.
Fionn Murtagh, Journal of Classification, 1998, 15(2): 161-183
We discuss the use of orthogonal wavelet transforms in preprocessing multivariate data for subsequent analysis, e.g., by
clustering or dimensionality reduction. Wavelet transforms allow us to introduce multiresolution approximation, and multiscale
nonparametric regression or smoothing, in a natural and integrated way into the data analysis. As explained in the
first part of the paper, this approach is of greatest interest for multivariate data analysis when we use (i) datasets with
ordered variables, e.g., time series, and (ii) object dimensionalities which are not too small, e.g., 16 and upwards. In
the second part of the paper, a different type of wavelet decomposition is used. Applications illustrate the power
of this new perspective on data analysis.
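As a concrete instance of the preprocessing idea, a self-contained Haar transform (the simplest orthogonal wavelet) illustrates the multiresolution decomposition; the paper itself is not limited to Haar, and the power-of-two length restriction is ours:

```python
import numpy as np

def haar_decompose(x):
    """Multiresolution Haar transform of a vector whose length is a power
    of two: returns the coarsest smooth coefficient plus the detail
    coefficients at each scale (orthonormal normalization)."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        smooth = (x[0::2] + x[1::2]) / np.sqrt(2)
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)
        details.append(detail)
        x = smooth
    return x, details[::-1]   # coarsest smooth, then coarse-to-fine details

def haar_reconstruct(smooth, details):
    """Invert haar_decompose exactly (the transform is orthogonal)."""
    x = np.asarray(smooth, dtype=float)
    for d in details:
        up = np.empty(2 * len(x))
        up[0::2] = (x + d) / np.sqrt(2)
        up[1::2] = (x - d) / np.sqrt(2)
        x = up
    return x

rng = np.random.default_rng(4)
x = rng.normal(size=16)          # one object with 16 ordered variables
smooth, details = haar_decompose(x)
x_back = haar_reconstruct(smooth, details)
```

Because the transform is orthogonal, energy is preserved across scales, which is why distance-based clustering of the coefficients is equivalent to clustering the original ordered variables.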
12.
Functional Cluster Analysis via Orthonormalized Gaussian Basis Expansions and Its Application
We propose functional cluster analysis (FCA) for multidimensional functional data sets, utilizing orthonormalized Gaussian
basis functions. An essential point in FCA is the use of orthonormal bases that yield the identity matrix for the integral
of the product of any two bases. We construct orthonormalized Gaussian basis functions using Cholesky decomposition and derive
a property of Cholesky decomposition with respect to Gram-Schmidt orthonormalization. The advantages of the functional clustering
are that it can be applied to the data observed at different time points for each subject, and the functional structure behind
the data can be captured by removing the measurement errors. Numerical experiments are conducted to investigate the effectiveness
of the proposed method, as compared to conventional discrete cluster analysis. The proposed method is applied to three-dimensional
(3D) protein structural data that determine the 3D arrangement of amino acids in individual proteins.
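The orthonormalization step can be sketched numerically: form the Gram matrix of the Gaussian bases under the L2 inner product, take its Cholesky factor L, and use Phi L^{-T}. The grid, centers, and width below are arbitrary illustrative choices:

```python
import numpy as np

# Gaussian basis functions evaluated on a fine grid over [0, 1]
t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]
centers = np.linspace(0.1, 0.9, 6)    # illustrative basis centers
width = 0.15                          # illustrative common width
Phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))

# Gram matrix under the L2 inner product (simple Riemann-sum quadrature)
W = Phi.T @ Phi * dt
# Cholesky W = L L^T; Psi = Phi L^{-T} is orthonormal, since
# Psi^T Psi dt = L^{-1} W L^{-T} = I
Lc = np.linalg.cholesky(W)
Psi = np.linalg.solve(Lc, Phi.T).T
G = Psi.T @ Psi * dt                  # numerically the identity matrix
```

The integral of the product of any two columns of Psi is then the identity matrix, which is the essential property of FCA stated above.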
13.
14.
Between-group analysis with heterogeneous covariance matrices: The common principal component model
W. J. Krzanowski, Journal of Classification, 1990, 7(1): 81-98
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.
15.
Graphical representation of nonsymmetric relationships data has usually proceeded via separate displays for the symmetric and the skew-symmetric parts of a data matrix. DEDICOM avoids splitting the data into symmetric and skew-symmetric parts, but lacks a graphical representation of the results. Chino's GIPSCAL combines features of both models, but may have a poor goodness-of-fit compared to DEDICOM. We simplify and generalize Chino's method in such a way that it fits the data better. We develop an alternating least squares algorithm for the resulting method, called Generalized GIPSCAL, and adjust it to handle GIPSCAL as well. In addition, we show that Generalized GIPSCAL is a constrained variant of DEDICOM and derive necessary and sufficient conditions for equivalence of the two models. Because these conditions are rather mild, we expect that in many practical cases DEDICOM and Generalized GIPSCAL are (nearly) equivalent, and hence that the graphical representation from Generalized GIPSCAL can be used to display the DEDICOM results graphically. Such a representation is given as an illustration. Finally, we show Generalized GIPSCAL to be a generalization of another method for joint representation of the symmetric and skew-symmetric parts of a data matrix.
This research has been made possible by a fellowship from the Royal Netherlands Academy of Arts and Sciences to the first author, and by research grant number A6394 to the second author, from the Natural Sciences and Engineering Research Council of Canada. The authors are obliged to Jos ten Berge and Naohito Chino for stimulating comments.
16.
On Similarity Indices and Correction for Chance Agreement
Ahmed N. Albatineh, Magdalena Niewiadomska-Bugaj, Daniel Mihalko, Journal of Classification, 2006, 23(2): 301-313
Similarity indices can be used to compare partitions (clusterings) of a data set. Many such indices have been introduced in the
literature over the years. We show that, of the 28 indices we were able to track, 22 are distinct. Even
though their values differ for the same pair of clusterings, after correcting for agreement attributed to chance their
values become similar, and some of them even become equivalent. Consequently, the problem of choosing the index to be used
for comparing different clusterings becomes less important.
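As a concrete instance of chance correction, the adjusted Rand index divides the excess of the index over its expectation under the permutation model by the maximum possible excess; a sketch:

```python
import numpy as np

def comb2(x):
    """Number of unordered pairs, n-choose-2, applied elementwise."""
    return x * (x - 1) / 2.0

def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance under the permutation model:
    (index - expected index) / (maximum index - expected index)."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    ua, ia = np.unique(a, return_inverse=True)
    ub, ib = np.unique(b, return_inverse=True)
    C = np.zeros((ua.size, ub.size))
    np.add.at(C, (ia, ib), 1)                       # contingency table
    sum_cells = comb2(C).sum()
    sum_rows = comb2(C.sum(axis=1)).sum()
    sum_cols = comb2(C.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(a.size)
    max_index = (sum_rows + sum_cols) / 2.0
    return (sum_cells - expected) / (max_index - expected)

ari = adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
```

The same correction template, (index - E[index]) / (max - E[index]), is what makes many of the 22 distinct indices coincide after adjustment.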
17.
In this paper, we present empirical and theoretical results on classification trees for randomized response data. We considered
a dichotomous sensitive response variable with the true status intentionally misclassified by the respondents using rules
prescribed by a randomized response method. We assumed that classification trees are grown using the Pearson chi-square test
as a splitting criterion, and that the randomized response data are analyzed using classification trees as if they were not
perturbed. We proved that classification trees analyzing observed randomized response data and estimated true data have a
one-to-one correspondence in terms of ranking the splitting variables. This is illustrated using two real data sets.
18.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that only requires specification of an item-by-attribute association matrix, and classifies according to minimizing a distance measure between observed responses, and the ideal response for a given attribute profile that would be implied by the item-by-attribute association matrix. This procedure requires no statistical parameter estimation, and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
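A sketch of the nonparametric classifier, assuming a conjunctive (DINA-type) ideal-response rule and Hamming distance, which are particular choices among the distance measures and generation models the paper considers; the Q-matrix and responses are toy data:

```python
import numpy as np
from itertools import product

def classify_profiles(responses, Q):
    """Nonparametric cognitive diagnosis: enumerate all attribute
    profiles, build the ideal response pattern each profile implies
    under a conjunctive rule (an item is answered correctly iff every
    attribute the Q-matrix requires is mastered), and assign each
    examinee the profile whose ideal pattern is closest in Hamming
    distance to the observed responses."""
    J, K = Q.shape
    profiles = np.array(list(product([0, 1], repeat=K)))
    # ideal[p, j] = 1 iff profile p masters every attribute item j requires
    ideal = np.all(profiles[:, None, :] >= Q[None, :, :], axis=2).astype(int)
    dist = np.abs(responses[:, None, :] - ideal[None, :, :]).sum(axis=2)
    return profiles[dist.argmin(axis=1)]

# toy Q-matrix: 4 items, 2 attributes
Q = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])
resp = np.array([[1, 0, 0, 1],    # consistent with mastering attribute 1 only
                 [1, 1, 1, 1]])   # consistent with mastering both
profiles_hat = classify_profiles(resp, Q)
```

No parameters are estimated anywhere, so, as the abstract notes, the procedure works even for a single examinee.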
19.
20.
Four methods for unidimensional scaling of a proximity matrix in the L2-norm are presented: (1)
dynamic programming; (2) an iterative quadratic assignment improvement
heuristic; (3) the Guttman update strategy as modified by Pliner's technique
of smoothing; (4) a nonlinear programming reformulation by Lau, Leung, and
Tse. The methods are all implemented through (freely downloadable) MATLAB
m-files; their use is illustrated by a common data set carried throughout. For
the computationally intensive dynamic programming formulation that can provide a
globally optimal solution, several possible computational improvements are
discussed and evaluated using (a) a transformation of a given m-function with
the MATLAB Compiler into C code and compiling the latter; (b) rewriting an
m-function and a mandatory MATLAB gateway directly in Fortran and compiling
into a MATLAB callable file; (c) comparisons of the acceleration of raw
m-files implemented under the most recent release of MATLAB Version 6.5 (and compared to the absence of such
acceleration under the previous MATLAB Version 6.1). Finally, and in contrast
to the combinatorial optimization task of identifying a best unidimensional
scaling for a given proximity matrix, an approach is given for the
confirmatory fitting of a given unidimensional scaling based only on a fixed
object ordering, and for nonmetric unidimensional scaling that incorporates an
additional optimal monotonic transformation of the proximities.
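For the confirmatory fit with a fixed object ordering, the least-squares coordinates have a classical closed form for L2 unidimensional scaling; this numpy sketch is ours, not the MATLAB implementation the abstract describes:

```python
import numpy as np

def confirmatory_uds(D, order):
    """Least-squares coordinates for a fixed object ordering: after
    permuting D into the given order, the optimal centered coordinates
    are x_k = (sum_{i<k} d_ik - sum_{j>k} d_kj) / n."""
    order = list(order)
    Dp = D[np.ix_(order, order)]
    n = Dp.shape[0]
    below = np.tril(Dp).sum(axis=1)   # sum of d_ik over i < k (diagonal is 0)
    above = np.triu(Dp).sum(axis=1)   # sum of d_kj over j > k
    x = (below - above) / n
    out = np.empty(n)
    out[order] = x                    # map back to the original object labels
    return out

# perfect unidimensional data: the coordinates are recovered exactly
p = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(p[:, None] - p[None, :])
x = confirmatory_uds(D, [0, 1, 2, 3])   # equals p - p.mean()
```

The combinatorial difficulty of unidimensional scaling lies entirely in choosing the ordering; once it is fixed, as here, the fit is a single closed-form computation.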