首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 421 毫秒
1.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting proportions of viewing time across television networks for an area sample of households.  相似文献   

2.
A maximum likelihood methodology for clusterwise linear regression   总被引:9,自引:0,他引:9  
This paper presents a conditional mixture, maximum likelihood methodology for performing clusterwise linear regression. This new methodology simultaneously estimates separate regression functions and membership inK clusters or groups. A review of related procedures is discussed with an associated critique. The conditional mixture, maximum likelihood methodology is introduced together with the E-M algorithm utilized for parameter estimation. A Monte Carlo analysis is performed via a fractional factorial design to examine the performance of the procedure. Next, a marketing application is presented concerning the evaluations of trade show performance by senior marketing executives. Finally, other potential applications and directions for future research are identified.  相似文献   

3.
Mokken scale analysis uses an automated bottom-up stepwise item selection procedure that suffers from two problems. First, when selected during the procedure items satisfy the scaling conditions but they may fail to do so after the scale has been completed. Second, the procedure is approximate and thus may not produce the optimal item partitioning. This study investigates a variation on Mokken’s item selection procedure, which alleviates the first problem, and proposes a genetic algorithm, which alleviates both problems. The genetic algorithm is an approximation to checking all possible partitionings. A simulation study shows that the genetic algorithm leads to better scaling results than the other two procedures.  相似文献   

4.
A latent class vector model for preference ratings   总被引:1,自引:1,他引:1  
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and some possible extensions are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”  相似文献   

5.
6.
文化资源是体现一个国家文化实力的核心要素,也是国家文化及文化产业发展的基础和源头。我国对各类物质和非物质文化资源数字化工作的开展,为我们利用大数据分析等先进技术,加强对中华文化的充分认知和深入挖掘利用提供了前所未有的契机和条件。本文利用大数据分析等技术手段,对我国如何加强文化资源管理的总体思路、技术框架和有关对策措施提出了建议。  相似文献   

7.
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in lowdimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables, and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is in the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points of view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a Jackknife analysis.  相似文献   

8.
本文考察了波普尔和库恩科学哲学在科学哲学界、科学界及人文社科界的影响。本文的情境分析根据波普尔和库恩作品的特点、社会联系、社会情境、个性因素来解释他们的影响。文章的考察实际上触及了思想史上的社会建构因素,而此因素应该广泛作用于各学科的历史。  相似文献   

9.
A sequential fitting procedure for linear data analysis models   总被引:1,自引:1,他引:0  
A particular factor analysis model with parameter constraints is generalized to include classification problems definable within a framework of fitting linear models. The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks. SEFIT methods attempt to explain the variability in the initial data (commonly defined by a sum of squares) through an additive decomposition attributable to the various terms in the model. New methods are developed for both traditional and fuzzy clustering that have useful theoretic and computational properties (principal cluster analysis, additive clustering, and so on). Connections to several known classification strategies are also stated.The author is grateful to P. Arabie and L. J. Hubert for editorial assistance and reviewing going well beyond traditional levels.  相似文献   

10.
Numerical classification of proximity data with assignment measures   总被引:1,自引:1,他引:0  
An approach to numerical classification is described, which treats the assignment of objects to types as a continuous variable, called an assignment measure. Describing a classification by an assignment measure allows one not only to determine the types of objects, but also to see relationships among the objects of the same type and among the types themselves.A classification procedure, the Assignment-Prototype algorithm, is described and evaluated. It is a numerical technique for obtaining assignment measures directly from one-mode, two-way proximity matrices.  相似文献   

11.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.  相似文献   

12.
A validation study of a variable weighting algorithm for cluster analysis   总被引:1,自引:0,他引:1  
De Soete (1986, 1988) proposed a variable weighting procedure when Euclidean distance is used as the dissimilarity measure with an ultrametric hierarchical clustering method. The algorithm produces weighted distances which approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for an applications problem for which it was not originally intended. That is, to determine whether or not the algorithm can be used to reduce the influence of variables which are irrelevant to the clustering present in the data. The simulation study examined the ability of the procedure to recover a variety of known underlying cluster structures. The results indicate that the algorithm is effective in identifying extraneous variables which do not contribute information about the true cluster structure. Weights near 0.0 were typically assigned to such extraneous variables. Furthermore, the variable weighting procedure was not adversely effected by the presence of other forms of error in the data. In general, it is recommended that the variable weighting procedure be used for applied analyses when Euclidean distance is employed with ultrametric hierarchical clustering methods.  相似文献   

13.
A cluster diagram is a rooted planar tree that depicts the hierarchical agglomeration of objects into groups of increasing size. On the null hypothesis that at each stage of the clustering procedure all possible joins are equally probable, we derive the probability distributions for two properties of these diagrams: (1)S, the number of single objects previously ungrouped that are joined in the final stages of clustering, and (2)m k, the number of groups ofk+1 objects that are formed during the process. Ecological applications of statistical tests for these properties are described and illustrated with data from weed communities of Saskatchewan fields.This work was supported by the Natural Sciences and Engineering Research Council of Canada.  相似文献   

14.
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology. The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.  相似文献   

15.
Dimensionally reduced model-based clustering methods are recently receiving a wide interest in statistics as a tool for performing simultaneously clustering and dimension reduction through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, thus reducing the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis clustering is performed through the factors of an ordinary factor analysis which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and consequently clustering performance. In this work we propose a model which extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models which includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.  相似文献   

16.
Clustering the rows and columns of a contingency table   总被引:3,自引:2,他引:1  
A number of ways of investigating heterogeneity in a two-way contingency table are reviewed. In particular, we consider chi-square decompositions of the Pearson chi-square statistic with respect to the nodes of a hierarchical clustering of the rows and/or the columns of the table. A cut-off point which indicates significant clustering may be defined on the binary trees associated with the respective row and column cluster analyses. This approach provides a simple graphical procedure which is useful in interpreting a significant chi-square statistic of a contingency table.The author gratefully acknowledges the constructive comments of the referees and the editor.  相似文献   

17.
A mathematical programming approach to fitting general graphs   总被引:1,自引:1,他引:0  
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.  相似文献   

18.
A low-dimensional representation of multivariate data is often sought when the individuals belong to a set ofa-priori groups and the objective is to highlight between-group variation relative to that within groups. If all the data are continuous then this objective can be achieved by means of canonical variate analysis, but no corresponding technique exists when the data are categorical or mixed continuous and categorical. On the other hand, if there is noa-priori grouping of the individuals, then ordination of any form of data can be achieved by use of metric scaling (principal coordinate analysis). In this paper we consider a simple extension of the latter approach to incorporate grouped data, and discuss to what extent this method can be viewed as a generalization of canonical variate analysis. Some illustrative examples are also provided.  相似文献   

19.
k  . In this procedure, a least-squares loss function in terms of discrepancies between D and M is minimized. The present paper describes the original hierarchical classes algorithm proposed by De Boeck and Rosenberg (1988), which is based on an alternating greedy heuristic, and proposes a new algorithm, based on an alternating branch-and-bound procedure. An extensive simulation study is reported in which both algorithms are evaluated and compared according to goodness-of-fit to the data and goodness-of-recovery of the underlying true structure. Furthermore, three heuristics for selecting models of different ranks for a given D are presented and compared. The simulation results show that the new algorithm yields models with slightly higher goodness-of-fit and goodness-of-recovery values.  相似文献   

20.
Variable selection in clustering   总被引:2,自引:1,他引:1  
Standard clustering algorithms can completely fail to identify clear cluster structure if that structure is confined to a subset of the variables. A forward selection procedure for identifying the subset is proposed and studied in the context of complete linkage hierarchical clustering. The basic approach can be applied to other clustering methods, too.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号