Similar Documents
 Found 20 similar documents (search time: 696 ms)
1.
Traditionally, latent class (LC) analysis is used by applied researchers as a tool for identifying substantively meaningful clusters. More recently, LC models have also been used as a density estimation tool for categorical variables. We introduce a divisive LC (DLC) model as a density estimation tool that may offer several advantages over a standard LC model. When using an LC model for density estimation, a considerable number of increasingly large LC models may have to be estimated before a sufficient model fit is achieved. A DLC model consists of a sequence of small LC models; it can therefore be estimated much faster and can easily utilize multiple processor cores, making it more widely applicable and practical. In this study we describe the algorithm for fitting a DLC model and discuss the various settings that indirectly influence its precision as a density estimation tool. These settings are illustrated using a synthetic data example, and the best-performing algorithm is applied to a real-data example. The synthetic data example shows that, using specific decision rules, a DLC model is able to correctly model complex associations among categorical variables.

2.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that only requires specification of an item-by-attribute association matrix, and classifies by minimizing a distance measure between the observed responses and the ideal response, for a given attribute profile, that would be implied by the item-by-attribute association matrix. This procedure requires no statistical parameter estimation and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real-data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
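The classification step described above — scoring every attribute profile's ideal response vector against the observed responses and picking the closest — can be sketched in a few lines. This sketch assumes a conjunctive (DINA-type) ideal-response rule and Hamming distance, which is only one of the distance/model combinations the abstract mentions; the function and variable names are illustrative.

```python
import numpy as np

def classify_nonparametric(responses, q_matrix):
    """Classify a binary response vector by minimizing the Hamming
    distance to the ideal response pattern of each attribute profile
    implied by the item-by-attribute (Q) matrix.

    Assumes a conjunctive rule: the ideal response to an item is 1
    only if the profile possesses every attribute the item requires.
    """
    n_items, n_attrs = q_matrix.shape
    best_profile, best_dist = None, np.inf
    # Enumerate all 2^K attribute profiles.
    for code in range(2 ** n_attrs):
        profile = np.array([(code >> a) & 1 for a in range(n_attrs)])
        # Ideal response: correct iff all required attributes are mastered.
        ideal = np.all(q_matrix <= profile, axis=1).astype(int)
        dist = np.sum(np.abs(responses - ideal))  # Hamming distance
        if dist < best_dist:
            best_profile, best_dist = profile, dist
    return best_profile, best_dist
```

Because no parameters are estimated, the function works for a single examinee, matching the abstract's "sample size as small as 1" claim.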

3.
Non-symmetrical correspondence analysis (NSCA) is a very practical statistical technique for identifying the structure of association between asymmetrically related categorical variables forming a contingency table. This paper considers tools that can be used to explore the association between these variables numerically and graphically in detail, including the use of confidence regions, the link between NSCA and the analysis of variance of categorical variables, and the effect of imposing linear constraints on a variable. The authors would like to thank the anonymous referees for their comments and suggestions during the preparation of this paper.

4.
Over the past decade, diagnostic classification models (DCMs) have become an active area of psychometric research. Despite their use, the reliability of examinee estimates in DCM applications has seldom been reported. In this paper, a reliability measure for the categorical latent variables of DCMs is defined. Using theory- and simulation-based results, we show how DCMs uniformly provide greater examinee-estimate reliability than IRT models for tests of the same length, a result that is a consequence of the smaller range of latent variable values examinee estimates can take in DCMs. We demonstrate this result by comparing DCM and IRT reliability for a series of models estimated with data from an end-of-grade test, culminating with a discussion of how DCMs can be used to change the character of large-scale testing, either by shortening tests that measure examinees unidimensionally or by providing more reliable multidimensional measurement for tests of the same length.

5.
In many statistical applications, data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because of discrete-time observation, such data are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve this type of problem. In order to predict the class membership (multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Baseline-category logit formulations are considered, with estimation based on basis expansions of the sample curves of the functional predictor and parameters. Functional principal component analysis is used to obtain an accurate estimation of the functional parameters and to classify sample curves into the categories of the response variable. The good performance of the proposed methodology is studied in an experimental study with simulated and real data.

6.
We introduce new similarity measures between two subjects, with reference to variables with multiple categories. In contrast to traditionally used similarity indices, they also take into account the frequency of the categories of each attribute in the sample. This feature is useful when dealing with rare categories, since it makes sense to evaluate the pairwise presence of a rare category differently from the pairwise presence of a widespread one. A weighting criterion for each category derived from Shannon's information theory is suggested. There are two versions of the weighted index: one for independent categorical variables and one for dependent variables. The suitability of the proposed indices is shown in this paper using both simulated and real-world data sets.
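As a rough illustration of the idea, a match on a category can be weighted by that category's Shannon information in the sample, so that agreeing on a rare category counts for more than agreeing on a common one. The normalization below is an assumption for illustration, not the paper's exact index.

```python
import numpy as np

def info_weighted_similarity(x, y, data):
    """Illustrative frequency-weighted similarity between subjects x
    and y: a matched category contributes its Shannon information
    -log2(p), with p the category's relative frequency in `data`;
    mismatches contribute 0.  Normalizing by the largest information
    the pair could share keeps the index in [0, 1]."""
    num = den = 0.0
    for j in range(data.shape[1]):
        px = np.mean(data[:, j] == x[j])        # frequency of x's category
        py = np.mean(data[:, j] == y[j])        # frequency of y's category
        wx, wy = -np.log2(px), -np.log2(py)     # rarer => larger weight
        den += max(wx, wy)
        if x[j] == y[j]:
            num += wx                            # rare matches count more
    return num / den if den > 0 else 1.0
```

On a match the two weights coincide (same category, same column), so the index is symmetric in its two subjects.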

7.
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.

8.
Recent research into graphical association models has focussed interest on the conditional Gaussian distribution for analyzing mixtures of categorical and continuous variables. A special case of such models, utilizing the homogeneous conditional Gaussian distribution, has in fact been known since 1961 as the location model, and for the past 30 years has provided a basis for the multivariate analysis of mixed categorical and continuous variables. Extensive development of this model took place throughout the 1970s and 1980s in the context of discrimination and classification, and comprehensive methodology is now available for such analysis of mixed variables. This paper surveys these developments and summarizes current capabilities in the area. Topics include distances between groups, discriminant analysis, error rates and their estimation, model and feature selection, and the handling of missing data.

9.
The location model is a useful tool in parametric analysis of mixed continuous and categorical variables. In this model, the continuous variables are assumed to follow different multivariate normal distributions for each possible combination of categorical variable values. Using this model, a distance between two populations involving mixed variables can be defined. To date, however, no distributional results have been available, against which to assess the outcomes of practical applications of this distance. The null distribution of estimated distance is therefore considered in this paper, for a range of possible situations. No explicit analytical expressions are derived for this distribution, but easily implementable Monte Carlo schemes are described. These are then applied to previously cited examples.

10.
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in low-dimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables; and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points-of-view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a jackknife analysis.

11.
K-modes Clustering
K-modes clusters categorical data using a dissimilarity based on the L0 norm (defined as the limit of an Lp norm as p approaches zero). In Monte Carlo simulations, both K-modes and latent class procedures (e.g., Goodman 1974) performed with equal efficiency in recovering a known underlying cluster structure. However, K-modes is an order of magnitude faster than the latent class procedure and suffers from fewer problems of local optima. For data sets involving a large number of categorical variables, latent class procedures become computationally extremely slow and hence infeasible. We conjecture that, although latent class procedures might perform better than K-modes in some cases, K-modes could outperform latent class procedures in others. Hence, we recommend that these two approaches be used as "complementary" procedures in performing cluster analysis. We also present an empirical comparison of K-modes and latent class analysis, in which the former method prevails.
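A minimal K-modes loop, for intuition: rows are assigned to the nearest mode under simple matching (Hamming) distance, and each mode is updated to the per-column most frequent category in its cluster. This is a didactic sketch assuming integer-coded categories, not the authors' implementation.

```python
import numpy as np

def k_modes(data, k, n_iter=20, seed=0):
    """Cluster rows of integer-coded categorical `data` into k groups.
    Alternates nearest-mode assignment (count of mismatches) with
    mode updates (column-wise most frequent category)."""
    rng = np.random.default_rng(seed)
    # Initialize modes as k distinct randomly chosen rows.
    modes = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # Assign each row to the mode with the fewest mismatches.
        dists = np.array([[np.sum(row != m) for m in modes] for row in data])
        labels = dists.argmin(axis=1)
        # Update each non-empty cluster's mode.
        for c in range(k):
            members = data[labels == c]
            if len(members):
                modes[c] = [np.bincount(col).argmax() for col in members.T]
    return labels, modes
```

Each pass costs O(nkp) comparisons, which is the source of the speed advantage over likelihood-based latent class estimation noted in the abstract.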

12.
A permutation-based algorithm for block clustering
Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.
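The splitting decision can be illustrated with a toy permutation test: compute the reduction in within-block sum of squares achieved by a candidate row split, then compare it with the reductions obtained after randomly permuting the rows. The statistic and stopping details here are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def split_significant(matrix, split_row, n_perm=200, alpha=0.05, seed=0):
    """Permutation check: is the gain from splitting the rows at
    `split_row` (0 < split_row < n) larger than expected if the rows
    were exchangeable?  Returns (significant, p-value)."""
    rng = np.random.default_rng(seed)

    def gain(m):
        # Reduction in within-block SSE from splitting rows at split_row.
        sse = lambda b: np.sum((b - b.mean(axis=0)) ** 2)
        return sse(m) - (sse(m[:split_row]) + sse(m[split_row:]))

    observed = gain(matrix)
    # How often does a random row order do at least as well?
    count = sum(gain(rng.permutation(matrix)) >= observed
                for _ in range(n_perm))
    pvalue = (count + 1) / (n_perm + 1)
    return pvalue < alpha, pvalue
```

Because the reference distribution is built from the data itself, no normality assumption is needed, which mirrors the abstract's motivation for replacing Hartigan's asymptotic stopping rule.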

13.
Dimensionally reduced model-based clustering methods have recently received wide interest in statistics as tools for performing clustering and dimension reduction simultaneously through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, thus reducing the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis, clustering is performed through the factors of an ordinary factor analysis, which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and, consequently, clustering performance. In this work we propose a model which extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models which includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.

14.
This paper sketches a dispositionalist conception of laws and shows how the dispositionalist should respond to certain objections. The view that properties are essentially dispositional is able to provide an account of laws that avoids the problems that face the two views of laws (the regularity and the contingent nomic necessitation views) that regard properties as categorical and laws as contingent. I discuss and reject the objections that (i) this view makes laws necessary whereas they are contingent; (ii) this view cannot account for certain kinds of laws of nature and their properties.

15.
In this study, three-point bending experiments were used to systematically investigate the fracture behavior of Cu/Au and Cu/Cr layered materials with different combinations of constituent-layer length scales and interface structures, together with the associated size and interface/grain-boundary effects. It was found that constituent properties, length scale, and interface/grain-boundary structure are the main factors influencing the deformation and fracture behavior of metallic multilayer materials. For Cu/Au multilayers with "transparent" interfaces, the deformation and fracture behavior exhibit a pronounced size effect, whereas Cu/Cr multilayers with "fuzzy" interfaces show relatively poor plasticity and no obvious size effect in their deformation and fracture behavior. Based on theoretical analysis, a multi-scale layered structure design approach for high-strength, high-toughness layered metallic materials is proposed.

16.
We investigate the effects of a complex sampling design on the estimation of mixture models. An approximate or pseudo likelihood approach is proposed to obtain consistent estimates of class-specific parameters when the sample arises from such a complex design. The effects of ignoring the sample design are demonstrated empirically in the context of an international value segmentation study in which a multinomial mixture model is applied to identify segment-level value rankings. The analysis reveals that ignoring the sample design results in both an incorrect number of segments as identified by information criteria and biased estimates of segment-level parameters.
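The pseudo-likelihood idea — weighting each observation's log-likelihood contribution by its design weight — can be shown in miniature for a plain multinomial model, outside any mixture. The function below is an illustrative sketch, not the authors' estimator.

```python
import numpy as np

def pseudo_loglik(counts, probs, weights):
    """Design-weighted (pseudo) log-likelihood for multinomial data.

    counts : (n, k) one-hot (or count) category indicators per respondent
    probs  : (k,)   model category probabilities
    weights: (n,)   sampling weights (e.g., inverse inclusion probabilities)

    Each respondent's log-likelihood term is multiplied by their
    weight, so over-sampled strata do not dominate the fit."""
    return float(np.sum(weights[:, None] * counts * np.log(probs)))
```

Maximizing this weighted objective instead of the unweighted log-likelihood is what yields design-consistent parameter estimates in the abstract's setting.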

17.
In this paper we consider the major development of mathematical analysis during the mid-nineteenth century. On the basis of Jahnke's (Hist Math 20(3):265–284, 1993) distinction between considering mathematics as an empirical science based on time and space and considering mathematics as a purely conceptual science, we discuss the Swedish nineteenth-century mathematician E.G. Björling's general view of real- and complex-valued functions. We argue that Björling had a tendency to sometimes consider mathematical objects in a naturalistic way. One example is how Björling interprets Cauchy's definition of the logarithm function with respect to complex variables, which is investigated in the paper. Furthermore, in view of an article written by Björling (Kongl Vetens Akad Förh Stockholm 166–228, 1852), we consider Cauchy's theorem on power series expansions of complex-valued functions. We investigate Björling's, Cauchy's and the Belgian mathematician Lamarle's different conditions for expanding a complex function of a complex variable in a power series. We argue that one reason why Cauchy's theorem was controversial could be the ambiguities of fundamental concepts in analysis that existed during the mid-nineteenth century. This problem is demonstrated with examples from Björling, Cauchy and Lamarle.

18.
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010) that used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information of the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to the data using SAS.

19.
Seki Takakazu's Method of Accumulated Finite Differences and the Origin of the Cubic Interpolation Method of the Shoushi Calendar
This paper introduces the basic method of accumulated finite differences in Japanese mathematics (wasan), examines the connections between the wasan interpolation method and Chinese interpolation methods, and traces the Chinese mathematical origins of the concepts and ideas behind the wasan interpolation method. Combining this with an analysis of the construction principles underlying Seki Takakazu's method of accumulated finite differences, it discusses the construction principles of the cubic interpolation method in the Shoushi calendar (授时历) and proposes a new reconstruction.

20.
The primary method for validating cluster analysis techniques is through Monte Carlo simulations that rely on generating data with known cluster structure (e.g., Milligan 1996). This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint; current cluster generation methods are framed within these definitions. An algorithm generating overlapping clusters based on shared densities from several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, a discussion is given of how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.
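A minimal instance of generating data with controlled cluster overlap: two spherical Gaussian components whose mean separation, relative to the common unit variance, governs how much their densities overlap. The paper's framework is far more general; the names and the separation parameterization below are assumptions for illustration.

```python
import numpy as np

def generate_overlapping_clusters(n_per, sep, dim=2, seed=0):
    """Draw two unit-variance spherical Gaussian clusters of n_per
    points each, with Euclidean distance `sep` between their means.
    Smaller `sep` means a larger shared-density region, i.e. more
    cluster overlap.  Returns the stacked data and true labels."""
    rng = np.random.default_rng(seed)
    mean_a = np.zeros(dim)
    mean_b = np.full(dim, sep / np.sqrt(dim))  # Euclidean distance = sep
    a = rng.normal(mean_a, 1.0, size=(n_per, dim))
    b = rng.normal(mean_b, 1.0, size=(n_per, dim))
    X = np.vstack([a, b])
    labels = np.repeat([0, 1], n_per)
    return X, labels
```

Because the true labels are retained, recovery rates of any clustering method can be scored against them, which is the validation use the abstract describes.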


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)