首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 765 毫秒
1.
In educational measurement, cognitive diagnosis models have been developed to allow assessment of specific skills that are needed to perform tasks. Skill knowledge is characterized as present or absent and represented by a vector of binary indicators, or the skill set profile. After determining which skills are needed for each assessment item, a model is specified for the relationship between item responses and skill set profiles. Cognitive diagnosis models are often used for diagnosis, that is, for classifying students into the different skill set profiles. Generally, cognitive diagnosis models do not exploit student covariate information. However, investigating the effects of student covariates, such as gender, SES, or educational interventions, on skill knowledge mastery is important in education research, and covariate information may improve classification of students to skill set profiles. We extend a common cognitive diagnosis model, the DINA model, by modeling the relationship between the latent skill knowledge indicators and covariates. The probability of skill mastery is modeled as a logistic regression model, possibly with a student-level random intercept, giving a higher-order DINA model with a latent regression. Simulations show that parameter recovery is good for these models and that inclusion of covariates can improve skill diagnosis. When applying our methods to data from an online tutor, we obtain reasonable and interpretable parameter estimates that allow more detailed characterization of groups of students who differ in their predicted skill set profiles.  相似文献   

2.
In this paper we provide an explicit probability distribution for classification purposes when observations are viewed on the real line and classifications are to be based on numerical orderings. The classification model is derived from a Bayesian nonparametric mixture of Dirichlet process model; with some modifications. The resulting approach then more closely resembles a classical hierarchical grouping rule in that it depends on sums of squares of neighboring values. The proposed probability model for classification relies on a numerical procedure based on a reversible Markov chain Monte Carlo (MCMC) algorithm for determining the probabilities. Some numerical illustrations comparing with alternative ideas for classification are provided.  相似文献   

3.
Probabilistic feature models (PFMs) can be used to explain binary rater judgements about the associations between two types of elements (e.g., objects and attributes) on the basis of binary latent features. In particular, to explain observed object-attribute associations PFMs assume that respondents classify both objects and attributes with respect to a, usually small, number of binary latent features, and that the observed object-attribute association is derived as a specific mapping of these classifications. Standard PFMs assume that the object-attribute association probability is the same according to all respondents, and that all observations are statistically independent. As both assumptions may be unrealistic, a multilevel latent class extension of PFMs is proposed which allows objects and/or attribute parameters to be different across latent rater classes, and which allows to model dependencies between associations with a common object (attribute) by assuming that the link between features and objects (attributes) is fixed across judgements. Formal relationships with existing multilevel latent class models for binary three-way data are described. As an illustration, the models are used to study rater differences in product perception and to investigate individual differences in the situational determinants of anger-related behavior.  相似文献   

4.
Recent research into graphical association models has focussed interest on the conditional Gaussian distribution for analyzing mixtures of categorical and continuous variables. A special case of such models, utilizing the homogeneous conditional Gaussian distribution, has in fact been known since 1961 as the location model, and for the past 30 years has provided a basis for the multivariate analysis of mixed categorical and continuous variables. Extensive development of this model took place throughout the 1970’s and 1980’s in the context of discrimination and classification, and comprehensive methodology is now available for such analysis of mixed variables. This paper surveys these developments and summarizes current capabilities in the area. Topics include distances between groups, discriminant analysis, error rates and their estimation, model and feature selection, and the handling of missing data.  相似文献   

5.
Power and Sample Size Computation for Wald Tests in Latent Class Models   总被引:1,自引:0,他引:1  
Latent class (LC) analysis is used by social, behavioral, and medical science researchers among others as a tool for clustering (or unsupervised classification) with categorical response variables, for analyzing the agreement between multiple raters, for evaluating the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for modeling heterogeneity in developmental trajectories. Despite the increased popularity of LC analysis, little is known about statistical power and required sample size in LC modeling. This paper shows how to perform power and sample size computations in LC models using Wald tests for the parameters describing association between the categorical latent variable and the response variables. Moreover, the design factors affecting the statistical power of these Wald tests are studied. More specifically, we show how design factors which are specific for LC analysis, such as the number of classes, the class proportions, and the number of response variables, affect the information matrix. The proposed power computation approach is illustrated using realistic scenarios for the design factors. A simulation study conducted to assess the performance of the proposed power analysis procedure shows that it performs well in all situations one may encounter in practice.  相似文献   

6.
A Thurstonian model for ranks is introduced in which rank-induced dependencies are specified through correlation coefficients among ranked objects that are determined by a vector of rank-induced parameters. The ranking model can be expressed in terms of univariate normal distribution functions, thus simplifying a previously computationally intensive problem. A theorem is proven that shows that the specification given in the paper for the dependencies is the only way that this simplification can be achieved under the process assumptions of the model. The model depends on certain conditional probabilities that arise from item orders considered by subjects as they make ranking decisions. Examples involving a complete set of ranks and a set with missing values are used to illustrate recovery of the objects’ scale values and the rank dependency parameters. Application of the model to ranks for gift items presented singly or as composite items is also discussed.  相似文献   

7.
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.  相似文献   

8.
We propose an analysis of the notion of model as crucially related to the notion of point of view. A model in this sense would always suggest a certain way of looking at a real system, a certain way of thinking about it and a certain way of acting upon it. We focus on System Dynamics as a paradigmatic case with respect to many of the features and problems we can find in the field of modelling and simulation. We analyse in detail some of those features. All of them would be present in many other cases of construction and use of models. Furthermore, they would support the thesis that a model can be fruitfully understood as offering a point of view capable of improving our own points of view over a certain system. The point of view offered by the model could include both non-conceptual and conceptual contents, it would have a complex structure and behaviour, and it would have direct consequences on the decisions made by the subjects adopting that point of view.  相似文献   

9.
Many problems entail the analysis of data that are independent and identically distributed random graphs. Useful inference requires flexible probability models for such random graphs; these models should have interpretable location and scale parameters, and support the establishment of confidence regions, maximum likelihood estimates, goodness-of-fit tests, Bayesian inference, and an appropriate analogue of linear model theory. Banks and Carley (1994) develop a simple probability model and sketch some analyses; this paper extends that work so that analysts are able to choose models that reflect application-specific metrics on the set of graphs. The strategy applies to graphs, directed graphs, hypergraphs, and trees, and often extends to objects in countable metric spaces.  相似文献   

10.
In the framework of incomplete data analysis, this paper provides a nonparametric approach to missing data imputation based on Information Retrieval. In particular, an incremental procedure based on the iterative use of tree-based method is proposed and a suitable Incremental Imputation Algorithm is introduced. The key idea is to define a lexicographic ordering of cases and variables so that conditional mean imputation via binary trees can be performed incrementally. A simulation study and real data applications are carried out to describe the advantages and the performance with respect to standard approaches.  相似文献   

11.
In this paper, we argue for the centrality of prediction in the use of computational models in science. We focus on the consequences of the irreversibility of computational models and on the conditional or ceteris paribus, nature of the kinds of their predictions. By irreversibility, we mean the fact that computational models can generally arrive at the same state via many possible sequences of previous states. Thus, while in the natural world, it is generally assumed that physical states have a unique history, representations of those states in a computational model will usually be compatible with more than one possible history in the model. We describe some of the challenges involved in prediction and retrodiction in computational models while arguing that prediction is an essential feature of non-arbitrary decision making. Furthermore, we contend that the non-predictive virtues of computational models are dependent to a significant degree on the predictive success of the models in question.  相似文献   

12.
Trees, and particularly binary trees, appear frequently in the classification literature. When studying the properties of the procedures that fit trees to sets of data, direct analysis can be too difficult, and Monte Carlo simulations may be necessary, requiring the implementation of algorithms for the generation of certain families of trees at random. In the present paper we use the properties of Prufer's enumeration of the set of completely labeled trees to obtain algorithms for the generation of completely labeled, as well as terminally labeled t-ary (and in particular binary) trees at random, i.e., with uniform distribution. Actually, these algorithms are general in that they can be used to generate random trees from any family that can be characterized in terms of the node degrees. The algorithms presented here are as fast as (in the case of terminally labeled trees) or faster than (in the case of completely labeled trees) any other existing procedure, and the memory requirements are minimal. Another advantage over existing algorithms is that there is no need to store pre-calculated tables.  相似文献   

13.
We argue from the Church-Turing thesis (Kleene Mathematical logic. New York: Wiley 1967) that a program can be considered as equivalent to a formal language similar to predicate calculus where predicates can be taken as functions. We can relate such a calculus to Wittgenstein’s first major work, the Tractatus, and use the Tractatus and its theses as a model of the formal classical definition of a computer program. However, Wittgenstein found flaws in his initial great work and he explored these flaws in a new thesis described in his second great work; the Philosophical Investigations. The question we address is “can computer science make the same leap?” We are proposing, because of the flaws identified by Wittgenstein, that computers will never have the possibility of natural communication with people unless they become active participants of human society. The essential difference between formal models used in computing and human communication is that formal models are based upon rational sets whereas people are not so restricted. We introduce irrational sets as a concept that requires the use of an abductive inference system. However, formal models are still considered central to our means of using hypotheses through deduction to make predictions about the world. These formal models are required to continually be updated in response to peoples’ changes in their way of seeing the world. We propose that one mechanism used to keep track of these changes is the Peircian abductive loop.  相似文献   

14.
Traditionally latent class (LC) analysis is used by applied researchers as a tool for identifying substantively meaningful clusters. More recently, LC models have also been used as a density estimation tool for categorical variables. We introduce a divisive LC (DLC) model as a density estimation tool that may offer several advantages in comparison to a standard LC model. When using an LC model for density estimation, a considerable number of increasingly large LC models may have to be estimated before sufficient model-fit is achieved. A DLC model consists of a sequence of small LC models. Therefore, a DLC model can be estimated much faster and can easily utilize multiple processor cores, meaning that this model is more widely applicable and practical. In this study we describe the algorithm of fitting a DLC model, and discuss the various settings that indirectly influence the precision of a DLC model as a density estimation tool. These settings are illustrated using a synthetic data example, and the best performing algorithm is applied to a real-data example. The generated data example showed that, using specific decision rules, a DLC model is able to correctly model complex associations amongst categorical variables.  相似文献   

15.
Unique parametrizations of models are very important for parameter interpretation and consistency of estimators. In this paper we analyze the identifiability of a general class of finite mixtures of multinomial logits with varying and fixed effects, which includes the popular multinomial logit and conditional logit models. The application of the general identifiability conditions is demonstrated on several important special cases and relations to previously established results are discussed. The main results are illustrated with a simulation study using artificial data and a marketing dataset of brand choices.  相似文献   

16.
17.
According to Vázquez and Liz (Found Sci 16(4): 383–391, 2011), Points of View (PoV) can be considered in two different ways. On the one hand, they can be explained following the model of propositional attitudes. This model assumes that the internal structure of a PoV is constituted by a subject, a set of contents, and a set of relations between the subject and those contents. On the other hand, we can analyze points of view taking as a model the notions of location and access. If we choose to follow the second approach, instead of the first one, the internal structure of a PoV is not directly addressed, and the emphasized features of PoV are related to the function that PoV are intended to have. That is, PoV are directly identified by their role and they can solely be understood as ways of accessing the world that bring some kind of perspective about it. Having this in mind, we would like to propose a notation that explains how to understand such access as a sort of models (that can allow the creation of concepts), independently of whether the precise PoV under consideration is impersonal or non-impersonal, its kind of content, and its subjective or objective character. First, we will present an account of some previous approaches to the study of points of view. Then, we will analyze what kind of structure the world is assumed to posses and how the access to it is possible. Third, we will develop a notation that explains PoV as qualitative dimensions by means of which it is possible to valuate objects and states of the world.  相似文献   

18.
A general set of multidimensional unfolding models and algorithms is presented to analyze preference or dominance data. This class of models termed GENFOLD2 (GENeral UnFOLDing Analysis-Version 2) allows one to perform internal or external analysis, constrained or unconstrained analysis, conditional or unconditional analysis, metric or nonmetric analysis, while providing the flexibility of specifying and/or testing a variety of different types of unfolding-type preference models mentioned in the literature including Caroll's (1972, 1980) simple, weighted, and general unfolding analysis. An alternating weighted least-squares algorithm is utilized and discussed in terms of preventing degenerate solutions in the estimation of the specified parameters. Finally, two applications of this new method are discussed concerning preference data for ten brands of pain relievers and twelve models of residential communication devices.  相似文献   

19.
On Similarity Indices and Correction for Chance Agreement   总被引:1,自引:1,他引:0  
Similarity indices can be used to compare partitions (clusterings) of a data set. Many such indices were introduced in the literature over the years. We are showing that out of 28 indices we were able to track, there are 22 different ones. Even though their values differ for the same clusterings compared, after correcting for agreement attributed to chance only, their values become similar and some of them even become equivalent. Consequently, the problem of choice of the index to be used for comparing different clusterings becomes less important.  相似文献   

20.
Dimensionally reduced model-based clustering methods are recently receiving a wide interest in statistics as a tool for performing simultaneously clustering and dimension reduction through one or more latent variables. Among these, Mixtures of Factor Analyzers assume that, within each component, the data are generated according to a factor model, thus reducing the number of parameters on which the covariance matrices depend. In Factor Mixture Analysis clustering is performed through the factors of an ordinary factor analysis which are jointly modelled by a Gaussian mixture. The two approaches differ in genesis, parameterization and consequently clustering performance. In this work we propose a model which extends and combines them. The proposed Mixtures of Factor Mixture Analyzers provide a unified class of dimensionally reduced mixture models which includes the previous ones as special cases and could offer a powerful tool for modelling non-Gaussian latent variables.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号