Similar literature
1.
Multiple choice items on tests and Likert items on surveys are ubiquitous in educational, social and behavioral science research; however, methods for analyzing such data can be problematic. Multidimensional item response theory models are proposed that yield structured Poisson regression models for the joint distribution of responses to items. The methodology presented here extends the approach described in Anderson, Verkuilen, and Peyton (2010) that used fully conditionally specified multinomial logistic regression models as item response functions. In this paper, covariates are added as predictors of the latent variables along with covariates as predictors of location parameters. Furthermore, the models presented here incorporate ordinal information of the response options, thus allowing an empirical examination of assumptions regarding the ordering and the estimation of optimal scoring of the response options. To illustrate the methodology and flexibility of the models, data from a study on aggression in middle school (Espelage, Holt, and Henkel 2004) are analyzed. The models are fit to data using SAS.

2.
Classifiers serve as tools for classifying data into classes. They directly or indirectly take a distribution of data points around a given query point into account. To express the distribution of points from the viewpoint of distances from a given point, a probability distribution mapping function is introduced here. The approximation of this function in the form of a suitable power of the distance is presented. How to determine this power, the distribution mapping exponent, is described. This exponent is used for probability density estimation in high-dimensional spaces and for classification. A close relation of the exponent to a singularity exponent is discussed. It is also shown that this classifier exhibits better behavior (classification accuracy) than other kinds of classifiers for some tasks.

3.
We discuss the use of orthogonal wavelet transforms in preprocessing multivariate data for subsequent analysis, e.g., by clustering or dimensionality reduction. Wavelet transforms allow us to introduce multiresolution approximation, and multiscale nonparametric regression or smoothing, in a natural and integrated way into the data analysis. As will be explained in the first part of the paper, this approach is of greatest interest for multivariate data analysis when we use (i) datasets with ordered variables, e.g., time series, and (ii) object dimensionalities which are not too small, e.g., 16 and upwards. In the second part of the paper, a different type of wavelet decomposition is used. Applications illustrate the power of this new perspective on data analysis.

4.
Rotation in Correspondence Analysis (total citations: 1; self-citations: 1; citations by others: 0)
In correspondence analysis rows and columns of a nonnegative data matrix are depicted as points in a, usually, two-dimensional plot. Although such a two-dimensional plot often provides a reasonable approximation, the situation can occur that an approximation of higher dimensionality is required. This is especially the case when the data matrix is large. In such instances it may become difficult to interpret the solution. Similar to what is done in principal component analysis and factor analysis, the correspondence analysis solution can be rotated to increase the interpretability. However, due to the various scaling options encountered in correspondence analysis, there are several alternative options for rotating the solutions. In this paper we consider two options for rotation in correspondence analysis. An example is provided so that the benefits of rotation become apparent.

5.
A mathematical programming approach to fitting general graphs (total citations: 1; self-citations: 1; citations by others: 0)
We present an algorithm for fitting general graphs to proximity data. The algorithm utilizes a mathematical programming procedure based on a penalty function approach to impose additivity constraints upon parameters. For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation to the proximity data with the specified number of links, allowing for linear transformations of the data. The network distance is the minimum-path-length metric for connected graphs. As a limiting case, the algorithm provides a tree where each node corresponds to an object, if the number of links is set equal to the number of objects minus one. A Monte Carlo investigation indicates that the resulting networks tend to fall within one percentage point of the least-squares solution in terms of the variance accounted for, but do not always attain this global optimum. The network model is discussed in relation to ordinal network representations (Klauer 1989) and NETSCAL (Hutchinson 1989), and applied to several well-known data sets.
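The two quantities this abstract relies on, the minimum-path-length distance of a network and the variance accounted for after a linear transformation of the data, can be illustrated with a minimal sketch. This is not the authors' penalty-function procedure: it only evaluates a network whose links are already given. The function names, the use of Floyd-Warshall, and the exact VAF definition (one minus the ratio of residual to total variance of the network distances) are assumptions made for illustration.

```python
import numpy as np


def min_path_lengths(link_weights):
    """All-pairs minimum path lengths via Floyd-Warshall.

    link_weights is a symmetric matrix; np.inf marks a missing link.
    """
    d = np.array(link_weights, dtype=float)
    np.fill_diagonal(d, 0.0)
    for k in range(d.shape[0]):
        # Allow paths that pass through node k.
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d


def variance_accounted_for(proximities, distances):
    """VAF of network distances after the best linear map a + b * proximity.

    Assumed definition: 1 - var(residuals) / var(distances), computed over
    the upper triangle (each object pair counted once).
    """
    proximities = np.asarray(proximities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    iu = np.triu_indices_from(proximities, k=1)
    x, y = proximities[iu], distances[iu]
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    residuals = y - (intercept + slope * x)
    return 1.0 - residuals.var() / y.var()
```

With n objects and n - 1 links forming a connected network, `min_path_lengths` returns tree distances, which corresponds to the limiting case the abstract mentions.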

6.
The framework of this paper is statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets presenting some lack of information. Assuming a missing at random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within the Statistical Learning Theory of Vapnik. It considers both an incremental variable imputation method to improve computational efficiency and boosted trees to gain in prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus reducing to a minimum the generalization (or prediction) error of imputation. Moreover, it is distribution free: it holds independently of the underlying probability law generating missing data values. Performance analysis is discussed considering simulation case studies and real world applications.

7.
This paper primarily deals with the conceptual prospects for generalizing the aim of abduction from the standard one of explaining surprising or anomalous observations to that of empirical progress or even truth approximation. It turns out that the main abduction task then becomes the instrumentalist task of theory revision aiming at an empirically more successful theory, relative to the available data, but not necessarily compatible with them. The rest, that is, genuine empirical progress as well as observational, referential and theoretical truth approximation, is a matter of evaluation and selection, and possibly new generation tasks for further improvement. The paper concludes with a survey of possible points of departure, in AI and logic, for computational treatment of the instrumentalist task guided by the 'comparative evaluation matrix'.

8.
Two algorithms for fitting directed graphs to nonsymmetric proximity data are compared. The first approach, termed MAPNET, is a direct extension of a mathematical programming procedure for fitting undirected graphs to symmetric proximity data presented by Klauer and Carroll (1989). For a user-specified number of links, the algorithm seeks to provide the connected network that gives the least-squares approximation of the proximity data with the specified number of links, allowing for linear transformations of the data. The mathematical programming approach is compared to the NETSCAL method for fitting directed graphs (Hutchinson 1989), using the Monte Carlo methods and data sets employed by Hutchinson.

9.
The distribution of lengths of phylogenetic trees under the taxonomic principle of parsimony is compared with the distribution obtained by randomizing the characters of the sequence data. This comparison allows us to define a measure of the extent to which sequence data contain significant hierarchical information. We show how to calculate this measure exactly for up to 10 taxa, and provide a good approximation for larger sets of taxa. The measure is applied to test sequences on 10 and 15 taxa.

10.
The nearest neighbor interchange (nni) metric is a distance measure providing a quantitative measure of dissimilarity between two unrooted binary trees with labeled leaves. The metric has a transparent definition in terms of a simple transformation of binary trees, but its use in nontrivial problems is usually prevented by the absence of a computationally efficient algorithm. Since recent attempts to discover such an algorithm continue to be unsuccessful, we address the complementary problem of designing an approximation to the nni metric. Such an approximation should be well-defined, efficient to compute, comprehensible to users, relevant to applications, and a close fit to the nni metric; the challenge, of course, is to balance these objectives in such a way that the final design is acceptable to users with practical and theoretical orientations. We describe an approximation algorithm that appears to satisfy these objectives adequately. The algorithm requires O(n) space to compute dissimilarity between binary trees with n labeled leaves; it requires O(n log n) time for rooted trees and O(n^2 log n) time for unrooted trees. To help the user interpret the dissimilarity measures based on this algorithm, we describe empirical distributions of dissimilarities between pairs of randomly selected trees for both rooted and unrooted cases. The Natural Sciences and Engineering Research Council of Canada partially supported this work with Grant A-4142.

11.
Metric and Euclidean properties of dissimilarity coefficients (total citations: 8; self-citations: 8; citations by others: 0)
We assemble here properties of certain dissimilarity coefficients and are specially concerned with their metric and Euclidean status. No attempt is made to be exhaustive as far as coefficients are concerned, but certain mathematical results that we have found useful are presented and should help establish similar properties for other coefficients. The response to different types of data is investigated, leading to guidance on the choice of an appropriate coefficient. The authors wish to thank the referees, one of whom did a magnificent job in painstakingly checking the detailed algebra and detecting several slips.
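As a rough illustration of the two properties discussed in this abstract, the sketch below checks a dissimilarity matrix numerically. It is not taken from the paper. It assumes the standard classical-scaling criterion: a dissimilarity matrix is Euclidean exactly when the doubly centered matrix of -1/2 times the squared dissimilarities is positive semidefinite. The function names and tolerance are hypothetical.

```python
import numpy as np


def is_metric(d, tol=1e-10):
    """True if d[i, j] <= d[i, k] + d[k, j] holds for every triple."""
    d = np.asarray(d, dtype=float)
    # via[i, k, j] = d[i, k] + d[k, j]
    via = d[:, :, None] + d[None, :, :]
    return bool(np.all(d[:, None, :] <= via + tol))


def is_euclidean(d, tol=1e-10):
    """True if the points can be embedded in Euclidean space with distances d."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Doubly centered matrix of -1/2 * squared dissimilarities.
    b = -0.5 * centering @ (d ** 2) @ centering
    b = (b + b.T) / 2.0  # guard against rounding asymmetry
    return bool(np.linalg.eigvalsh(b).min() >= -tol)


# A star graph: one hub at distance 1 from three leaves, leaves 2 apart.
# It satisfies the triangle inequality but has no Euclidean embedding.
star = np.array([
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
], dtype=float)
```

The star example shows why the two properties need separate checks: being metric does not imply being Euclidean.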

12.
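As centers of regional development, cities concentrate all kinds of factor resources and economic and social activity, and improving their innovation capability bears on urban economic development, coordinated regional development, and national sustainable development. Building on previous research, this paper collects panel data for 213 Chinese cities at the prefecture level and above for 2007-2016 and uses time series analysis, spatial data analysis, and multiple regression analysis to trace the growth curve of urban innovation capability and to explore its growth drivers and driving mechanisms. It concludes that every explanatory variable has a significant positive effect on urban innovation capability, with the innovation talent variable having the strongest driving effect. On this basis, the paper also reveals the driving mechanisms of innovation capability in different regions and cities, and verifies the driving strength of each dimension of the growth drivers across different regions and cities in China.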

13.
When clustering asymmetric proximity data, only the average amounts are often considered by assuming that the asymmetry is due to noise. But when the asymmetry is structural, as typically may happen for exchange flows, migration data or confusion data, this may strongly affect the search for the groups because the directions of the exchanges are ignored and not integrated in the clustering process. The clustering model proposed here relies on the decomposition of the asymmetric dissimilarity matrix into symmetric and skew-symmetric effects both decomposed in within and between cluster effects. The classification structures used here are generally based on two different partitions of the objects fitted to the symmetric and the skew-symmetric part of the data, respectively; the restricted case is also presented where the partition fits jointly both of them allowing for clusters of objects similar with respect to the average amounts and directions of the data. Parsimonious models are presented which allow for effective and simple graphical representations of the results.
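The first step described in this abstract, splitting an asymmetric matrix into a symmetric part (average amounts) and a skew-symmetric part (directions), can be sketched as follows. Only the standard matrix identity is shown; the clustering model itself is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def split_symmetric_skew(d):
    """Return (sym, skew) with d = sym + skew, sym = sym.T, skew = -skew.T."""
    d = np.asarray(d, dtype=float)
    sym = (d + d.T) / 2.0   # average amount exchanged between i and j
    skew = (d - d.T) / 2.0  # net direction of the exchange from i to j
    return sym, skew
```

Because the two parts are orthogonal, the sum of squares of the original matrix equals the sum of squares of the symmetric part plus that of the skew-symmetric part, which is what allows each part to be fitted by its own partition.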

14.
Spectral analysis of phylogenetic data (total citations: 12; self-citations: 0; citations by others: 12)
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently from the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum. A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa and perhaps for up to 30 taxa. The development presented here is self contained, although some mathematical proofs available elsewhere have been omitted. The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data are new.

15.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, which are curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, then yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach and optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives including regression mixture models and the K-means-like algorithm for piecewise regression demonstrate the effectiveness of the proposed approach.

16.
The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2012. The Classification Society's Service, as it was termed, was produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the Journal of Classification. Among our findings are the following: an enormous increase in scholarly production in this area after approximately 2000, and another big increase in the quantity of publications from approximately 2004. The more than 93,000 bibliographic records used are the basis for determining the research disciplines that we analyze. We make all this data available for download, formatted in text and in XML, with an accompanying Apache Lucene/Solr search interface.

17.
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to "spectral mapping", a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology. The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.

18.
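The Born-Oppenheimer approximation is one of the cornerstones of theoretical research in quantum chemistry and chemical dynamics, and it is very important for the dynamics of chemical reaction processes. However, the applicability of this approximation in the Cl+H2 reaction has long been controversial. In this study, a high-sensitivity crossed molecular beam experiment was used to study reactive scattering in the Cl/Cl*+H2 reaction. The experimental results show that the reactivity of the excited-state Cl* atom at low collision energies is comparable to that of the ground-state Cl atom, indicating that the applicability of the Born-Oppenheimer approximation at low energies is problematic. However, as the collision energy increases, the reactivity of Cl* with H2 drops rapidly relative to that of the ground-state Cl atom, which shows that the Born-Oppenheimer approximation is applicable at high collision energies in the Cl+H2 reaction. This result is confirmed by accurate dynamical theory studies and resolves an important and long-debated scientific question.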

19.
In educational measurement, cognitive diagnosis models have been developed to allow assessment of specific skills that are needed to perform tasks. Skill knowledge is characterized as present or absent and represented by a vector of binary indicators, or the skill set profile. After determining which skills are needed for each assessment item, a model is specified for the relationship between item responses and skill set profiles. Cognitive diagnosis models are often used for diagnosis, that is, for classifying students into the different skill set profiles. Generally, cognitive diagnosis models do not exploit student covariate information. However, investigating the effects of student covariates, such as gender, SES, or educational interventions, on skill knowledge mastery is important in education research, and covariate information may improve classification of students to skill set profiles. We extend a common cognitive diagnosis model, the DINA model, by modeling the relationship between the latent skill knowledge indicators and covariates. The probability of skill mastery is modeled as a logistic regression model, possibly with a student-level random intercept, giving a higher-order DINA model with a latent regression. Simulations show that parameter recovery is good for these models and that inclusion of covariates can improve skill diagnosis. When applying our methods to data from an online tutor, we obtain reasonable and interpretable parameter estimates that allow more detailed characterization of groups of students who differ in their predicted skill set profiles.
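The two model components named in this abstract can be sketched in a few lines. The item response part follows the standard DINA formulation: a student answers an item correctly with probability 1 - slip when every required skill is mastered, and with probability guess otherwise. The mastery part is shown as a plain logistic regression on one covariate; the paper's random intercept and higher-order structure are omitted. All function and parameter names are hypothetical, and no estimation is performed.

```python
import numpy as np


def dina_correct_prob(alpha, q, slip, guess):
    """P(correct) for each student-item pair under the DINA model.

    alpha: (students, skills) binary skill set profiles
    q:     (items, skills) binary matrix of skills each item requires
    slip, guess: (items,) slip and guessing probabilities
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    # eta[i, j] is True when student i has every skill that item j requires.
    eta = np.all(alpha[:, None, :] >= q[None, :, :], axis=2)
    return np.where(eta, 1.0 - np.asarray(slip), np.asarray(guess))


def skill_mastery_prob(covariate, intercept, slope):
    """Logistic regression of skill mastery on a single student covariate."""
    covariate = np.asarray(covariate, dtype=float)
    return 1.0 / (1.0 + np.exp(-(intercept + slope * covariate)))
```

In a fitted model the profiles in `alpha` are latent; here they are passed in directly so that the response function can be inspected on its own.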

20.
Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement for the imputation procedure proposed by Lavori, Dawson, and Shera (1995), here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the "propensity score step" of ABP, randomly selects a donor in the nonrespondent's neighbourhood, which includes cases with response patterns similar to the one of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likert-type scales from a real data set shows that ABPN: (a) performed better than the ABP imputation, and (b) can be considered as a serious competitor of other procedures used in this context.
