Similar Literature
 Found 20 similar documents (search time: 468 ms)
1.
The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.
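
The sample-then-refine idea in the abstract above can be sketched roughly as follows. This is only an illustration under strong assumptions: a two-component one-dimensional Gaussian mixture, a single tentative model rather than several, and hypothetical names (`em_step`, `fit_on_sample_then_refine`) that do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, params):
    """One EM iteration for a two-component 1-D Gaussian mixture.

    params = (weight of component 0, (mean0, var0), (mean1, var1)).
    """
    w0, (m0, v0), (m1, v1) = params
    # E step: posterior probability that each point belongs to component 0.
    resp = []
    for x in data:
        p0 = w0 * normal_pdf(x, m0, v0)
        p1 = (1.0 - w0) * normal_pdf(x, m1, v1)
        total = p0 + p1
        resp.append(p0 / total if total > 0.0 else 0.5)
    # M step: re-estimate weight, means and variances from the posteriors.
    n0 = max(sum(resp), 1e-12)
    n1 = max(len(data) - n0, 1e-12)
    new_m0 = sum(r * x for r, x in zip(resp, data)) / n0
    new_m1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
    new_v0 = sum(r * (x - new_m0) ** 2 for r, x in zip(resp, data)) / n0
    new_v1 = sum((1.0 - r) * (x - new_m1) ** 2 for r, x in zip(resp, data)) / n1
    # Small variance floor (an assumption) to avoid degenerate components.
    return (n0 / len(data), (new_m0, max(new_v0, 1e-6)), (new_m1, max(new_v1, 1e-6)))


def fit_on_sample_then_refine(data, sample_size, sample_iters=50, full_iters=3, seed=0):
    """Fit by EM on a random sample, then run a few EM steps on the full data."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    # Crude starting values (an assumption): components at the sample extremes.
    params = (0.5, (min(sample), 1.0), (max(sample), 1.0))
    for _ in range(sample_iters):
        params = em_step(sample, params)
    # Several full-data EM steps instead of a single E step.
    for _ in range(full_iters):
        params = em_step(data, params)
    return params
```

Setting `full_iters=0` and classifying with one E step would correspond to the simplest form of the sampling method, which the abstract reports can be unstable.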

2.
This study evaluates the performance of information criteria used to separate latent classes. In the evaluations, datasets were simulated under various numbers of latent classes, sample sizes, parameter structures, and latent-class complexities. The core results of the experiment were the average accuracy rates of the information criteria in selecting the designed number of latent classes. The study revealed that widely used information criteria, e.g., AIC, BIC, and CAIC, could perform poorly under some circumstances. By including a sample size adjustment (Rissanen, 1978), the unsatisfactory performance could be improved considerably. The sample size adjustment provides a plausible solution for separating latent classes. Guidelines are provided to help achieve optimum use of the model fit indices.
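
For readers who want to see how such criteria trade fit against complexity, here is a minimal sketch using the standard definitions of AIC, BIC, and CAIC. The sample-size-adjusted BIC shown replaces n with (n + 2)/24, a commonly used form; whether this is exactly the adjustment the abstract refers to is an assumption, and the function names are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, CAIC and a sample-size-adjusted BIC for one fitted model."""
    deviance = -2.0 * log_likelihood
    # Assumed adjustment: use (n + 2) / 24 in place of n in the BIC penalty.
    n_adjusted = (n_obs + 2) / 24.0
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        "SABIC": deviance + n_params * math.log(n_adjusted),
    }


def select_num_classes(fits, n_obs, criterion):
    """fits maps a number of classes to (log_likelihood, n_params).

    The number of classes with the smallest criterion value is selected.
    """
    return min(fits, key=lambda k: information_criteria(*fits[k], n_obs)[criterion])
```

With made-up fits such as `{1: (-1000.0, 5), 2: (-950.0, 11), 3: (-940.0, 17)}` and `n_obs=200`, the criteria need not agree on the number of classes, which is the kind of disagreement the study examines.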

3.
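Sun Jun. 《中国基础科学》 (China Basic Science), 2010, 12(2): 18-19, 11
Deformation twinning, which governs the mechanical behavior of many materials, is a highly coordinated inelastic shear process within a local region of a crystal; its causes and its spatial and temporal characteristics remain somewhat mysterious. Using micropillar compression in a nanoindenter together with in situ quantitative deformation characterization in a transmission electron microscope, this study finds that as the external size of titanium-aluminum alloy single crystals is reduced toward 1 μm, the stress required for twinning shear rises markedly, showing a strong size dependence. When the external dimension is reduced further to the submicrometer scale, the mode of plastic deformation changes fundamentally: twinning is replaced entirely by ordinary dislocation slip, and the maximum flow stress the material can sustain reaches a "stress saturation" plateau close to the ideal strength of the material. The study proposes a "stimulated slip" model of deformation twinning mediated by screw dislocations, which explains the strong crystal size effect in twinning.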

4.
The DINA model is a commonly used model for obtaining diagnostic information. Like many other Diagnostic Classification Models (DCMs), it can require a large sample size to obtain reliable item and examinee parameter estimates. Neural Network (NN) analysis is a classification method that uses a training dataset for calibration. As a result, if this training dataset is determined theoretically, as was the case in Gierl’s attribute hierarchical method (AHM), the NN analysis does not have any sample size requirements. However, an NN approach does not provide the traditional item parameters of a DCM or allow item responses to influence test calibration. In this paper, the NN approach is implemented for DINA model estimation to explore its effectiveness as a classification method beyond its use in AHM. The accuracy of the NN approach across different sample sizes, levels of item quality, and levels of Q-matrix complexity is described in the DINA model context. Then, a Markov Chain Monte Carlo (MCMC) estimation algorithm and Joint Maximum Likelihood Estimation (JMLE) are used to extend the NN approach so that item parameters associated with the DINA model are obtained while allowing examinee responses to influence the test calibration. The results derived by the NN, the combination of MCMC and NN (NN MCMC), and the combination of JMLE and NN are compared with those of the well-established Hierarchical MCMC procedure and JMLE with a uniform prior on the attribute profile to illustrate their strengths and weaknesses.
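
As background for the item parameters mentioned above, the standard DINA item response function can be sketched as follows: an examinee who masters every attribute an item requires answers correctly with probability 1 − slip, and otherwise with probability guess. The function names and data layout are illustrative assumptions, not the paper's implementation.

```python
def dina_prob_correct(alpha, q_row, slip, guess):
    """P(correct) under DINA for attribute profile alpha and one Q-matrix row."""
    # eta is 1 only if every attribute required by the item is mastered.
    eta = int(all(a >= q for a, q in zip(alpha, q_row)))
    return 1.0 - slip if eta else guess


def response_likelihood(responses, alpha, q_matrix, slips, guesses):
    """Likelihood of a 0/1 response vector given an attribute profile."""
    likelihood = 1.0
    for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
        p = dina_prob_correct(alpha, q_row, s, g)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood
```

Evaluating `response_likelihood` over all attribute profiles is the building block that likelihood-based estimators such as JMLE or MCMC would rely on, whereas the NN approach classifies without it.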

5.
A Bayes classification procedure for a group of independent vectors treated as a whole is considered. When the distributions are not specified, we obtain bounds on the minimal sample size based on the Chernoff and Bhattacharyya distances between the populations. The case of the normal distribution is also discussed.
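
To make the role of the Bhattacharyya distance concrete, the sketch below uses the textbook bound on the two-class Bayes error, sqrt(p1 p2) exp(−D), and the fact that the distance adds up over independent observations. It covers univariate normal populations only, and the group-size formula is derived from that textbook bound; it is not claimed to be the bound obtained in the paper.

```python
import math


def bhattacharyya_distance_normal(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2)."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def bayes_error_upper_bound(distance, prior1=0.5):
    """Bhattacharyya upper bound on the two-class Bayes error."""
    return math.sqrt(prior1 * (1.0 - prior1)) * math.exp(-distance)


def min_group_size(distance, target_error, prior1=0.5):
    """Smallest n such that the bound for n independent vectors is <= target_error.

    For n i.i.d. vectors classified as a whole the distance is n * distance.
    """
    needed = math.log(math.sqrt(prior1 * (1.0 - prior1)) / target_error) / distance
    return max(1, math.ceil(needed))
```

For example, for N(0, 1) versus N(2, 1) the distance is 0.5, so each additional independent observation multiplies the error bound by exp(−0.5).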

6.
Power and Sample Size Computation for Wald Tests in Latent Class Models   Cited by: 1 (self-citations: 0, citations by others: 1)
Latent class (LC) analysis is used by social, behavioral, and medical science researchers among others as a tool for clustering (or unsupervised classification) with categorical response variables, for analyzing the agreement between multiple raters, for evaluating the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for modeling heterogeneity in developmental trajectories. Despite the increased popularity of LC analysis, little is known about statistical power and required sample size in LC modeling. This paper shows how to perform power and sample size computations in LC models using Wald tests for the parameters describing association between the categorical latent variable and the response variables. Moreover, the design factors affecting the statistical power of these Wald tests are studied. More specifically, we show how design factors which are specific for LC analysis, such as the number of classes, the class proportions, and the number of response variables, affect the information matrix. The proposed power computation approach is illustrated using realistic scenarios for the design factors. A simulation study conducted to assess the performance of the proposed power analysis procedure shows that it performs well in all situations one may encounter in practice.
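
A generic power calculation for a single-parameter, two-sided Wald test can be sketched as below, using the usual normal approximation. The argument `se_unit` (the standard error of the estimate for a sample of size one, i.e. the square root of the relevant entry of the inverse per-observation information matrix) is a hypothetical input: in an LC model it would depend on the design factors listed above, and the paper's own procedure for obtaining it is not reproduced here.

```python
from statistics import NormalDist


def wald_power(effect, se_unit, n, alpha=0.05):
    """Approximate power of a two-sided 1-df Wald test at sample size n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Standardized effect: parameter value divided by its standard error at size n.
    ncp = abs(effect) / (se_unit / n ** 0.5)
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


def required_sample_size(effect, se_unit, power=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches the target."""
    if effect == 0:
        raise ValueError("effect must be nonzero")
    n = 1
    while wald_power(effect, se_unit, n, alpha) < power:
        n += 1
    return n
```

With a zero effect the function returns the nominal level `alpha`, which is a quick sanity check on the approximation.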

7.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of skills that have been mastered and those that have not. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that only requires specification of an item-by-attribute association matrix, and classifies by minimizing a distance measure between the observed responses and the ideal response for a given attribute profile implied by the item-by-attribute association matrix. This procedure requires no statistical parameter estimation, and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
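
The nonparametric idea described above can be illustrated with a small sketch. It assumes a conjunctive (DINA-type) ideal response and the Hamming distance, which is only one of the distance measures the abstract mentions; the function names are hypothetical, and ties are broken by enumeration order.

```python
from itertools import product


def ideal_response(alpha, q_matrix):
    """Conjunctive ideal responses: 1 if all attributes an item requires are mastered."""
    return [int(all(a >= q for a, q in zip(alpha, row))) for row in q_matrix]


def classify(responses, q_matrix):
    """Return the attribute profile whose ideal response is closest in Hamming distance."""
    n_attrs = len(q_matrix[0])

    def hamming(alpha):
        ideal = ideal_response(alpha, q_matrix)
        return sum(x != e for x, e in zip(responses, ideal))

    # Enumerate all 2^K profiles; the first minimizer wins on ties.
    return min(product((0, 1), repeat=n_attrs), key=hamming)
```

Because nothing is estimated, `classify` can be applied to a single examinee, matching the abstract's remark about sample sizes as small as 1.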

8.
Cognitive diagnostic models provide valuable information on whether a student has mastered each of the attributes a test intends to evaluate. Despite its generality, the generalized DINA (G-DINA) model allows for the possibility of lower correct-response rates for students who master more attributes than for those who master fewer. This paper considers the use of an order-constrained parameter space for the G-DINA model to avoid such a counter-intuitive phenomenon and proposes two algorithms, the upward and downward methods, for parameter estimation. Through simulation studies, we compare the accuracy in parameter estimation and in classification of attribute patterns obtained from the two proposed algorithms and the current approach when the restricted parameter space is true. Our results show that the upward method performs the best among the three, and therefore it is recommended for estimation, regardless of the distribution of respondents’ attribute patterns, the types of test items, and the sample size of the data.

9.
A new projection-pursuit index is used to identify clusters and other structures in multivariate data. It is obtained from the variance decompositions of the data’s one-dimensional projections, without assuming a model for the data or that the number of clusters is known. The index is affine invariant and successful with real and simulated data. A general result is obtained indicating that clusters’ separation increases with the data’s dimension. In simulations it is thus confirmed, as expected, that the performance of the index either improves or does not deteriorate when the data’s dimension increases, making it especially useful for “large dimension-small sample size” data. The efficiency of this index will increase with continuing improvements in computer technology. Several applications are presented.

10.
Given two dendrograms (rooted tree diagrams) which have some but not all of their base points in common, a supertree is a dendrogram from which each of the original trees can be regarded as samples. The distinction is made between inconsistent and consistent sample trees, defined by whether or not the samples provide contradictory information about the supertree. An algorithm for obtaining the strict consensus supertree of two consistent sample trees is presented, as are procedures for merging two inconsistent sample trees. Some suggestions for future work are made.

11.
Sometimes a larger dataset needs to be reduced to just a few points, and it is desirable that these points be representative of the whole dataset. If the future uses of these points are not fully specified in advance, standard decision-theoretic approaches will not work. We present here methodology for choosing a small representative sample based on a mixture modeling approach.

12.
In many statistical applications the data are curves measured as functions of a continuous parameter such as time. Despite their functional nature, and because they are observed at discrete times, these types of data are usually analyzed with multivariate statistical methods that do not take into account the high correlation between observations of a single curve at nearby time points. Functional data analysis methodologies have been developed to solve these types of problems. In order to predict the class membership (multi-category response variable) associated with an observed curve (functional data), a functional generalized logit model is proposed. Baseline-category logit formulations are considered, with estimation based on basis expansions of the sample curves of the functional predictor and of the parameters. Functional principal component analysis is used to obtain an accurate estimate of the functional parameters and to classify sample curves into the categories of the response variable. The performance of the proposed methodology is assessed in an experimental study with simulated and real data.

13.
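Regarding the relative sizes of heaven and earth, ancient Chinese thinkers traditionally held that the earth was comparable in size to the heavens. Wang Chong of the Eastern Han dynasty, however, analyzed how the apparent size of the sun at rising and setting and the bearing of the Pole Star change when observed from different places, and explicitly proposed that "the earth is small and occupies a narrow space" (地小居狭). His method of argument was akin to Ptolemy's, and its intellectual roots lay in his emphasis on quantitative concepts, but his view was long neglected and had no influence on the development of ancient Chinese astronomy.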

14.
We investigate the effects of a complex sampling design on the estimation of mixture models. An approximate or pseudo likelihood approach is proposed to obtain consistent estimates of class-specific parameters when the sample arises from such a complex design. The effects of ignoring the sample design are demonstrated empirically in the context of an international value segmentation study in which a multinomial mixture model is applied to identify segment-level value rankings. The analysis reveals that ignoring the sample design results in both an incorrect number of segments as identified by information criteria and biased estimates of segment-level parameters.
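
One common way to form a pseudo log-likelihood under a complex design is to weight each unit's log-density by its sampling weight. The sketch below does this for a simple categorical mixture; it assumes the weights are inverse inclusion probabilities and is not claimed to match the paper's exact estimator or its ranking model.

```python
import math


def pseudo_log_likelihood(observations, weights, class_probs, category_probs):
    """Weighted (pseudo) log-likelihood of a categorical mixture model.

    observations: category index for each sampled unit
    weights: sampling weight for each unit (assumed inverse inclusion probability)
    class_probs: mixing proportions, one per latent class
    category_probs: per-class list of category probabilities
    """
    total = 0.0
    for x, w in zip(observations, weights):
        # Mixture density of the observed category, summed over latent classes.
        density = sum(pi * probs[x] for pi, probs in zip(class_probs, category_probs))
        total += w * math.log(density)
    return total
```

Setting every weight to 1 recovers the ordinary log-likelihood, which corresponds to ignoring the sample design.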

15.
This series of papers is intended to present astrocladistics in some detail and evaluate this methodology in reconstructing phylogenies of galaxies. Being based on the evolution of all the characters describing galaxies, it is an objective way of understanding galaxy diversity through evolutionary relationships. In this first paper, we present the basic steps of a cladistic analysis and show both theoretically and practically that it can be applied to galaxies. For illustration, we use a sample of 50 simulated galaxies taken from the GALICS database, which are described by 91 observables (dynamics, masses and luminosities). These 50 simulated galaxies are in fact 10 different galaxies taken at 5 cosmological epochs, and they are free of merger events. The astrocladistic analysis easily reconstructs the true chronology of evolutionary relationships within this sample. It also demonstrates that burst characters are not relevant for galaxy evolution as a whole. A companion paper is devoted to the formalization of the concepts of formation and diversification in galaxy evolution.

16.
Data in an experimental array where a nominal dependent variable has $m > 2$ outcomes may be accounted for by one of a number of possible schemes consisting of $J$ successive and/or parallel independent $m_i$-nomial experiments, where $\sum_{i=1}^{J} m_i = m + J - 1$. Each such scheme can be represented by a tree diagram which is presumed to be valid everywhere in the array. A criterion based on likelihood is defined to assess the different schemes. The set of outcome probabilities of a scheme is shown to differ from that of all other schemes almost everywhere in the space of parameters. As sample size increases, the probability of correctly inferring the true tree tends to 1. Using Monte-Carlo simulation of the four-outcome case, we illustrate, for small sample sizes, how this probability depends on the parameters.
Résumé: A family of models is proposed for analyzing a data set in which observations are made on a discrete response variable and an explanatory vector. Each model consists of a series of multinomial experiments whose outcomes are groupings of the categories of the response variable. The probabilities of observing these groupings depend on the explanatory vector through logistic-linear equations. It is easily shown that every model in this family contains the same number of parameters. Moreover, each model corresponds to a tree structure that classifies the categories of the response variable hierarchically: a nonterminal node of the tree represents one of these multinomial experiments, and a terminal node represents a category.

The probability of observing a category is thus computed by following the path from the root to the terminal node representing that category, and model choice is based on a likelihood criterion computed as the product of the likelihoods evaluated from the data set at each nonterminal node of the tree. It is shown that the ability to predict the categories differs from tree to tree and that only the true tree can exhibit the true probabilities over almost all of the parameter space. Asymptotic properties of the criterion are also established which ensure that the true model is selected by this criterion with probability 1. A Monte Carlo simulation study illustrates, for small samples, how the probability that the true model is chosen depends on the parameter values.
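
The path-product computation described in this abstract can be sketched as follows. The tree encoding (a leaf is an outcome label, an internal node is a list of `(branch_probability, subtree)` pairs) and the fixed branch probabilities are illustrative assumptions; in the paper the branch probabilities depend on covariates through logistic-linear equations.

```python
def leaf_probabilities(tree, prob=1.0):
    """Probability of each outcome: product of branch probabilities along its path."""
    if not isinstance(tree, list):
        return {tree: prob}  # terminal node: an outcome label
    result = {}
    for branch_prob, subtree in tree:
        result.update(leaf_probabilities(subtree, prob * branch_prob))
    return result


def branch_counts(tree):
    """Number of branches m_i at each nonterminal node (one multinomial experiment each)."""
    if not isinstance(tree, list):
        return []
    counts = [len(tree)]
    for _, subtree in tree:
        counts.extend(branch_counts(subtree))
    return counts


# Four outcomes modeled as a binomial experiment (A versus the rest)
# followed by a trinomial experiment among B, C and D.
example_tree = [(0.4, "A"), (0.6, [(0.5, "B"), (0.3, "C"), (0.2, "D")])]
```

For `example_tree` there are J = 2 experiments with branch counts 2 and 3, whose sum 5 equals m + J − 1 with m = 4 outcomes.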

17.
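A Preliminary Study of the Iatrochemical School   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper traces the historical background of the emergence of the iatrochemical school and the major contributions of its leading figures to chemistry, analyzes the characteristics of their ways of thinking and methods of research, shows that iatrochemistry served as an important bridge in the transition from ancient to modern chemistry, and examines, through historical evidence, the broad and far-reaching influence of the school on the formation, establishment, and development of modern chemistry.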

18.
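A Study of Meteor Records Observed by the Imperial Astronomical Bureau (Qintianjian) in Three Periods of the Qing Dynasty   Cited by: 3 (self-citations: 1, citations by others: 2)
This paper discusses the method for converting the recorded times of Qing dynasty meteor observations, and gives their mean solar longitudes and fitted radiants in order to identify the meteor showers to which they belong.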

19.
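Laudan draws on the history of science to rebut scientific realism, in particular using a list of theories that were once successful but non-referring to deny the link between success and approximate truth. For the rebuttal to work, the list Laudan compiles must be statistically meaningful, yet it also commits statistical errors. Lewis argues that the pessimistic induction commits a false-positive error, and Mizrahi argues that it commits a sampling error. Also drawing on the history of science, Mizrahi proposes an optimistic induction in support of realism; the optimistic induction avoids both errors and is more powerful than the no-miracles argument.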

20.
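The value of academic sovereignty lies in gaining recognition not only from academic actors inside the academic community but also from academic actors outside it. This recognition has both a magnitude and a specific direction, and we call it the recognition vector. The direction and magnitude of the recognition vector change with the academic practice in which academic actors jointly take part in a given situation, and remain relatively stable under particular conditions. The dialectic of stability and change in the recognition vector determines how academic sovereignty is established and transformed. Understanding the process and mechanism of this transformation is of theoretical and practical significance for understanding how academic research evolves.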
