Similar Articles
1.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting proportions of viewing time across television networks for an area sample of households.

2.
An index of goodness-of-fit based on noncentrality (total citations: 4; self-citations: 0; citations by others: 4)
Akaike's Information Criterion is systematically dependent on sample size and therefore cannot be used in practice as a basis for model selection. An alternative goodness-of-fit measure, which, like Akaike's, is based on the noncentrality parameter, appears to remain consistent across variations in sample size.
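To make the sample-size argument concrete, here is a minimal sketch. It assumes, beyond what the abstract states, the chi-square form of Akaike's criterion for covariance structure models, AIC = χ² − 2df, and a noncentrality-based index of the form m = exp(−½(χ² − df)/n), where n is the sample size (some sources use N − 1). It also assumes the textbook approximation E[χ²] = df + n·F0 for a fixed population misfit F0. All function names are hypothetical.

```python
import math


def expected_chi_square(df, n, discrepancy):
    # Expected noncentral chi-square: df plus the noncentrality n * F0
    return df + n * discrepancy


def aic_chi_square(chi_square, df):
    # Chi-square form of Akaike's criterion relative to the saturated model
    return chi_square - 2 * df


def centrality_index(chi_square, df, n):
    # Rescaled noncentrality estimate d = (chi^2 - df) / n, mapped to (0, 1]
    d = (chi_square - df) / n
    return math.exp(-d / 2)


for n in (100, 1000):
    chi2 = expected_chi_square(df=10, n=n, discrepancy=0.05)
    print(n, aic_chi_square(chi2, 10), centrality_index(chi2, 10, n))
```

With df = 10 and F0 = 0.05, the AIC value moves from −5 at n = 100 to 40 at n = 1000 for the same misfit, while the assumed index stays at exp(−0.025) in both cases.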

3.
The aim of the paper is to introduce some of the history and key concepts of network science to a philosophical audience, and to highlight a crucial—and often problematic—presumption that underlies the network approach to complex systems. Network scientists often talk of “the structure” of a given complex system or phenomenon, which encourages the view that there is a unique and privileged structure inherent to the system, and that the aim of a network model is to delineate this structure. I argue that this sort of naïve realism about structure is not a coherent or plausible position, especially given the multiplicity of types of entities and relations that can feature as nodes and links in complex networks.

4.
This paper argues that, very often, one cannot distinguish deterministic processes governed by certain dynamical systems from stochastic processes. This revised version was published online in August 2006 with corrections to the Cover Date.

5.
Many methods and algorithms for generating random trees of various kinds have been proposed in the literature. However, no procedure exists for generating dendrograms with randomized fusion levels. Randomized dendrograms can be obtained by randomizing the associated cophenetic matrix. Two algorithms are described. The first generates completely random dendrograms, i.e., trees with a random topology, random fusion level values, and random assignment of the labels. The second uses a double-permutation procedure to randomize a given dendrogram; it randomizes the fixed fusion levels instead of using random fusion level values. A proof is presented that the double-permutation procedure is a Uniform Random Generation Algorithm sensu Furnas (1984), and a complete example is given. This work was supported by NSERC Grant No. A7738 to P. Legendre and by an NSERC scholarship to F.-J. Lapointe.
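The link between a dendrogram and its cophenetic matrix can be illustrated with a minimal sketch: it builds a random dendrogram by merging randomly chosen clusters at increasing random fusion levels and records, for each pair of objects, the level at which they first join. This is not the authors' first algorithm and makes no claim of uniformity over topologies; the function names are hypothetical.

```python
import random


def random_cophenetic_matrix(n, seed=None):
    """Cophenetic matrix of a randomly built dendrogram on objects 0..n-1."""
    rng = random.Random(seed)
    # n - 1 random fusion levels, sorted so merges happen at increasing heights
    levels = sorted(rng.random() for _ in range(n - 1))
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    for level in levels:
        # Pick two distinct clusters at random (random topology and labels)
        a, b = sorted(rng.sample(range(len(clusters)), 2))
        merged = clusters[a] + clusters[b]
        # Every pair split across the two clusters first joins at this level
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = level
        del clusters[b]  # b > a, so deleting b first keeps index a valid
        del clusters[a]
        clusters.append(merged)
    return coph


def is_ultrametric(d, tol=1e-12):
    """Check d(i, j) <= max(d(i, k), d(k, j)) for all triples."""
    n = len(d)
    return all(
        d[i][j] <= max(d[i][k], d[k][j]) + tol
        for i in range(n) for j in range(n) for k in range(n)
    )


m = random_cophenetic_matrix(8, seed=1)
print(is_ultrametric(m))
```

Any matrix produced this way satisfies the ultrametric inequality, which is what makes it the cophenetic matrix of some dendrogram.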

6.
The SINDCLUS algorithm for fitting the ADCLUS and INDCLUS models handles a parameter matrix that occurs twice in the model by treating the two occurrences as independent parameter matrices. This procedure has been justified empirically by the observation that, upon convergence of the algorithm to the global optimum, the two independently treated parameter matrices turn out to be equal. In the present paper, results are presented that contradict this finding, and a modification of SINDCLUS is presented that obviates the need to treat the two occurrences of the same parameter matrix independently.
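For readers unfamiliar with ADCLUS, a minimal sketch of the model side may help. It assumes the usual additive-clustering form s_ij = Σ_k w_k p_ik p_jk + c for off-diagonal similarities, with binary memberships p_ik (INDCLUS additionally gives each subject its own weights w_k). The matrix P is the one that "occurs twice"; the third function shows the relaxation the abstract attributes to SINDCLUS, where the second occurrence is replaced by a separate matrix Q. No fitting algorithm is shown, and all names are hypothetical.

```python
import numpy as np


def adclus_predicted(P, w, c):
    # s_hat = P diag(w) P^T + c: the same P appears on both sides
    P = np.asarray(P, dtype=float)
    return P @ np.diag(w) @ P.T + c


def adclus_loss(S, P, w, c):
    # Least-squares loss over off-diagonal cells only
    S = np.asarray(S, dtype=float)
    residual = S - adclus_predicted(P, w, c)
    off_diagonal = ~np.eye(S.shape[0], dtype=bool)
    return float((residual[off_diagonal] ** 2).sum())


def sindclus_style_predicted(P, Q, w, c):
    # Relaxed model: the second occurrence of P is treated as a free matrix Q
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return P @ np.diag(w) @ Q.T + c


P = [[1, 0], [1, 1], [0, 1]]
S = adclus_predicted(P, [2.0, 1.0], 0.5)
print(adclus_loss(S, P, [2.0, 1.0], 0.5))  # 0.0 for data generated by the model
```

When Q differs from P the relaxed prediction is generally no longer symmetric, which is why the equality of the two estimates at convergence matters.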

7.
A trend in educational testing is to go beyond unidimensional scoring and provide a more complete profile of the skills that have and have not been mastered. To achieve this, cognitive diagnosis models have been developed that can be viewed as restricted latent class models. Diagnosis of class membership is the statistical objective of these models. As an alternative to latent class modeling, a nonparametric procedure is introduced that requires only the specification of an item-by-attribute association matrix, and that classifies examinees by minimizing a distance measure between the observed responses and the ideal response that the item-by-attribute association matrix implies for a given attribute profile. This procedure requires no statistical parameter estimation and can be used on a sample size as small as 1. Heuristic arguments are given for why the nonparametric procedure should be effective under various possible cognitive diagnosis models for data generation. Simulation studies compare classification rates with those of parametric models, and consider a variety of distance measures, data generation models, and the effects of model misspecification. A real data example is provided with an analysis of agreement between the nonparametric method and parametric approaches.
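The classification rule can be sketched in a few lines. This minimal version assumes a conjunctive (DINA-type) ideal response, in which an item is answered correctly only if every attribute it requires is mastered, and uses Hamming distance; the abstract considers several distance measures and does not fix either choice. Ties are broken by enumeration order. Function names are hypothetical.

```python
from itertools import product


def ideal_response(profile, Q):
    # Conjunctive ideal response: item j is correct only if every attribute
    # with Q[j][k] == 1 is mastered (a 0 entry in Q imposes no requirement)
    return [int(all(a >= q for a, q in zip(profile, row))) for row in Q]


def hamming(x, y):
    return sum(xi != yi for xi, yi in zip(x, y))


def classify_examinee(responses, Q):
    """Return (attribute profile, distance) minimizing the distance to the ideal response."""
    n_attributes = len(Q[0])
    best = None
    for profile in product((0, 1), repeat=n_attributes):
        d = hamming(responses, ideal_response(profile, Q))
        if best is None or d < best[1]:
            best = (profile, d)
    return best


# Item-by-attribute matrix: item 3 requires both attributes
Q = [[1, 0], [0, 1], [1, 1]]
print(classify_examinee([1, 0, 0], Q))
```

Because each examinee is classified separately, with no parameters estimated, the rule works even for a single examinee, as the abstract notes.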

8.
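Network Thinking: A Cognitive Schema Based on Node-and-Link Symbols and the Complexity Paradigm (total citations: 2; self-citations: 0; citations by others: 2)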
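Network thinking views the structure of a thing as a network made up of points (the constituent elements of the thing) and lines (the connections between those elements), and on this basis seeks to understand the thing's structure, function, and evolution. This way of thinking, which takes the network as its cognitive schema, has already produced substantial practical results in research across the natural sciences, economics, and the humanities. The network has become the scientific symbol of the twenty-first century, and network thinking and its techniques will provide the emerging paradigm of complexity science with cognitive tools and practical means that help bridge the gap between holism and reductionism.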

9.
In many areas, the eighteenth century was a starting point for the quantification of science. It was a period in which a mania for collecting led to the first attempts at systematization and classification. This penchant for collecting was not limited to natural history specimens or curiosities. Partly because mathematical and physical instruments were developed and became more widely available, scholars were confronted with the informative value of numbers. On the one hand, sequences of measurements appeared to be the key to the advancement of scientific knowledge; on the other hand, the mathematical apparatus for dealing with these data was still largely lacking. As a result, the first meteorological networks organized in the eighteenth century all became bogged down in the large amount of information that was collected but could not be processed properly. This development is illustrated in a case study of an early Dutch meteorological society, the Natuur- en Geneeskundige Correspondentie Sociëteit (1779-1802). What were the factors that triggered this interest in the weather in the Netherlands? What were the goals and expectations of the contributors? What were their methodological strategies? Which instruments were used to measure which meteorological parameters? How was the stream of numbers generated by these measurements organized, collected, and interpreted? An analysis of this process reveals that the advancement of meteorology was limited not only by the problem of processing the data: the circumstances of the eighteenth-century Dutch Republic and the lack of proper theoretical insight were also crucial factors that eventually frustrated the breakthrough of meteorology as an academic science in the Netherlands. This breakthrough was achieved only in the second half of the nineteenth century.

10.
The distribution of lengths of phylogenetic trees under the taxonomic principle of parsimony is compared with the distribution obtained by randomizing the characters of the sequence data. This comparison allows us to define a measure of the extent to which sequence data contain significant hierarchical information. We show how to calculate this measure exactly for up to 10 taxa, and provide a good approximation for larger sets of taxa. The measure is applied to test sequences on 10 and 15 taxa.
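The two ingredients of the comparison can be sketched as follows, assuming Fitch's small-parsimony algorithm for the length of a fixed rooted binary tree and a randomization that shuffles each character's states independently across taxa. The paper works with the distribution of lengths over trees; this sketch evaluates only one given tree, and all names are hypothetical.

```python
import random


def fitch_length(tree, states):
    """Fitch parsimony length of one character on a rooted binary tree.

    tree: nested 2-tuples with taxon names as leaves; states: taxon -> state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        set_l, cost_l = visit(left)
        set_r, cost_r = visit(right)
        common = set_l & set_r
        if common:
            return common, cost_l + cost_r
        # Empty intersection: take the union and count one state change
        return set_l | set_r, cost_l + cost_r + 1

    return visit(tree)[1]


def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters (matrix: taxon -> sequence)."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    return sum(fitch_length(tree, {t: matrix[t][k] for t in taxa}) for k in range(n_chars))


def randomize_characters(matrix, rng):
    """Shuffle each character (column) independently across taxa."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = []
    for k in range(n_chars):
        column = [matrix[t][k] for t in taxa]
        rng.shuffle(column)
        columns.append(column)
    return {t: "".join(col[i] for col in columns) for i, t in enumerate(taxa)}


tree = (("A", "B"), ("C", "D"))
data = {"A": "00", "B": "01", "C": "10", "D": "11"}
rng = random.Random(0)
null_lengths = [tree_length(tree, randomize_characters(data, rng)) for _ in range(100)]
print(tree_length(tree, data), min(null_lengths), max(null_lengths))
```

Shuffling within columns preserves each character's state frequencies while destroying any hierarchical signal, which is what makes the randomized lengths a reasonable reference distribution.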

11.
Dimensionality reduction techniques are used to represent higher-dimensional data by a more parsimonious and meaningful lower-dimensional structure. In this paper we study two such approaches, namely Carroll’s Parametric Mapping (abbreviated PARAMAP) (Shepard and Carroll, 1966) and Tenenbaum’s Isometric Mapping (abbreviated Isomap) (Tenenbaum, de Silva, and Langford, 2000). The former relies on iterative minimization of a cost function, while the latter applies classical MDS after a preprocessing step that uses a shortest-path algorithm to define approximate geodesic distances. We develop a measure of congruence based on the preservation of local structure between the input data and the mapped low-dimensional embedding, and compare the different approaches on various data sets, including points located on the surface of a sphere, the so-called "Swiss Roll" data, and truncated spheres.
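The Isomap pipeline described above (neighborhood graph, shortest paths as approximate geodesic distances, then classical MDS) can be sketched in NumPy. This is a minimal, unoptimized version assuming a symmetric k-nearest-neighbor graph and Floyd-Warshall shortest paths; it is not the implementation used in the paper, and the function name is hypothetical. PARAMAP and the paper's congruence measure are not shown.

```python
import numpy as np


def isomap(X, n_neighbors, n_components):
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Symmetric k-nearest-neighbor graph (inf = no direct edge)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nearest = np.argsort(D[i])[1:n_neighbors + 1]  # skip the point itself
        G[i, nearest] = D[i, nearest]
        G[nearest, i] = D[i, nearest]
    # Floyd-Warshall: shortest path lengths approximate geodesic distances
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


t = np.arange(10.0)
Y = isomap(np.column_stack([t, np.zeros(10)]), n_neighbors=2, n_components=1)
print(np.round(Y[:, 0], 3))
```

For points on a straight line the geodesic and Euclidean distances coincide, so the one-dimensional embedding reproduces the spacing of `t` up to a reflection and shift.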

12.
An error variance approach to two-mode hierarchical clustering (total citations: 2; self-citations: 2; citations by others: 0)
A new agglomerative method is proposed for the simultaneous hierarchical clustering of row and column elements of a two-mode data matrix. The procedure yields a nested sequence of partitions of the union of two sets of entities (modes). A two-mode cluster is defined as the union of subsets of the respective modes. At each step of the agglomerative process, the algorithm merges those clusters whose fusion results in the smallest possible increase in an internal heterogeneity measure. This measure takes into account both the variance within the respective cluster and its centroid effect defined as the squared deviation of its mean from the maximum entry in the input matrix. The procedure optionally yields an overlapping cluster solution by assigning further row and/or column elements to clusters existing at a preselected hierarchical level. Applications to real data sets drawn from consumer research concerning brand-switching behavior and from personality research concerning the interaction of behaviors and situations demonstrate the efficacy of the method at revealing the underlying two-mode similarity structure.
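The heterogeneity measure can be sketched for a single two-mode cluster. This minimal version assumes the two components are simply added, i.e. the within-cluster variance of the block of entries plus the squared deviation of the block mean from the matrix maximum; by the usual variance decomposition that sum equals the mean squared deviation of the block's entries from the maximum. The merge criterion and the overlapping extension are not shown, and the function name is hypothetical.

```python
import numpy as np


def heterogeneity(X, rows, cols):
    """Within-cluster variance plus centroid effect for the block X[rows, cols]."""
    X = np.asarray(X, dtype=float)
    if len(rows) == 0 or len(cols) == 0:
        return 0.0  # a cluster drawn from one mode only covers no matrix entries
    block = X[np.ix_(rows, cols)]
    x_max = X.max()
    within_variance = block.var()  # population variance of the block entries
    centroid_effect = (block.mean() - x_max) ** 2
    return within_variance + centroid_effect


X = [[5, 4, 1],
     [4, 5, 0],
     [1, 0, 2]]
print(heterogeneity(X, [0, 1], [0, 1]))  # variance 0.25 + centroid effect 0.25
```

The centroid effect penalizes clusters whose entries are uniformly low, so homogeneous blocks of high similarity are preferred over homogeneous blocks of low similarity.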

13.
Over the past decade, diagnostic classification models (DCMs) have become an active area of psychometric research. Despite their use, the reliability of examinee estimates in DCM applications has seldom been reported. In this paper, a reliability measure for the categorical latent variables of DCMs is defined. Using theory- and simulation-based results, we show that DCMs uniformly provide greater examinee estimate reliability than IRT models for tests of the same length, a result that is a consequence of the smaller range of latent variable values that examinee estimates can take in DCMs. We demonstrate this result by comparing DCM and IRT reliability for a series of models estimated with data from an end-of-grade test, and conclude with a discussion of how DCMs can be used to change the character of large-scale testing, either by shortening tests that measure examinees unidimensionally or by providing more reliable multidimensional measurement for tests of the same length.

14.
Comparing partitions (total citations: 80; self-citations: 13; citations by others: 67)
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure; the two matrices are generated from the corresponding partitions using various scoring rules. Special cases derivable from this approach include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples that has the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1.
William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.
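Rand's index and its chance correction can be sketched directly from the contingency table of two partitions. This minimal version uses the adjusted Rand index in the form commonly attributed to Hubert and Arabie; it does not implement the paper's cross-product or object-triple measures, and the function names are hypothetical. The correction is undefined when both partitions are trivial (for example, all objects in one cluster), because the denominator is then zero.

```python
from collections import Counter
from math import comb


def pair_counts(a, b):
    # Sums of C(count, 2) over the contingency cells, row totals and column totals
    n_ij = Counter(zip(a, b))
    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    return sum_ij, sum_a, sum_b


def rand_index(a, b):
    """Fraction of object pairs on which the two partitions agree."""
    total = comb(len(a), 2)
    sum_ij, sum_a, sum_b = pair_counts(a, b)
    return (total + 2 * sum_ij - sum_a - sum_b) / total


def adjusted_rand_index(a, b):
    """Rand index corrected for chance: 0 in expectation, 1 for identical partitions."""
    total = comb(len(a), 2)
    sum_ij, sum_a, sum_b = pair_counts(a, b)
    expected = sum_a * sum_b / total
    maximum = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (maximum - expected)


a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 2, 2]
print(rand_index(a, b), adjusted_rand_index(a, b))
```

In this example the raw Rand index is 2/3, while the chance-corrected value drops to about 0.24, illustrating why an uncorrected index can overstate agreement.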

15.
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show that CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.
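The final step described above, turning real-valued latent variables into class probabilities with a softmax, is easy to sketch. This minimal version assumes one latent value per class has already been computed at a point; the GPs, the tree, and the Monte Carlo sampler are not shown, and the function names are hypothetical.

```python
import math


def softmax(latent):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(latent)
    exps = [math.exp(z - m) for z in latent]
    total = sum(exps)
    return [e / total for e in exps]


def classify(latent):
    """Return the index of the most probable class and the class probabilities."""
    probs = softmax(latent)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical latent values for three classes at one input point
print(classify([1.0, 3.0, 2.0]))
```

Subtracting the maximum leaves the probabilities unchanged but avoids overflow when latent values are large.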

16.
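On the Mathematical Structure and Compilation Rules of Lunar-Phase Calendar Tables (total citations: 7; self-citations: 1; citations by others: 6)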
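Using methods of mathematical statistics and drawing on knowledge of astronomical calendar calculation and the compilation of calendar tables, this paper attempts to provide a mathematical structural model (a numerical table), called the lunar-phase calendar table (月龄历谱), as a research tool for terms that carry lunar-phase (moon-age) characteristics. Its uses are: (1) the possible chronological intervals between such lunar-phase terms can be obtained directly from the differences between their sexagenary (ganzhi) day designations; (2) it can be used to compile the corresponding lunar-phase calendar tables (e.g., bronze-inscription calendars and certain oracle-bone calendars); (3) it enables error analysis of the lunar-phase ranges and chronological intervals of lunar-phase terms in calendar-table schemes (such as the Western Zhou bronze-inscription calendar proposed by the Xia-Shang-Zhou Chronology Project).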
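Use (1) rests on simple modular arithmetic over the 60-day sexagenary cycle, which can be sketched as follows. This is not the paper's numerical table. It assumes the standard cycle starting at 甲子 (index 0), where a day's index n satisfies n ≡ stem (mod 10) and n ≡ branch (mod 12); the closed form (6·stem − 5·branch) mod 60 solves both congruences. The interval between two ganzhi days is known only modulo 60, so the candidates are d, d + 60, d + 120, and so on. Function names are hypothetical.

```python
STEMS = "甲乙丙丁戊己庚辛壬癸"
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"


def ganzhi_index(name):
    """Position 0..59 of a stem-branch day name in the sexagenary cycle (甲子 = 0)."""
    if len(name) != 2 or name[0] not in STEMS or name[1] not in BRANCHES:
        raise ValueError(f"not a stem-branch pair: {name!r}")
    s = STEMS.index(name[0])
    b = BRANCHES.index(name[1])
    if (s - b) % 2:
        raise ValueError(f"{name!r} does not occur in the sexagenary cycle")
    # n = 6s - 5b satisfies n % 10 == s and n % 12 == b when s, b share parity
    return (6 * s - 5 * b) % 60


def day_interval(start, end):
    """Smallest non-negative number of days from start to end (mod 60)."""
    return (ganzhi_index(end) - ganzhi_index(start)) % 60


def candidate_intervals(start, end, max_cycles):
    """Possible day intervals: d + 60k for k = 0 .. max_cycles - 1."""
    d = day_interval(start, end)
    return [d + 60 * k for k in range(max_cycles)]


print(candidate_intervals("甲子", "甲戌", 3))  # [10, 70, 130]
```

In practice, the lunar-phase meaning of each term is what narrows these candidates down, which is the role the abstract assigns to the calendar table.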

17.
Lattice theory is used to develop techniques for classifying groups of subjects on the basis of their recall strategies, or multiple recall strategies within individual subjects. Using the ordered tree algorithm to represent sets of recall orders, it is shown how both trees and single recall strings can be represented as points within a nonsemimodular, graded lattice. Distances within the lattice structure are used to construct a dissimilarity measure, S, which can then be used to partition the individual recall strings. The measure S between strings is compared with Kendall's tau in three empirical tests, examining differences between individual subjects, differences between groups of subjects, and differences within a subject. Only S recovered the original differences. Differences between comparing chunks and comparing orders are discussed.
The author would like to thank Henry Rueter, Judith Olson, John Jonides, and James Jaccard for many inspiring comments during several stages of this project, two anonymous reviewers for several important insights, and Malhee Lee for her assistance with data collection. This work was supported by NIMH Grant MH 39912. Portions of this work were presented at the annual meeting of the Classification Society in St. John's, Newfoundland, July 1985, and the annual meeting of the Society for Mathematical Psychology in Boston, MA, August 1986.
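For reference, the baseline that S is compared against can be sketched for two recall orders of the same items: Kendall's tau counts item pairs recalled in the same relative order (concordant) versus the opposite order (discordant). The lattice-based measure S itself is not reproduced here, and the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(order_a, order_b):
    """Kendall's tau between two recall orders of the same distinct items."""
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must contain the same distinct items")
    if len(order_a) < 2:
        raise ValueError("need at least two items")
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # Same sign: the pair appears in the same relative order in both recalls
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


print(kendall_tau("ABCD", "BACD"))  # one swapped pair out of six
```

Because tau looks only at pairwise order, it cannot tell whether items were recalled as coherent chunks, which is the distinction the paper draws between comparing chunks and comparing orders.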

18.
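Based on statistical analysis and social network analysis of industry-university co-authored papers published from 2001 to 2010 in 15 core pharmaceutical journals indexed by China National Knowledge Infrastructure (CNKI), this study examines the basic performance and main characteristics of industry-university collaboration in scientific and technological knowledge production in China's pharmaceutical industry. During this period, industry-university co-authored papers grew steadily and the collaboration network expanded rapidly, but network density was severely low and showed a clear downward trend; co-authorship was markedly constrained by geography and had a low degree of internationalization; and several leading research universities and major pharmaceutical groups played a clearly dominant role. Overall, the capacity of Chinese pharmaceutical companies for scientific and technological knowledge production needs to be improved, and their collaboration with universities needs to be strengthened.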
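Network density here is the share of possible ties that are actually present. A minimal sketch, assuming an undirected co-authorship network in which two institutions are linked if they co-authored at least one paper, so density = 2E / (N(N − 1)). The institution names and function names are hypothetical.

```python
from itertools import combinations


def coauthorship_edges(papers):
    """Undirected edges between every pair of institutions on the same paper."""
    edges = set()
    for institutions in papers:
        for a, b in combinations(sorted(set(institutions)), 2):
            edges.add((a, b))
    return edges


def network_density(papers):
    """Density 2E / (N (N - 1)) of the co-authorship network."""
    nodes = {inst for paper in papers for inst in paper}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(coauthorship_edges(papers)) / (n * (n - 1))


papers = [["Univ A", "Firm X"], ["Univ A", "Firm Y"], ["Univ B", "Firm X"]]
print(network_density(papers))                              # 3 ties among 4 nodes
print(network_density(papers + [["Univ C", "Firm Z"]]))     # 4 ties among 6 nodes
```

Because possible ties grow roughly with the square of the number of nodes, a network can expand quickly while its density falls, which matches the pattern the abstract reports.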

19.
Feedforward neural networks are a popular tool for classification, offering a method for fully flexible modeling. This paper examines the underlying probability model in order to understand statistically what is going on and thereby facilitate an informed choice of prior for a fully Bayesian analysis. The parameters turn out to be difficult or impossible to interpret, yet a coherent prior requires a quantification of this inherent uncertainty. Several approaches are discussed, including flat priors, Jeffreys priors, and reference priors.
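One well-known reason the parameters resist interpretation is that different parameter values give exactly the same function. A minimal sketch, assuming a single-hidden-layer network with logistic units and a logistic output: permuting hidden units, or flipping the signs of one unit's weights (compensated by adding its output weight to the output bias, since σ(−a) = 1 − σ(a)), leaves the predicted probability unchanged. This illustrates the issue only and is not the paper's analysis; the names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def network_output(x, hidden, output_bias):
    """P(class 1 | x); hidden is a list of (input_weights, bias, output_weight)."""
    z = output_bias
    for w, b, v in hidden:
        z += v * sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return sigmoid(z)


hidden = [([0.5, -1.0], 0.2, 1.5), ([-0.3, 0.8], -0.1, -2.0)]
x = [1.0, 2.0]
original = network_output(x, hidden, 0.3)

# Symmetry 1: reorder the hidden units
permuted = network_output(x, hidden[::-1], 0.3)

# Symmetry 2: negate one unit's weights, bias and output weight,
# then add that output weight to the output bias
w, b, v = hidden[0]
flipped = [([-wi for wi in w], -b, -v), hidden[1]]
sign_flipped = network_output(x, flipped, 0.3 + v)

print(original, permuted, sign_flipped)
```

A prior placed directly on the weights therefore assigns probability to many equivalent parameter settings, which is part of what makes prior choice for these models delicate.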

20.