Similar literature
 20 similar documents found
1.
The primary method for validating cluster analysis techniques is through Monte Carlo simulations that rely on generating data with known cluster structure (e.g., Milligan 1996). This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint; current cluster generation methods are framed within these definitions. An algorithm generating overlapping clusters based on shared densities from several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, a discussion is given of how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.
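As a rough illustration of the kind of Monte Carlo input this abstract refers to, the following Python sketch draws points from two spherical Gaussian clusters and keeps the true labels. It is not the algorithm proposed in the paper: the function name `make_two_clusters` and the parameters `separation` and `sigma` are assumptions made for this example, and overlap is controlled only informally through the distance between the two centers.

```python
import random

def make_two_clusters(n_per_cluster, separation, sigma=1.0, dim=2, seed=None):
    """Draw labeled points from two spherical Gaussian clusters.

    Hypothetical helper: the centers differ only along the first axis,
    so a smaller `separation` (relative to `sigma`) means more overlap.
    """
    rng = random.Random(seed)
    centers = [[0.0] * dim, [separation] + [0.0] * (dim - 1)]
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            # Each coordinate is an independent normal draw around the center.
            points.append([rng.gauss(mu, sigma) for mu in center])
            labels.append(label)
    return points, labels

points, labels = make_two_clusters(n_per_cluster=50, separation=1.5, seed=0)
```

Because the true labels are returned alongside the points, the output of any clustering method run on `points` can be scored against `labels`.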

2.
We describe a new wavelet transform, for use on hierarchies or binary rooted trees. The theoretical framework of this approach to data analysis is described. Case studies are used to further exemplify this approach. A first set of application studies deals with data array smoothing, or filtering. A second set of application studies relates to hierarchical tree condensation. Finally, a third study explores the wavelet decomposition, and the reproducibility of data sets such as text, including a new perspective on the generation or computability of such data objects.

3.
A low-dimensional representation of multivariate data is often sought when the individuals belong to a set of a priori groups and the objective is to highlight between-group variation relative to that within groups. If all the data are continuous then this objective can be achieved by means of canonical variate analysis, but no corresponding technique exists when the data are categorical or mixed continuous and categorical. On the other hand, if there is no a priori grouping of the individuals, then ordination of any form of data can be achieved by use of metric scaling (principal coordinate analysis). In this paper we consider a simple extension of the latter approach to incorporate grouped data, and discuss to what extent this method can be viewed as a generalization of canonical variate analysis. Some illustrative examples are also provided.

4.
The Practice of Cluster Analysis   Total citations: 2, self-citations: 2, citations by others: 0
Cluster analysis is one of the main methodologies for analyzing multivariate data. Its use is widespread and growing rapidly. The goal of this article is to document this growth, characterize current usage, illustrate the breadth of applications via examples, highlight both good and risky practices, and suggest some research priorities.

5.
Several techniques are given for the uniform generation of trees for use in Monte Carlo studies of clustering and tree representations. First, general strategies are reviewed for random selection from a set of combinatorial objects with special emphasis on two that use random mapping operations. Theorems are given on how the number of such objects in the set (e.g., whether the number is prime) affects which strategies can be used. Based on these results, methods are presented for the random generation of six types of binary unordered trees. Three types of labeling and both rooted and unrooted forms are considered. Presentation of each method includes the theory of the method, the generation algorithm, an analysis of its computational complexity and comments on the distribution of trees over which it samples. Formal proofs and detailed algorithms are in appendices.

6.
An approach is presented for analyzing a heterogeneous set of categorical variables assumed to form a limited number of homogeneous subsets. The variables generate a particular set of proximities between the objects in the data matrix, and the objective of the analysis is to represent the objects in low-dimensional Euclidean spaces, where the distances approximate these proximities. A least squares loss function is minimized that involves three major components: a) the partitioning of the heterogeneous variables into homogeneous subsets; b) the optimal quantification of the categories of the variables; and c) the representation of the objects through multiple multidimensional scaling tasks performed simultaneously. An important aspect from an algorithmic point of view is the use of majorization. The use of the procedure is demonstrated by a typical example of possible application, i.e., the analysis of categorical data obtained in a free-sort task. The results of points of view analysis are contrasted with a standard homogeneity analysis, and the stability is studied through a jackknife analysis.

7.
The additive biclustering model for two-way two-mode object by variable data implies overlapping clusterings of both the objects and the variables together with a weight for each bicluster (i.e., a pair of an object and a variable cluster). In the data analysis, an additive biclustering model is fitted to given data by means of minimizing a least squares loss function. To this end, two alternating least squares algorithms (ALS) may be used: (1) PENCLUS, and (2) Baier’s ALS approach. However, both algorithms suffer from some inherent limitations, which may hamper their performance. As a way out, based on theoretical results regarding optimally designing ALS algorithms, in this paper a new ALS algorithm will be presented. In a simulation study this algorithm will be shown to outperform the existing ALS approaches.

8.
Trees, and particularly binary trees, appear frequently in the classification literature. When studying the properties of the procedures that fit trees to sets of data, direct analysis can be too difficult, and Monte Carlo simulations may be necessary, requiring the implementation of algorithms for the generation of certain families of trees at random. In the present paper we use the properties of Prüfer's enumeration of the set of completely labeled trees to obtain algorithms for the generation of completely labeled, as well as terminally labeled t-ary (and in particular binary) trees at random, i.e., with uniform distribution. These algorithms are general in that they can be used to generate random trees from any family that can be characterized in terms of the node degrees. The algorithms presented here are as fast as (in the case of terminally labeled trees) or faster than (in the case of completely labeled trees) any other existing procedure, and the memory requirements are minimal. Another advantage over existing algorithms is that there is no need to store pre-calculated tables.
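The link between Prüfer's enumeration and uniform sampling can be sketched with the standard textbook decoding: every sequence of length n - 2 over n labels corresponds to exactly one completely labeled tree on n nodes, so a uniformly random sequence yields a uniformly random tree. The Python sketch below covers only this unconstrained case, not the degree-constrained (t-ary or terminally labeled) algorithms of the paper; the function names and the 0-based node labels are assumptions made for this example.

```python
import heapq
import random

def prufer_to_tree(seq):
    """Decode a Prüfer sequence into the edge list of a labeled tree.

    Nodes are labeled 0 .. n-1, where n = len(seq) + 2.
    """
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1  # a node appears (degree - 1) times in the sequence
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)  # smallest remaining leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def random_labeled_tree(n, rng=None):
    """Sample a completely labeled tree on n >= 2 nodes uniformly at random."""
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = rng or random.Random()
    seq = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_tree(seq)

edges = random_labeled_tree(8, random.Random(1))
```

No table of precomputed counts is needed in this sketch; the only state is the degree array and a heap of current leaves.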

9.
Comparing partitions   Total citations: 80, self-citations: 13, citations by others: 67
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Milligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure. The matrices are generated from corresponding partitions using various scoring rules. Special cases derivable include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples having the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1. William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.
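The pair-counting measure attributed to Rand, and the idea of correcting it for chance, can be sketched in a few lines of Python. The abstract gives no formulas, so the chance-corrected version below is the widely used adjusted Rand index (expected value taken with both sets of cluster sizes held fixed) and may differ in detail from the paper's presentation. The function names, and the convention of returning 1.0 when the correction is undefined, are assumptions made for this example.

```python
from collections import Counter
from math import comb

def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree.

    `a` and `b` are equal-length sequences of cluster labels.
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Agreement: the pair is together in both or apart in both.
            if (a[i] == a[j]) == (b[i] == b[j]):
                agree += 1
    return agree / comb(n, 2)

def adjusted_rand_index(a, b):
    """Rand index corrected for chance under fixed cluster sizes."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")
    together_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_a = sum(comb(c, 2) for c in Counter(a).values())
    together_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case; convention chosen for this sketch
    return (together_both - expected) / (maximum - expected)

ri = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
ari = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

For the two crossed partitions in the usage lines, the uncorrected index is 1/3 while the corrected index is negative (-0.5), which shows why a chance correction matters when reading such scores.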

10.
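This paper presents several results from research in life informatics, including the theory of the duality of information, the theory of the evolution of life information, and a four-dimensional model of cosmic information. On this basis, it proposes a way of understanding life that does not violate the second law of thermodynamics. The author transplants the principle of entropy increase into informatics and extends Schrödinger's notion of "feeding on negative entropy" to "learning negative entropy". Accordingly, the two kinds of entropy increase point toward the death and destruction of life, while the two kinds of entropy decrease point toward the reproduction of organisms and the evolution of life; the competition between these two tendencies gives rise to the rich variety of living phenomena.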

11.
The location model is a useful tool in parametric analysis of mixed continuous and categorical variables. In this model, the continuous variables are assumed to follow different multivariate normal distributions for each possible combination of categorical variable values. Using this model, a distance between two populations involving mixed variables can be defined. To date, however, no distributional results have been available, against which to assess the outcomes of practical applications of this distance. The null distribution of estimated distance is therefore considered in this paper, for a range of possible situations. No explicit analytical expressions are derived for this distribution, but easily implementable Monte Carlo schemes are described. These are then applied to previously cited examples.

12.
13.
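Terminology is of great importance to the development of a discipline. By tracing the origins of "expertise" and related terms in cognitive psychology and analyzing the shortcomings of their existing Chinese translations, this paper argues that the Chinese translation of foreign terms should follow four principles: accuracy, professionalism, readability, and consistency. On this basis, it discusses and justifies translations for this term and related terms, and then explores the meaning of the four principles and the relationships among them.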

14.
This paper addresses the theoretical notion of a game as it arises across scientific inquiries, exploring its uses as a technical and formal asset in logic and science versus an explanatory mechanism. While games comprise a widely used method in a broad intellectual realm (including, but not limited to, philosophy, logic, mathematics, cognitive science, artificial intelligence, computation, linguistics, physics, economics), each discipline advocates its own methodology and a unified understanding is lacking. In the first part of this paper, a number of game theories in formal studies are critically surveyed. In the second part, the doctrine of games as explanations for logic is assessed, and the relevance of a conceptual analysis of games to cognition discussed. It is suggested that the notion of evolution plays a part in the game-theoretic concept of meaning.

15.
The weighted linear choice model is one of the most popular models in the social sciences. In this model the utility of a choice object is represented as a weighted sum of attribute-level desirabilities, where the weights are attribute importances. In many empirical contexts the choice objects are such that individuals are highly correlated in terms of their desirability ordering of levels within attributes (e.g., price levels, durability levels, etc.) but may differ appreciably in terms of their evaluations of each attribute's importance. In this paper we address the problem of how dissimilar two individuals may be, in a rank correlation sense, given that they agree completely on the desirability ordering of levels within attributes, but may disagree considerably regarding the importance they attach to the attributes themselves. The problem has interesting implications regarding the potential value of clustering individuals' utility functions for market segmentation or other such purposes. The authors would like to thank the editor and three anonymous reviewers for their excellent comments on an earlier draft of the paper.
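A small numerical case makes the question concrete. In the Python sketch below, two hypothetical individuals share the same desirability scores for the levels of two attributes but attach opposite importance weights, and their utility orderings over all nine objects are compared. The abstract does not say which rank correlation is used; Kendall's tau is chosen here as one common option, and the weights, level scores, and function names are all assumptions made for this example.

```python
from itertools import combinations, product

def utilities(weights, desirabilities):
    """Weighted-sum utility of every object (one level index per attribute)."""
    level_ranges = [range(len(d)) for d in desirabilities]
    return {
        obj: sum(w * d[lvl] for w, d, lvl in zip(weights, desirabilities, obj))
        for obj in product(*level_ranges)
    }

def kendall_tau(u, v):
    """Kendall's tau-a between two utility dicts defined on the same objects."""
    concordant = discordant = 0
    for x, y in combinations(list(u), 2):
        sign = (u[x] - u[y]) * (v[x] - v[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(u) * (len(u) - 1) // 2
    return (concordant - discordant) / n_pairs

# Both individuals agree on the ordering of levels within each attribute.
desirabilities = [[0, 1, 2], [0, 1, 2]]
person_a = utilities((9, 1), desirabilities)  # first attribute dominates
person_b = utilities((1, 9), desirabilities)  # second attribute dominates
tau = kendall_tau(person_a, person_b)  # 0.5
```

Of the 36 object pairs, the two individuals disagree only on the 9 pairs in which one object is better on the first attribute and worse on the second, so full agreement on level orderings keeps the correlation well above -1 here even with opposite weightings.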

16.
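Terminology is of great importance to the development of a discipline. By tracing the origins of "expertise" and related terms in cognitive psychology and analyzing the shortcomings of their existing Chinese translations, this paper argues that the Chinese translation of foreign terms should follow four principles: accuracy, professionalism, readability, and consistency. On this basis, it discusses and justifies translations for this term and related terms, and then explores the meaning of the four principles and the relationships among them.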

17.
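Wang Shaoshuang (王少爽). 《中国科技术语》, 2011, 13(1): 25-29, 38
Terminology is of great importance to the development of a discipline. By tracing the origins of "expertise" and related terms in cognitive psychology and analyzing the shortcomings of their existing Chinese translations, this paper argues that the Chinese translation of foreign terms should follow four principles: accuracy, professionalism, readability, and consistency. On this basis, it discusses and justifies translations for this term and related terms, and then explores the meaning of the four principles and the relationships among them.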

18.
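Along the dimension of complexity, this paper divides decision systems into two kinds: simple decision-making and complex decision-making. It compares decisions in the two systems with respect to mode of thinking, theoretical background, concept of decision, research paradigm, research methodology, decision methods, and the scope to which the theory applies. Through this comparative analysis, it identifies the essential differences between the complex and simple decision systems, and the significance of three aspects (the new research paradigm, the methodology, and the evolution of the decision concept itself) for understanding complex decision-making.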

19.
In many application fields, multivariate approaches that simultaneously consider the correlation between responses are needed. The tree method can be extended to multivariate responses, such as repeated measures and longitudinal data, by modifying the split function so as to accommodate multiple responses. Recently, researchers have constructed decision trees for multiple continuous longitudinal responses and multiple binary responses using Mahalanobis distance and a generalized entropy index. However, these methods are limited by the type of response: they handle only responses that are all continuous or all binary. In this paper, we modify the tree procedure for a univariate response and suggest a new tree-based method that can analyze any type of multiple responses by using GEE (generalized estimating equations) techniques. To compare the performance of trees, simulation studies on the selection probability of the true split variable are shown. Finally, applications using epileptic seizure data and WWW data are introduced.

20.
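Social Computing: A Digital and Dynamic Integration of Science, Technology, and the Humanities   Total citations: 10, self-citations: 0, citations by others: 10
Through research on social computing, this paper explores ways to combine science, technology, and the humanities organically. It mainly discusses how complex systems theory and advanced computational tools and methods can be used to digitize and make dynamic the knowledge of the humanities, which has traditionally been static and confined to the level of language, and to apply it to the modeling, analysis, and decision support of various complex social problems. The main ideas include building a theoretical framework for social computing with methods such as artificial systems, computational experiments, and parallel systems.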
