Similar literature
Found 20 similar articles; search took 62 ms.
1.
The problem of measuring the impact of individual data points in a cluster analysis is examined. The purpose is to identify those data points that have an influence on the resulting cluster partitions. Influence of a single data point is considered present when different cluster partitions result from the removal of the element from the data set. The Hubert and Arabie (1985) corrected Rand index was used to provide numerical measures of influence of a data point. Simulated data sets consisting of a variety of cluster structures and error conditions were generated to validate the influence measures. The results showed that the measure of internal influence was 100% accurate in identifying those data elements exhibiting an influential effect. The nature of the influence, whether beneficial or detrimental to the clustering, can be evaluated with the use of the gamma and point-biserial statistics.
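The removal-based idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the names `influence_of_point` and `split_at_largest_gap` are hypothetical, and the toy one-dimensional clustering rule is an assumption chosen only to keep the example self-contained. The corrected Rand index follows the standard Hubert and Arabie (1985) formula.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie corrected Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of objects placed together in both partitions, in A only, in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


def split_at_largest_gap(values):
    """Toy 1-D clustering (an assumption): two clusters cut at the widest gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 0 if rank <= cut else 1
    return labels


def influence_of_point(data, index, cluster_fn):
    """Agreement between the full-data partition (with the point dropped)
    and the partition obtained after removing the point before clustering.
    A value below 1 means the point changed the partition."""
    full = cluster_fn(data)
    reduced = cluster_fn(data[:index] + data[index + 1:])
    return adjusted_rand_index(full[:index] + full[index + 1:], reduced)


data = [1.0, 1.2, 1.4, 5.0, 5.2, 5.4, 20.0]
outlier_agreement = influence_of_point(data, 6, split_at_largest_gap)
regular_agreement = influence_of_point(data, 0, split_at_largest_gap)
```

In this toy data set, removing the outlier at 20.0 changes the two-cluster partition of the remaining points, so the agreement drops below 1, while removing an ordinary point leaves the partition intact.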

2.
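Zhu Wei (朱伟), 《自然辩证法研究》 (Studies in Dialectics of Nature), 2006, 22(6): 33-36, 45
Since the launch of the Human Genome Project, population-based research has become a focus of genetic research. Unlike earlier studies, population genetics takes both individuals and groups as its research subjects. These groups collectively bear the possible risks of the research and share its possible benefits. Population genetics research raises new questions for the principle of informed consent, which is based on individual consent. Drawing on recent developments in population genetics research, this paper argues that seeking the group consent of the study population is both practically necessary and ethically justified. Individual consent and group consent are not fundamentally opposed.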

3.
We propose functional cluster analysis (FCA) for multidimensional functional data sets, utilizing orthonormalized Gaussian basis functions. An essential point in FCA is the use of orthonormal bases that yield the identity matrix for the integral of the product of any two bases. We construct orthonormalized Gaussian basis functions using Cholesky decomposition and derive a property of Cholesky decomposition with respect to Gram-Schmidt orthonormalization. The advantages of the functional clustering are that it can be applied to the data observed at different time points for each subject, and the functional structure behind the data can be captured by removing the measurement errors. Numerical experiments are conducted to investigate the effectiveness of the proposed method, as compared to conventional discrete cluster analysis. The proposed method is applied to three-dimensional (3D) protein structural data that determine the 3D arrangement of amino acids in individual proteins.
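The Cholesky-based orthonormalization mentioned above can be illustrated with a small sketch. It is not the authors' construction: the basis functions are sampled on a grid and the integral is approximated by a Riemann sum, and the grid, centers, width, and function names are all assumptions made for illustration. If the Gram matrix of the basis is G = L Lᵀ, then applying L⁻¹ to the basis gives functions whose Gram matrix is the identity.

```python
from math import exp, sqrt


def cholesky(gram):
    """Return lower-triangular L with gram = L L^T (gram must be positive definite)."""
    n = len(gram)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(gram[i][i] - partial)
            else:
                lower[i][j] = (gram[i][j] - partial) / lower[j][j]
    return lower


def inner(u, v, weight=1.0):
    """Discrete inner product; weight is the grid step for sampled functions."""
    return weight * sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, weight=1.0):
    """Solve L @ ortho = basis by forward substitution, where L is the
    Cholesky factor of the Gram matrix of the (linearly independent) basis."""
    gram = [[inner(u, v, weight) for v in basis] for u in basis]
    lower = cholesky(gram)
    ortho = []
    for i, row in enumerate(basis):
        new_row = [
            (row[t] - sum(lower[i][k] * ortho[k][t] for k in range(i))) / lower[i][i]
            for t in range(len(row))
        ]
        ortho.append(new_row)
    return ortho


# Hypothetical example: three Gaussian bumps sampled on a grid.
step = 0.01
grid = [-4.0 + step * k for k in range(1001)]
centers = [0.0, 1.0, 2.0]
width = 0.5
gaussians = [[exp(-((t - c) ** 2) / (2 * width ** 2)) for t in grid] for c in centers]
ortho = orthonormalize(gaussians, weight=step)
```

Because L is lower triangular with a positive diagonal, each orthonormalized function depends only on the current and earlier basis functions, which is the same structure Gram-Schmidt produces.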

4.
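Regional cluster innovation: a generative analytical framework   Total citations: 1, self-citations: 0, citations by others: 1
Regional cluster innovation is a new form of innovation. Unlike the traditional model, in which the innovating actor is a single organization or individual and the innovation process is predominantly linear, it has a more complex realization mechanism. Following the generative logic of innovation itself, this paper constructs a comprehensive analytical framework. The interaction network that regional cluster actors generate from their rich geographical, social, and industrial proximity is the foundation on which innovation arises: it eases the barriers to knowledge transfer between actors, so that the cluster as a whole exhibits co-evolutionary characteristics, while competition and cooperation at the level of individual relationships give rise to the "fluctuation" effect of innovation, thereby driving the continual generation and evolution of innovation.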

5.
The mixture method of clustering applied to three-way data   Total citations: 3, self-citations: 3, citations by others: 0
Clustering or classifying individuals into groups such that there is relative homogeneity within the groups and heterogeneity between the groups is a problem which has been considered for many years. Most available clustering techniques are applicable only to a two-way data set, where one of the modes is to be partitioned into groups on the basis of the other mode. Suppose, however, that the data set is three-way. Then what is needed is a multivariate technique which will cluster one of the modes on the basis of both of the other modes simultaneously. It is shown that by appropriate specification of the underlying model, the mixture maximum likelihood approach to clustering can be applied in the context of a three-way table. It is illustrated using a soybean data set which consists of multiattribute measurements on a number of genotypes each grown in several environments. Although the problem is set in the framework of clustering genotypes, the technique is applicable to other types of three-way data sets.

6.
Analysis of between-group differences using canonical variates assumes equality of population covariance matrices. Sometimes these matrices are sufficiently different for the null hypothesis of equality to be rejected, but there exist some common features which should be exploited in any analysis. The common principal component model is often suitable in such circumstances, and this model is shown to be appropriate in a practical example. Two methods for between-group analysis are proposed when this model replaces the equal dispersion matrix assumption. One method is by extension of the two-stage approach to canonical variate analysis using sequential principal component analyses as described by Campbell and Atchley (1981). The second method is by definition of a distance function between populations satisfying the common principal component model, followed by metric scaling of the resulting between-populations distance matrix. The two methods are compared with each other and with ordinary canonical variate analysis on the previously introduced data set.

7.
A low-dimensional representation of multivariate data is often sought when the individuals belong to a set of a-priori groups and the objective is to highlight between-group variation relative to that within groups. If all the data are continuous then this objective can be achieved by means of canonical variate analysis, but no corresponding technique exists when the data are categorical or mixed continuous and categorical. On the other hand, if there is no a-priori grouping of the individuals, then ordination of any form of data can be achieved by use of metric scaling (principal coordinate analysis). In this paper we consider a simple extension of the latter approach to incorporate grouped data, and discuss to what extent this method can be viewed as a generalization of canonical variate analysis. Some illustrative examples are also provided.

8.
In this study, we consider the type of interval data summarizing the original samples (individuals) with classical point data. This type of interval data is termed interval symbolic data in a new research domain called symbolic data analysis. Most of the existing research, such as the (centre, radius) and [lower boundary, upper boundary] representations, represents an interval using only the boundaries of the interval. However, these representations hold true only under the assumption that the individuals contained in the interval follow a uniform distribution. In practice, such representations may result in not only inconsistency with the facts, since the individuals are usually not uniformly distributed in many application aspects, but also information loss for not considering the point data within the intervals during the calculation. In this study, we propose a new representation of the interval symbolic data considering the point data contained in the intervals. Then we apply the city-block distance metric to the new representation and propose a dynamic clustering approach for interval symbolic data. A simulation experiment is conducted to evaluate the performance of our method. The results show that, when the individuals contained in the interval do not follow a uniform distribution, the proposed method significantly outperforms the Hausdorff and city-block distance based on traditional representation in the context of dynamic clustering. Finally, we give an application example on the automobile data set.
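The information loss described above can be made concrete with a short sketch of the traditional [lower boundary, upper boundary] representation and the city-block distance on it. The function names and the sample values are hypothetical, and the sketch deliberately does not attempt the new representation proposed in the abstract, whose details are not given here.

```python
def to_interval(points):
    """Traditional boundary-only summary of a sample: (lower, upper)."""
    return (min(points), max(points))


def city_block_interval_distance(x, y):
    """City-block distance between two interval-valued objects.

    x and y are sequences of (lower, upper) pairs, one pair per variable;
    the distance sums the absolute differences of both boundaries."""
    return sum(abs(xl - yl) + abs(xu - yu) for (xl, xu), (yl, yu) in zip(x, y))


# Two hypothetical samples with the same range but very different interiors.
evenly_spread = [0.0, 2.5, 5.0, 7.5, 10.0]
skewed = [0.0, 0.1, 0.2, 0.3, 10.0]

same_range_distance = city_block_interval_distance(
    [to_interval(evenly_spread)], [to_interval(skewed)]
)
other_distance = city_block_interval_distance([(0.0, 10.0)], [(2.0, 7.0)])
```

Both samples collapse to the interval (0.0, 10.0), so the boundary-based distance between them is zero even though their point data differ, which is the shortcoming the abstract attributes to the uniform-distribution assumption.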

9.
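Social emergentism is a new holistic theory, developed from systems theory, philosophy of mind, and related disciplines, about the nature of society and the methods for explaining social phenomena. It holds that, on the one hand, society emerges from the aggregation of individuals, and the former has special properties that the latter lack; on the other hand, society and individuals belong to different levels, and the former cannot be reduced to the latter. Research on social causation from the perspective of social emergentism takes the debate between individualism and holism as its background and asks whether the social level has causal efficacy. The question of social causation that social emergentism addresses is whether a social event can, as a factor independent of individual will, influence another social event or individuals in society. Social emergentism holds that a social event can act as an "independent" factor affecting other social events or individuals in society. Social causation is closely related to individuals yet cannot be reduced to properties of individuals.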

10.
Generation of Random Clusters with Specified Degree of Separation   Total citations: 1, self-citations: 1, citations by others: 0
We propose a random cluster generation algorithm that has the following desired features: (1) the population degree of separation between clusters and the nearest neighboring clusters can be set to a specified value, based on a separation index; (2) no constraint is imposed on the isolation among clusters in each dimension; (3) the covariance matrices correspond to different shapes, diameters and orientations; (4) the full cluster structures generally could not be detected simply from pair-wise scatterplots of variables; (5) noisy variables and outliers can be imposed to make the cluster structures harder to recover. This algorithm is an improvement on the method used in Milligan (1985).

11.
The primary method for validating cluster analysis techniques is through Monte Carlo simulations that rely on generating data with known cluster structure (e.g., Milligan 1996). This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint; current cluster generation methods are framed within these definitions. An algorithm generating overlapping clusters based on shared densities from several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, a discussion is given of how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.

12.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.

13.
The rapid increase in the size of data sets makes clustering all the more important to capture and summarize the information, at the same time making clustering more difficult to accomplish. If model-based clustering is applied directly to a large data set, it can be too slow for practical application. A simple and common approach is to first cluster a random sample of moderate size, and then use the clustering model found in this way to classify the remainder of the objects. We show that, in its simplest form, this method may lead to unstable results. Our experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set. We find that there are significant gains from increasing the size of the sample up to about 2,000, but not from further increases. These conclusions are based on the application of several alternative strategies to the segmentation of three different multispectral images, and to several simulated data sets.

14.
In supervised learning, an important issue usually not taken into account by classical methods is that a class represented in the test set may not have been encountered earlier in the learning phase. Classical supervised algorithms will automatically label such observations as belonging to one of the known classes in the training set and will not be able to detect new classes. This work introduces a model-based discriminant analysis method, called adaptive mixture discriminant analysis (AMDA), which can detect several unobserved groups of points and can adapt the learned classifier to the new situation. Two EM-based procedures are proposed for parameter estimation, and model selection criteria are used for selecting the actual number of classes. Experiments on artificial and real data demonstrate the ability of the proposed method to deal with complex and real-world problems. The proposed approach is also applied to the detection of unobserved communities in social network analysis.

15.
Separability of clusters is an issue that arises in many different areas, and is often used in a rather vague and subjective manner. We introduce a combinatorial notion of interiority to derive a global view on separability of a set of entities. We develop this approach further to evaluate the overall separability of a partition in the context of cluster analysis. Our approach captures combinatorial and geometrical aspects of data and provides, in addition to numerical evaluations, graphical representations particularly useful when data are not easily visualized. We illustrate the methodology on some real and simulated datasets.

16.
Free-sorting data are obtained when subjects are given a set of objects and are asked to divide them into subsets. Such data are usually reduced by counting, for each pair of objects, how many subjects placed both of them into the same subset. The present study examines the utility of a group of additional statistics: the cooccurrences of sets of three objects. Because there are dependencies among the pair and triple cooccurrences, adjusted triple similarity statistics are developed. Multidimensional scaling and cluster analysis, which usually use pair similarities as their input data, can be modified to operate on three-way similarities to create representations of the set of objects. Such methods are applied to a set of empirical sorting data: Rosenberg and Kim's (1975) fifteen kinship terms. The author thanks Phipps Arabie, Lawrence Hubert, Lawrence Jones, Ed Shoben, and Stanley Wasserman for their considerable contributions to this paper.
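The pair and triple counts described above can be computed with one small routine. This is a sketch on made-up sorting data, not the Rosenberg and Kim kinship data; the function name `cooccurrence_counts` and the object labels are hypothetical, and the adjusted triple statistic from the abstract is not reproduced because its definition is not given here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sortings, size):
    """Count, for every group of `size` objects, how many subjects placed
    all of them in the same subset.

    `sortings` holds one partition per subject; each partition is a list
    of sets of object labels."""
    counts = Counter()
    for partition in sortings:
        for subset in partition:
            # sorted() gives every group a canonical key regardless of set order.
            for group in combinations(sorted(subset), size):
                counts[group] += 1
    return counts


# Three hypothetical subjects sorting four objects.
sortings = [
    [{"a", "b", "c"}, {"d"}],
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b", "c", "d"}],
]

pair_counts = cooccurrence_counts(sortings, 2)
triple_counts = cooccurrence_counts(sortings, 3)
```

Here objects "a" and "b" are sorted together by all three subjects, while the triple ("a", "b", "c") cooccurs for only two of them, which shows how triple counts carry information beyond the pair counts.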

17.
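Taking pesticide application by dispersed farm households as its perspective and Henan Province as a case, this paper analyzes the main pesticide application behaviors of applicators and uses a binary logistic model to study the applicator characteristics that influence these behaviors. The study shows that years of education have a significant effect on applicators' pesticide application behavior, while the significance of other characteristics, such as gender, annual household income, and planted area, varies across the stages of pesticide application. The findings indicate that in major agricultural provinces such as Henan it has become especially urgent to further deepen rural reform, accelerate land transfer, increase investment in the safe production of agricultural products, improve the agricultural technology extension system, raise the quality of agricultural producers, and improve the economic and social characteristics of applicators.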

18.
19.
An error variance approach to two-mode hierarchical clustering   Total citations: 2, self-citations: 2, citations by others: 0
A new agglomerative method is proposed for the simultaneous hierarchical clustering of row and column elements of a two-mode data matrix. The procedure yields a nested sequence of partitions of the union of two sets of entities (modes). A two-mode cluster is defined as the union of subsets of the respective modes. At each step of the agglomerative process, the algorithm merges those clusters whose fusion results in the smallest possible increase in an internal heterogeneity measure. This measure takes into account both the variance within the respective cluster and its centroid effect defined as the squared deviation of its mean from the maximum entry in the input matrix. The procedure optionally yields an overlapping cluster solution by assigning further row and/or column elements to clusters existing at a preselected hierarchical level. Applications to real data sets drawn from consumer research concerning brand-switching behavior and from personality research concerning the interaction of behaviors and situations demonstrate the efficacy of the method at revealing the underlying two-mode similarity structure.
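The heterogeneity measure described above can be sketched for a single two-mode cluster. The abstract names the two ingredients (within-cluster variance and the centroid effect) but not how they are combined, so adding them with equal weight is an assumption here, as are the function name `heterogeneity` and the example matrix.

```python
def heterogeneity(matrix, rows, cols):
    """Internal heterogeneity of the two-mode cluster formed by `rows` and `cols`.

    Assumption: the measure is the within-cluster variance of the entries
    plus the centroid effect, i.e. the squared deviation of the cluster
    mean from the maximum entry of the whole input matrix."""
    max_entry = max(max(row) for row in matrix)
    entries = [matrix[i][j] for i in rows for j in cols]
    mean = sum(entries) / len(entries)
    variance = sum((x - mean) ** 2 for x in entries) / len(entries)
    centroid_effect = (mean - max_entry) ** 2
    return variance + centroid_effect


# Hypothetical similarity-like matrix: rows 0 and 1 go with columns 0 and 1.
matrix = [
    [9, 8, 1],
    [8, 9, 2],
    [1, 2, 9],
]

tight = heterogeneity(matrix, rows=[0, 1], cols=[0, 1])
loose = heterogeneity(matrix, rows=[0, 2], cols=[0, 1])
```

Under this additive assumption the measure equals the mean squared deviation of the cluster's entries from the maximum entry, so a cluster scores low only when its entries are both homogeneous and large. An agglomerative step would then merge the pair of clusters whose fusion raises this value least.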

20.
A sequential fitting procedure for linear data analysis models   Total citations: 1, self-citations: 1, citations by others: 0
A particular factor analysis model with parameter constraints is generalized to include classification problems definable within a framework of fitting linear models. The sequential fitting (SEFIT) approach of principal component analysis is extended to include several nonstandard data analysis and classification tasks. SEFIT methods attempt to explain the variability in the initial data (commonly defined by a sum of squares) through an additive decomposition attributable to the various terms in the model. New methods are developed for both traditional and fuzzy clustering that have useful theoretic and computational properties (principal cluster analysis, additive clustering, and so on). Connections to several known classification strategies are also stated. The author is grateful to P. Arabie and L. J. Hubert for editorial assistance and reviewing going well beyond traditional levels.
