首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
I consider a new problem of classification into n(n ≥ 2) disjoint classes based on features of unclassified data. It is assumed that the data are grouped into m(M ≥ n) disjoint sets and within each set the distribution of features is a mixture of distributions corresponding to particular classes. Moreover, the mixing proportions should be known and form a matrix of rank n. The idea of solution is, first, to estimate feature densities in all the groups, then to solve the linear system for component densities. The proposed classification method is asymptotically optimal, provided a consistent method of density estimation is used. For illustration, the method is applied to determining perfusion status in myocardial infarction patients, using creatine kinase measurements.  相似文献   

2.
K -means partitioning. We also describe some new features and improvements to the algorithm proposed by De Soete. Monte Carlo simulations have been conducted using different error conditions. In all cases (i.e., ultrametric or additive trees, or K-means partitioning), the simulation results indicate that the optimal weighting procedure should be used for analyzing data containing noisy variables that do not contribute relevant information to the classification structure. However, if the data involve error-perturbed variables that are relevant to the classification or outliers, it seems better to cluster or partition the entities by using variables with equal weights. A new computer program, OVW, which is available to researchers as freeware, implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a new algorithm for optimal variable weighting for K-means partitioning.  相似文献   

3.
Optimization Strategies for Two-Mode Partitioning   总被引:2,自引:2,他引:0  
Two-mode partitioning is a relatively new form of clustering that clusters both rows and columns of a data matrix. In this paper, we consider deterministic two-mode partitioning methods in which a criterion similar to k-means is optimized. A variety of optimization methods have been proposed for this type of problem. However, it is still unclear which method should be used, as various methods may lead to non-global optima. This paper reviews and compares several optimization methods for two-mode partitioning. Several known methods are discussed, and a new fuzzy steps method is introduced. The fuzzy steps method is based on the fuzzy c-means algorithm of Bezdek (1981) and the fuzzy steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performances of all methods are compared in a large simulation study. In our simulations, a two-mode k-means optimization method most often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode partitioning. We would like to thank two anonymous referees whose comments have improved the quality of this paper. We are also grateful to Peter Verhoef for providing the data set used in this paper.  相似文献   

4.
试论清代割圆连比例方法   总被引:1,自引:0,他引:1  
割圆连比例曾是清代无穷级数中所用的主要方法,但对它的研究上前还不够充分。该文试图它的概念和原理,并经过分析指出:它是以连比例的递加法为基础,具有一定约束条件的递归过程;它的结构特征取决于所设递加法是其诱导方程;它的初创阶段由于未能运用递加法,其结果归结为迭代过程。  相似文献   

5.
Numerical classification of proximity data with assignment measures   总被引:1,自引:1,他引:0  
An approach to numerical classification is described, which treats the assignment of objects to types as a continuous variable, called an assignment measure. Describing a classification by an assignment measure allows one not only to determine the types of objects, but also to see relationships among the objects of the same type and among the types themselves.A classification procedure, the Assignment-Prototype algorithm, is described and evaluated. It is a numerical technique for obtaining assignment measures directly from one-mode, two-way proximity matrices.  相似文献   

6.
X is the automatic hierarchical classification of one mode (units or variables or occasions) of X on the basis of the other two. In this paper the case of OMC of units according to variables and occasions is discussed. OMC is the synthesis of a set of hierarchical classifications Delta obtained from X; e.g., the OMC of units is the consensus (synthesis) among the set of dendograms individually defined by clustering units on the basis of variables, separately for each given occasion of X. However, because Delta is often formed by a large number of classifications, it may be unrealistic that a single synthesis is representative of the entire set. In this case, subsets of similar (homegeneous) dendograms may be found in Delta so that a consensus representative of each subset may be identified. This paper proposes, PARtition and Least Squares Consensus cLassifications Analysis (PARLSCLA) of a set of r hierarchical classifications Delta. PARLSCLA identifies the best least-squares partition of Delta into m (1 <= m <= r) subsets of homogeneous dendograms and simultaneously detects the closest consensus classification (a median classification called Least Squares Consensus Dendogram (LSCD) for each subset. PARLSCLA is a generalization of the problem to find a least-squares consensus dendogram for Delta. PARLSCLA is formalized as a mixed-integer programming problem and solved with an iterative, two-step algorithm. The method proposed is applied to an empirical data set.  相似文献   

7.
The more ways there are of understanding a clustering technique, the more effectively the results can be analyzed and used. I will give a general procedure, calledparameter modification, to obtain from a clustering criterion a variety of equivalent forms of the criterion. These alternative forms reveal aspects of the technique that are not necessarily apparent in the original formulation. This procedure is successful in improving the understanding of a significant number of clustering techniques.The insight obtained will be illustrated by applying parameter modification to partitioning, mixture and fuzzy clustering methods, resulting in a unified approach to the study of these methods and a general algorithm for optimizing them.The author wishes to thank Professor Doctor Hans-Hermann Bock for many stimulating discussions.  相似文献   

8.
This paper presents the development of a new methodology which simultaneously estimates in a least-squares fashion both an ultrametric tree and respective variable weightings for profile data that have been converted into (weighted) Euclidean distances. We first review the relevant classification literature on this topic. The new methodology is presented including the alternating least-squares algorithm used to estimate the parameters. The method is applied to a synthetic data set with known structure as a test of its operation. An application of this new methodology to ethnic group rating data is also discussed. Finally, extensions of the procedure to model additive, multiple, and three-way trees are mentioned.The first author is supported as Bevoegdverklaard Navorser of the Belgian Nationaal Fonds voor Wetenschappelijk Onderzoek.  相似文献   

9.
Starting from the problem of missing data in surveys with Likert-type scales, the aim of this paper is to evaluate a possible improvement for the imputation procedure proposed by Lavori, Dawson, and Shera (1995) here called Approximate Bayesian bootstrap with Propensity score (ABP). We propose an imputation procedure named Approximate Bayesian bootstrap with Propensity score and Nearest neighbour (ABPN), which, after the ??propensity score step?? of ABP, randomly selects a donor in the nonrespondent??s neighbourhood, which includes cases with response patterns similar to the one of the nonrespondent to be imputed. A preliminary simulation study with single imputation on missing data in two Likerttype scales from a real data set shows that ABPN: (a) performed better than the ABP imputation, and (b) can be considered as a serious competitor of other procedures used in this context.  相似文献   

10.
We present a hierarchical classification based on n-ary relations of the entities. Starting from the finest partition that can be obtained from the attributes, we distinguish between entities having the same attributes by using relations between entities. The classification that we get is thus a refinement of this finest partition. It can be computed in O(n + m 2) space and O(n · p · m 5/2) time, where n is the number of entities, p the number of classes of the resulting hierarchy (p is the size of the output; p < 2n) and m the maximum number of relations an entity can have (usually, m ? n). So we can treat sets with millions of entities.  相似文献   

11.
This paper introduces a novel mixture model-based approach to the simultaneous clustering and optimal segmentation of functional data, which are curves presenting regime changes. The proposed model consists of a finite mixture of piecewise polynomial regression models. Each piecewise polynomial regression model is associated with a cluster, and within each cluster, each piecewise polynomial component is associated with a regime (i.e., a segment). We derive two approaches to learning the model parameters: the first is an estimation approach which maximizes the observed-data likelihood via a dedicated expectation-maximization (EM) algorithm, then yielding a fuzzy partition of the curves into K clusters obtained at convergence by maximizing the posterior cluster probabilities. The second is a classification approach and optimizes a specific classification likelihood criterion through a dedicated classification expectation-maximization (CEM) algorithm. The optimal curve segmentation is performed by using dynamic programming. In the classification approach, both the curve clustering and the optimal segmentation are performed simultaneously as the CEM learning proceeds. We show that the classification approach is a probabilistic version generalizing the deterministic K-means-like algorithm proposed in Hébrail, Hugueney, Lechevallier, and Rossi (2010). The proposed approach is evaluated using simulated curves and real-world curves. Comparisons with alternatives including regression mixture models and the K-means-like algorithm for piecewise regression demonstrate the effectiveness of the proposed approach.  相似文献   

12.
Classical unidimensional scaling provides a difficult combinatorial task. A procedure formulated as a nonlinear programming (NLP) model is proposed to solve this problem. The new method can be implemented with standard mathematical programming software. Unlike the traditional procedures that minimize either the sum of squared error (L 2 norm) or the sum pf absolute error (L 1 norm), the proposed method can minimize the error based on any L p norm for 1 ≤p < ∞. Extensions of the NLP formulation to address a multidimensional scaling problem under the city-block model are also discussed.  相似文献   

13.
Suppose y, a d-dimensional (d ≥ 1) vector, is drawn from a mixture of k (k ≥ 2) populations, given by ∏1, ∏2,…,∏ k . We wish to identify the population that is the most likely source of the point y. To solve this classification problem many classification rules have been proposed in the literature. In this study, a new nonparametric classifier based on the transvariation probabilities of data depth is proposed. We compare the performance of the newly proposed nonparametric classifier with classical and maximum depth classifiers using some benchmark and simulated data sets. The authors thank the editor and referees for comments that led to an improvement of this paper. This work is partially supported by the National Science Foundation under Grant No. DMS-0604726. Published online xx, xx, xxxx.  相似文献   

14.
A Binary Integer Program to Maximize the Agreement Between Partitions   总被引:1,自引:1,他引:0  
This research note focuses on a problem where the cluster sizes for two partitions of the same object set are assumed known; however, the actual assignments of objects to clusters are unknown for one or both partitions. The objective is to find a contingency table that produces maximum possible agreement between the two partitions, subject to constraints that the row and column marginal frequencies for the table correspond exactly to the cluster sizes for the partitions. This problem was described by H. Messatfa (Journal of Classification, 1992, pp. 5–15), who provided a heuristic procedure based on the linear transportation problem. We present an exact solution procedure using binary integer programming. We demonstrate that our proposed method efficiently obtains optimal solutions for problems of practical size. We would like to thank the Editor, Willem Heiser, and an anonymous reviewer for helpful comments that resulted in improvements of this article.  相似文献   

15.
k-Adic formulations (for groups of objects of size k) of a variety of 2-adic similarity coefficients (for pairs of objects) for binary (presence/absence) data are presented. The formulations are not functions of 2-adic similarity coefficients. Instead, the main objective of the the paper is to present k-adic formulations that reflect certain basic characteristics of, and have a similar interpretation as, their 2-adic versions. Two major classes are distinguished. The first class is referred to as Bennani-Heiser similarity coefficients, which contains all coefficients that can be defined using just the matches, the number of attributes that are present and that are absent in k objects, and the total number of attributes. The coefficients in the second class can be formulated as functions of Dice’s association indices. The author thanks Willem Heiser and three anonymous reviewers for their helpful comments and valuable suggestions on earlier versions of this article.  相似文献   

16.
Classification and spatial methods can be used in conjunction to represent the individual information of similar preferences by means of groups. In the context of latent class models and using Simulated Annealing, the cluster-unfolding model for two-way two-mode preference rating data has been shown to be superior to a two-step approach of first deriving the clusters and then unfolding the classes. However, the high computational cost makes the procedure only suitable for small or medium-sized data sets, and the hypothesis of independent and normally distributed preference data may also be too restrictive in many practical situations. Therefore, an alternating least squares procedure is proposed, in which the individuals and the objects are partitioned into clusters, while at the same time the cluster centers are represented by unfolding. An enhanced Simulated Annealing algorithm in the least squares framework is also proposed in order to address the local optimum problem. Real and artificial data sets are analyzed to illustrate the performance of the model.  相似文献   

17.
汉代石氏星官研究   总被引:1,自引:2,他引:1  
该文首次利用傅里叶分析法对《石氏星经》星表的观测年代进行了研究,认为星表观测年代为公元前78年。并在此基础上复原了汉代的石氏星官,为进一步研究汉代星空提供了一个基础。  相似文献   

18.
随着中医的国际化发展,中医术语的英译研究日益得到学术界重视,具有特殊性的中医语言包含了大量的隐喻性术语。文章以中国知网(CNKI)所收录的文献为研究对象,选取多个相近关键词,对1999—2019年间所收录的中医术语隐喻性英译研究方向的论文进行可视化分析,从总体趋势、期刊收录分布、研究领域三个方面分别进行分析,并提出思考建议。  相似文献   

19.
Discussions concerning belief revision, theorydevelopment, and ``creativity' in philosophy andAI, reveal a growing interest in Peirce'sconcept of abduction. Peirce introducedabduction in an attempt to providetheoretical dignity and clarification to thedifficult problem of knowledge generation. Hewrote that ``An Abduction is Originary inrespect to being the only kind of argumentwhich starts a new idea' (Peirce, CP 2.26).These discussions, however, led to considerabledebates about the precise way in which Peirce'sabduction can be used to explain knowledgegeneration (cf. Magnani, 1999; Hoffmann, 1999).The crucial question is that of understandinghow we can get the new elements capableof enlarging our theories. Under thesecircumstances, it might be helpful to step outof the entanglement and reconsider the basis ofthe problem that originally triggered Peirce'sinterest in abduction. This will lead us toanother Peircean concept, that of ``diagrammaticreasoning,' which I discuss here in the contextof his ``pragmatism.' In this way, I hope toreach a better understanding of thecontribution of ``abduction' to the knowledgegeneration process.  相似文献   

20.
A latent class vector model for preference ratings   总被引:1,自引:1,他引:1  
A latent class formulation of the well-known vector model for preference data is presented. Assuming preference ratings as input data, the model simultaneously clusters the subjects into a small number of homogeneous groups (or latent classes) and constructs a joint geometric representation of the choice objects and the latent classes according to a vector model. The distributional assumptions on which the latent class approach is based are analogous to the distributional assumptions that are consistent with the common practice of fitting the vector model to preference data by least squares methods. An EM algorithm for fitting the latent class vector model is described as well as a procedure for selecting the appropriate number of classes and the appropriate number of dimensions. Some illustrative applications of the latent class vector model are presented and some possible extensions are discussed. Geert De Soete is supported as “Bevoegdverklaard Navorser” of the Belgian “Nationaal Fonds voor Wetenschappelijk Onderzoek.”  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号