首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A comparison between two distance-based discriminant principles   总被引:1,自引:1,他引:0  
A distance-based classification procedure suggested by Matusita (1956) has long been available as an alternative to the usual Bayes decision rule. Unsatisfactory features of both approaches when applied to multinomial data led Goldstein and Dillon (1978) to propose a new distance-based principle for classification. We subject the Goldstein/Dillon principle to some theoretical scrutiny by deriving the population classification rules appropriate not only to multinomial data but also to multivariate normal and mixed multinomial/multinormal data. These rules demonstrate equivalence of the Goldstein/Dillon and Matusita approaches for the first two data types, and similar equivalence is conjectured (but not explicitly obtained) for the mixed data case. Implications for sample-based rules are noted.  相似文献   

2.
This paper presents asymmetric agglomerative hierarchical clustering algorithms in an extensive view point. First, we develop a new updating formula for these algorithms, proposing a general framework to incorporate many algorithms. Next we propose measures to evaluate the fit of asymmetric clustering results to data. Then we demonstrate numerical examples with real data, using the new updating formula and the indices of fit. Discussing empirical findings, through the demonstrative examples, we show new insights into the asymmetric clustering.  相似文献   

3.
We propose two algorithms for robust two-mode partitioning of a data matrix in the presence of outliers. First we extend the robust k-means procedure to the case of biclustering, then we slightly relax the definition of outlier and propose a more flexible and parsimonious strategy, which anyway is inherently less robust. We discuss the breakdown properties of the algorithms, and illustrate the methods with simulations and three real examples. The author is grateful to four referees for detailed suggestions that led to an improved paper, and to Professor Vichi for support and careful reading of a first draft. Acknowledgements go also to Francesca Martella for advice.  相似文献   

4.
We consider processes of emergence within the conceptual framework of the Information Loss principle and the concepts of (1) systems conserving information; (2) systems compressing information; and (3) systems amplifying information. We deal with the supposed incompatibility between emergence and computability tout-court. We distinguish between computational emergence, when computation acquires properties, and emergent computation, when computation emerges as a property. The focus is on emergence processes occurring within computational processes. Violations of Turing-computability such as non-explicitness and incompleteness are intended to represent partially the properties of phenomenological emergence, such as logical openness, given by the observer’s cognitive role; structural dynamics where change regards rules rather than only values; and multi-modelling where multiple non-equivalent models are required to model such structural dynamics. In this way, we validate, from an epistemological viewpoint, models and simulations of phenomenological emergence where the sequence of events constitutes the natural, analogical non-Turing computation which a cognitive complex system can reproduce through learning. Reproducibility through learning is different from Turing-like computational iteration. This paper aims to open a new, non-reductionist understanding of the conceptual relationship between emergence and computability.  相似文献   

5.
In this study, we consider the type of interval data summarizing the original samples (individuals) with classical point data. This type of interval data are termed interval symbolic data in a new research domain called, symbolic data analysis. Most of the existing research, such as the (centre, radius) and [lower boundary, upper boundary] representations, represent an interval using only the boundaries of the interval. However, these representations hold true only under the assumption that the individuals contained in the interval follow a uniform distribution. In practice, such representations may result in not only inconsistency with the facts, since the individuals are usually not uniformly distributed in many application aspects, but also information loss for not considering the point data within the intervals during the calculation. In this study, we propose a new representation of the interval symbolic data considering the point data contained in the intervals. Then we apply the city-block distance metric to the new representation and propose a dynamic clustering approach for interval symbolic data. A simulation experiment is conducted to evaluate the performance of our method. The results show that, when the individuals contained in the interval do not follow a uniform distribution, the proposed method significantly outperforms the Hausdorff and city-block distance based on traditional representation in the context of dynamic clustering. Finally, we give an application example on the automobile data set.  相似文献   

6.
Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.  相似文献   

7.
In two-class discriminant problems, objects are allocated to one of the two classes by means of threshold rules based on discriminant functions. In this paper we propose to examine the quality of a discriminant functiong in terms of its performance curve. This curve is the plot of the two misclassification probabilities as the thresholdt assumes various real values. The role of such performance curves in evaluating and ordering discriminant functions and solving discriminant problems is presented. In particular, it is shown that: (i) the convexity of such a curve is a sufficient condition for optimal use of the information contained in the data reduced byg, and (ii)g with non-convex performance curve should be corrected by an explicitly obtained transformation.  相似文献   

8.
9.
We propose a development stemming from Roux (1988). The principle is progressively to modify the dissimilarities so that every quadruple satisfies not only the additive inequality, as in Roux's method, but also all triangle inequalities. Our method thus ensures that the results are tree distances even when the observed dissimilarities are nonmetric. The method relies on the analytic solution of the least-squares projection onto a tree distance of the dissimilarities attached to a single quadruple. This goal is achieved by using geometric reasoning which also enables an easy proof of algorithm's convergence. This proof is simpler and more complete than that of Roux (1988) and applies to other similar reduction methods based on local least-squares projection. The method is illustrated using Case's (1978) data. Finally, we provide a comparative study with simulated data and show that our method compares favorably with that of Studier and Keppler (1988) which follows in the ADDTREE tradition (Sattath and Tversky 1977). Moreover, this study seems to indicate that our method's results are generally close to the global optimum according to variance accounted for.We offer sincere thanks to Gilles Caraux, Bernard Fichet, Alain Guénoche, and Maurice Roux for helpful discussions, advice, and for reading the preliminary versions of this paper. We are grateful to three anonymous referees and to the editor for many insightful comments. This research was supported in part by the GREG and the IA2 network.  相似文献   

10.
《武成》历日解析   总被引:5,自引:0,他引:5  
全面分析月相术语、《武成》历日和武王伐纣年代的关系。以生霸望-死霸朔、生霸上弦-死霸下弦和生霸月初-死霸望后三种月相说为例,分析了《武成》月相历日。加以岁在鸠火、日在析木的条件,得到克商的首选日期分别在公元前1070年、公元前1071年和公元前1046年。文章得到的一系列中间结果可供历史学家在综合各方面信息时作参考。  相似文献   

11.
所谓“模板”是一种隐喻的说法,人们往往相信,创造需要给予完全的自由,只要突破任何框架的约束,就能增加产生创意的概率。然而,被告诫“打破规则”的人,并不一定比那些遵循规则的人更善于解决问题,因为,一个可供参考的认知框架将会增强人们对“游戏规则”的敏感度,基于模板的创意具有高度的创造性。选择力是隐藏于创造性背后的“忽略的智慧”。科学发现是模式识别与选择性搜索的协作过程,如何从“信息海洋”里捞出那根真正有意义的“针”,也许能从意向性中找到答案。另一方面,过量的信息导致了注意力的贫乏,在信息化社会的今天,注意力成为稀缺的资源。当新信息潮水般涌来的时候,如果没有将它有秩序地纳入意识,精神错乱就容易出现。创造性思想的形成同时伴随着思维的混沌,在这关键时刻出现的无序正是精神分裂的表现。而思维中的有序性和条理性是一种对混沌控制的结果,它来自思维中的自组织和不同层次间的相互调控。  相似文献   

12.
We propose a new nonparametric family of oscillation heuristics for improving linear classifiers in the two-group discriminant problem. The heuristics are motivated by the intuition that the classification accuracy of a separating hyperplane can be improved through small perturbations to its slope and position, accomplished by substituting training observations near the hyperplane for those used to generate it. In an extensive simulation study, using data generated from multivariate normal distributions under a variety of conditions, the oscillation heuristics consistently improve upon the classical linear and logistic discriminant functions, as well as two published linear programming-based heuristics and a linear Support Vector Machine. Added to any of the methods above, they approach, and frequently attain, the best possible accuracy on the training samples, as determined by a mixed-integer programming (MIP) model, at a much smaller computational cost. They also improve expected accuracy on the overall populations when the populations overlap significantly and the heuristics are trained with large samples, at least in situations where the data conditions do not explicitly favor a particular classifier.  相似文献   

13.
文章以土耳其语军事领域术语语言特征研究为基础,提出一种规则与统计相结合的术语抽取方法,先后通过关键词、停止词、形态分析序列模式、点互信息、左右信息熵和临接词缀等特征对单语文本中的候选项进行筛选,在W-data和N-data大小两组单语文本中进行实验,结果表明该方法能够有效地从实验数据中抽取土耳其语军事术语。  相似文献   

14.
Comparing partitions   总被引:80,自引:13,他引:67  
The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. We begin by reviewing a well-known measure of partition correspondence often attributed to Rand (1971), discuss the issue of correcting this index for chance, and note that a recent normalization strategy developed by Morey and Agresti (1984) and adopted by others (e.g., Miligan and Cooper 1985) is based on an incorrect assumption. Then, the general problem of comparing partitions is approached indirectly by assessing the congruence of two proximity matrices using a simple cross-product measure. They are generated from corresponding partitions using various scoring rules. Special cases derivable include traditionally familiar statistics and/or ones tailored to weight certain object pairs differentially. Finally, we propose a measure based on the comparison of object triples having the advantage of a probabilistic interpretation in addition to being corrected for chance (i.e., assuming a constant value under a reasonable null hypothesis) and bounded between ±1.William H.E. Day was Acting Editor for the reviewing of this paper. We are grateful to him, Ove Frank, Charles Lewis, Glenn W. Milligan, Ivo Molenaar, Stanley S. Wasserman, and anonymous referees for helpful suggestions. Lynn Bilger and Tom Sharpe provided competent technical assistance. Partial support of Phipps Arabie's participation in this research was provided by NSF Grant SES 8310866 and ONR Contract N00014-83-K-0733.  相似文献   

15.
We present an alternative approach to Multiple Correspondence Analysis (MCA) that is appropriate when the data consist of ordered categorical variables. MCA displays objects (individuals, units) and variables as individual points and sets of category points in a low-dimensional space. We propose a hybrid decomposition on the basis of the classical indicator super-matrix, using the singular value decomposition, and the bivariate moment decomposition by orthogonal polynomials. When compared to standard MCA, the hybrid decomposition will give the same representation of the categories of the variables, but additionally, we obtain a clear association interpretation among the categories in terms of linear, quadratic and higher order components. Moreover, the graphical display of the individual units will show an automatic clustering.  相似文献   

16.
In this paper, we present empirical and theoretical results on classification trees for randomized response data. We considered a dichotomous sensitive response variable with the true status intentionally misclassified by the respondents using rules prescribed by a randomized response method. We assumed that classification trees are grown using the Pearson chi-square test as a splitting criterion, and that the randomized response data are analyzed using classification trees as if they were not perturbed. We proved that classification trees analyzing observed randomized response data and estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. This is illustrated using two real data sets.  相似文献   

17.
In this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions such as the dimensionless and dimensional physical constants, and the classification of constants proposed by Lévy-Leblond. Then we explore how two great analogies, computational and biological, can give new insights into our problem. This paper includes a preliminary study to examine the two analogies. Importantly, analogies are both useful and fundamental cognitive tools, but can also be misused or misinterpreted. The idea that our universe might be modelled as a computational entity is analysed, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of “Cosmological Natural Selection” with a biological analogy in mind. We examine an extension of this analogy involving intelligent life. We discuss if and how this extension could be legitimated.  相似文献   

18.
The main aim of this work is the study of clustering dependent data by means of copula functions. Copulas are popular multivariate tools whose importance within clustering methods has not been investigated yet in detail. We propose a new algorithm (CoClust in brief) that allows to cluster dependent data according to the multivariate structure of the generating process without any assumption on the margins. Moreover, the approach does not require either to choose a starting classification or to set a priori the number of clusters; in fact, the CoClust selects them by using a criterion based on the log–likelihood of a copula fit. We test our proposal on simulated data for different dependence scenarios and compare it with a model–based clustering technique. Finally, we show applications of the CoClust to real microarray data of breast-cancer patients.  相似文献   

19.
A permutation-based algorithm for block clustering   总被引:2,自引:1,他引:1  
Hartigan (1972) discusses the direct clustering of a matrix of data into homogeneous blocks. He introduces a stepwise divisive method for block clustering within a certain class of block structures which induce clustering trees for both row and column margins. While this class of structures is appealing, the stopping criterion for his method, which is based on asymptotic theory and the assumption that the individual elements of the data matrix are normally distributed, is quite restrictive. In this paper we propose a permutation-based algorithm for block clustering within the same class of block structures. By using permutation arguments to decide where to split and when to stop, our algorithm becomes applicable in a wide variety of cases, including matrices of categorical data and matrices of small-to-moderate size. In addition, our algorithm offers considerable flexibility in how block homogeneity is defined. The algorithm is studied in a series of simulation experiments on matrices of known structure, and illustrated in examples drawn from the fields of taxonomy, political science, and data architecture.  相似文献   

20.
We provide examples of the extent and nature of environmental and human health problems and show why in the United States prevailing scientific and legal burden of proof requirements usually cannot be met because of the pervasiveness of scientific uncertainty. We also provide examples of how may assumptions, judgments, evaluations, and inferences in scientific methods are value-laden and that when this is not recognized results of studies will appear to be more factual and value-neutral than warranted. Further, we show that there is a "tension" between the use of the 95 percent confidence rule as a normative basis to reduce speculation in scientific knowledge and other public policy and moral concerns embodied by the adoption of a precautionary principle. Finally, although there is no precise agreement regarding what a precautionary principle might entail, we make several recommendations regarding the placement of the burden of proof and the standard of proof that ought to be required in environmental and human health matters. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号