Similar articles
20 similar articles found.
1.
Single linkage clusters on a set of points are the maximal connected sets in a graph constructed by connecting all points closer than a given threshold distance. The complete set of single linkage clusters is obtained from all the graphs constructed using different threshold distances. The set of clusters forms a hierarchical tree, in which each non-singleton cluster divides into two or more subclusters; the runt size for each single linkage cluster is the number of points in its smallest subcluster. The maximum runt size over all single linkage clusters is our proposed test statistic for assessing multimodality. We give significance levels of the test for two null hypotheses, and consider its power against some bimodal alternatives. Research partially supported by NSF Grant No. DMS-8617919.
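A minimal sketch of the test statistic, assuming points in Euclidean space with distinct pairwise distances (ties create multiway splits, which this sketch does not handle exactly). Clusters are formed by merging along edges in order of increasing length with a union-find structure; at each binary merge the new cluster's runt size is the size of its smaller subcluster, and the statistic is the largest such value. The function name `max_runt_size` is hypothetical, and significance levels are not computed.

```python
import itertools
import math


def max_runt_size(points):
    """Maximum runt size over all single linkage clusters (assumes distinct distances)."""
    n = len(points)
    parent = list(range(n))
    size = [1] * n

    def find(i):
        # Root of i's current cluster, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Raising the threshold distance adds edges in this order.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    best = 0
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        # The merged cluster splits into these two subclusters; its runt size
        # is the number of points in the smaller one.
        best = max(best, min(size[ri], size[rj]))
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
    return best
```

Two well-separated groups of three points give a maximum runt size of 3, whereas points whose gaps keep growing only ever absorb singletons and give 1.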

2.
Several techniques are given for the uniform generation of trees for use in Monte Carlo studies of clustering and tree representations. First, general strategies are reviewed for random selection from a set of combinatorial objects with special emphasis on two that use random mapping operations. Theorems are given on how the number of such objects in the set (e.g., whether the number is prime) affects which strategies can be used. Based on these results, methods are presented for the random generation of six types of binary unordered trees. Three types of labeling and both rooted and unrooted forms are considered. Presentation of each method includes the theory of the method, the generation algorithm, an analysis of its computational complexity and comments on the distribution of trees over which it samples. Formal proofs and detailed algorithms are in appendices.
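The methods above rely on random mapping operations; the sketch below swaps in a different, simpler technique (sequential leaf insertion) for one of the six tree types, rooted binary unordered trees with labelled leaves, and should not be read as the paper's algorithm. Attaching each new leaf to an edge chosen uniformly among the 2k − 1 edges of a k-leaf tree (counting the edge above the root) produces every tree in exactly one way, so each of the (2n − 3)!! trees on n leaves is equally likely. All function names are hypothetical.

```python
import random


def _graft(tree, target, leaf):
    """Copy `tree` (nested tuples) with its target-th node, in pre-order, replaced by (node, leaf)."""
    index = 0

    def rebuild(node):
        nonlocal index
        current = index
        index += 1
        if isinstance(node, tuple):
            node = tuple(rebuild(child) for child in node)
        return (node, leaf) if current == target else node

    return rebuild(tree)


def random_rooted_binary_tree(labels, rng=random):
    """Uniformly random rooted binary tree on the given leaf labels (at least two)."""
    labels = list(labels)
    tree = (labels[0], labels[1])
    for k, leaf in enumerate(labels[2:], start=2):
        # A k-leaf tree has 2k - 1 nodes, each with exactly one edge above it
        # (the root's being the root edge), so choosing a node chooses an edge.
        tree = _graft(tree, rng.randrange(2 * k - 1), leaf)
    return tree


def canonical(tree):
    """Order-independent form, so (a, b) and (b, a) compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canonical(child) for child in tree), key=repr))
```

Sampling a few thousand trees on four labels should produce all 15 distinct rooted trees with roughly equal frequencies.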

3.
Recognizing the successes of treed Gaussian process (TGP) models as interpretable and thrifty models for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show that CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.
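A minimal sketch of the final step only: turning real-valued latent values (one per class, for example GP outputs at a point within one leaf of the tree) into class probabilities with a softmax, then classifying by the most probable class. The GP, the tree and the Monte Carlo sampler are not modelled, and the function names are hypothetical.

```python
import math


def softmax(latent):
    """Map real-valued latent outputs, one per class, to class probabilities."""
    shift = max(latent)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in latent]
    total = sum(exps)
    return [e / total for e in exps]


def classify(latent):
    """Return (index of the most probable class, all class probabilities)."""
    probs = softmax(latent)
    return max(range(len(probs)), key=probs.__getitem__), probs
```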

4.
In taxonomy and other branches of classification it is useful to know when tree-like classifications on overlapping sets of labels can be consistently combined into a parent tree. This paper considers the computational complexity of this problem. Recognizing when a consistent parent tree exists is shown to be intractable (NP-complete) for sets of unrooted trees, even when each tree in the set classifies just four labels. Consequently, determining the compatibility of qualitative characters and partial binary characters is, in general, also NP-complete. However, for sets of rooted trees an algorithm is described which constructs the "strict consensus tree" of all consistent parent trees (when they exist) in polynomial time. The related question of recognizing when a set of subtrees uniquely defines a parent tree is also considered, and a simple necessary and sufficient condition is described for rooted trees. This work was supported by the Alexander von Humboldt-Stiftung. I wish to thank Andreas Dress, Hans-Jürgen Bandelt and the referees for their helpful comments.
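The polynomial-time rooted case can be illustrated with the classic BUILD procedure on rooted triplets ab|c, since a rooted tree can be summarised by the triplets it displays. This is a swapped-in textbook technique, not the paper's strict-consensus algorithm: it returns one compatible parent tree, or `None` when none exists. Labels are assumed to be sortable strings, and the function name is hypothetical.

```python
def build(labels, triplets):
    """Rooted tree (nested tuples) displaying every triplet, or None if impossible.

    A triplet (a, b, c) means a and b are more closely related than either is to c.
    """
    labels = set(labels)
    if len(labels) == 1:
        return next(iter(labels))
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only triplets lying entirely inside the current label set constrain it.
    relevant = [t for t in triplets if set(t) <= labels]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # a and b must sit below the same child of the root
    components = {}
    for x in labels:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # the triplets force every label together: inconsistent
    subtrees = []
    for part in sorted(components.values(), key=sorted):
        subtree = build(part, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)
```

For example, ab|c together with cd|a yields ((a, b), (c, d)), while the three triplets ab|c, bc|a and ac|b cannot be combined.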

5.
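The Ramsey test describes the question of whether, having decided to accept p, one should also accept q while making minimal revisions to one's other beliefs. Since the 1960s, scholars have interpreted this idea in different ways; broadly, there are two main interpretations: the simple Ramsey test and the refined Ramsey test. Having analysed both views, we argue that each has certain problems, and in this paper we propose a new logical interpretation based on the Ramsey test.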

6.
Dendrograms used in data analysis are ultrametric spaces, hence objects of nonarchimedean geometry. It is known that there exist p-adic representations of dendrograms. Completed by a point at infinity, they can be viewed as subtrees of the Bruhat-Tits tree associated to the p-adic projective line. The implications are that certain moduli spaces known in algebraic geometry are in fact p-adic parameter spaces of dendrograms, and stochastic classification can also be handled within this framework. At the end, we calculate the topology of the hidden part of a dendrogram.
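The ultrametric property referred to here is the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)), the same inequality satisfied by the distance induced by the p-adic absolute value. A minimal check on a cophenetic matrix, assuming the input is already symmetric with a zero diagonal; the function name is hypothetical.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """True if d(x, z) <= max(d(x, y), d(y, z)) for every ordered triple of distinct points."""
    n = len(d)
    return all(
        d[x][z] <= max(d[x][y], d[y][z]) + tol
        for x, y, z in permutations(range(n), 3)
    )
```

The cophenetic matrix of a dendrogram passes this check, whereas ordinary distances between points on a line, such as 0, 1 and 3, do not.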

7.
Spectral analysis of phylogenetic data (cited 12 times: 0 self-citations, 12 by others)
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. A transformation called a Hadamard conjugation maps the sequence spectrum to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently of the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality selection procedure using a least-squares best fit to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum. A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance and sequence data. Spectral analysis using the fast Hadamard transform allows optimal trees to be found for at least 20 taxa and perhaps for up to 30 taxa. The development presented here is self-contained, although some mathematical proofs available elsewhere have been omitted. The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data are new.
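A minimal sketch of the conjugation step, assuming the usual formulation: spectra for n taxa are vectors of length m = 2^(n−1) indexed by bipartitions, H is the Sylvester-ordered Hadamard matrix (so H⁻¹ = H / m), the conjugate spectrum is γ = H⁻¹ ln(H s), and the inverse conjugation is s = H⁻¹ exp(H γ). The fast Walsh-Hadamard transform applies H in O(m log m) operations. Tree selection by least squares is not shown, and the function names are hypothetical.

```python
import math


def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform, i.e. multiplication by H."""
    v = list(values)
    n = len(v)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a, b = v[k], v[k + h]
                v[k], v[k + h] = a + b, a - b
        h *= 2
    return v


def conjugate_spectrum(seq_spectrum):
    """gamma = H^-1 ln(H s); every entry of H s must be positive."""
    logs = [math.log(x) for x in fwht(seq_spectrum)]
    return [x / len(logs) for x in fwht(logs)]


def sequence_spectrum(tree_spectrum):
    """Inverse conjugation: s = H^-1 exp(H gamma)."""
    exps = [math.exp(x) for x in fwht(tree_spectrum)]
    return [x / len(exps) for x in fwht(exps)]
```

A round trip through `sequence_spectrum` and `conjugate_spectrum` should recover the original tree spectrum up to rounding, and data consisting only of constant sites, s = (1, 0, …, 0), has an all-zero conjugate spectrum.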

8.
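Some Chinese characters, such as 醣 (carbohydrate), 朊 (protein) and 甾 (steroid), have specific academic meanings and are simple and clear. They were once in use but are now prohibited, although they are still retained in the Xinhua Dictionary (《新华字典》). We propose restoring the use of these characters that are specific to academic terminology. More generally, such special-purpose characters deserve a place in academic terminology.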

9.
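English blends (缩合词) are words formed by combining parts of two or more nouns; their components may be letters or morphemes. Among them, initialisms (首字母缩合词) are widely used. Chinese has no blends but does have abbreviations (缩略词): a Chinese abbreviation is a compressed form of a phrase in which some characters are omitted, and it is an alternative form with the same referent as its source term. Chinese abbreviations have many advantages over English blends. English blends have certain limitations, so in Chinese-language publications and media reports English blends and abbreviations should be avoided or used sparingly, and should preferably be translated into suitable Chinese abbreviations.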

10.
Two properties of tree metrics are already known in the literature: tree metrics on a set X with n elements have 2n−3 degrees of freedom; and a tree metric has Robinson form with regard to its minimum spanning tree (MST), or to any such MST if several of them exist. Starting from these results, we prove that a tree metric t is entirely defined by its restriction to some set B of 2n−3 entries. This set is easily determined from the table of t and includes the n−1 entries of an MST. A fast method for fitting a tree metric to any given metric d is then obtained. This method extends to dissimilarities.
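The abstract takes the notion of a tree metric as given. One standard characterization, not stated in the abstract, is the four-point condition: for any four points, of the three sums d(w,x)+d(y,z), d(w,y)+d(x,z) and d(w,z)+d(x,y), the two largest are equal. A minimal check, assuming `d` is already a metric stored as a square matrix; it does not implement the paper's reconstruction from 2n−3 entries, and the function name is hypothetical.

```python
from itertools import combinations


def is_tree_metric(d, tol=1e-9):
    """Four-point condition over every set of four distinct points of a metric d."""
    n = len(d)
    for w, x, y, z in combinations(range(n), 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if sums[2] - sums[1] > tol:  # the largest sum must not stand alone
            return False
    return True
```

Path lengths on the tree ((a, b), (c, d)) with edge lengths a:1, b:2, c:1, d:1 and 3 on the internal edge pass the check; raising d(a, b) to 1.9 in an otherwise all-ones metric on four points breaks it.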

11.
Classifications are generally pictured in the form of hierarchical trees, also called dendrograms. A dendrogram is the graphical representation of an ultrametric (= cophenetic) matrix, so dendrograms can be compared to one another by comparing their cophenetic matrices. Three methods used to test the correlation between matrices corresponding to dendrograms are evaluated. The three permutational procedures use different aspects of the information to compare dendrograms: the Mantel procedure permutes label positions only; the binary tree methods randomize the topology as well; and the double-permutation procedure is based on all the information included in a dendrogram, that is, topology, label positions and cluster heights. Theoretical and empirical investigations of these methods are carried out to evaluate their relative performance. Simulations show that the Mantel test is too conservative when applied to the comparison of dendrograms; the binary tree comparison methods do slightly better but are not always adequate; only the double-permutation test has an unbiased type I error rate. The double-permutation test is therefore recommended for comparing dendrograms.
This work was supported by NSERC grant no. A7738 to Pierre Legendre and by an NSERC scholarship to F.-J. Lapointe.

12.
In this paper we show how biplot methodology can be combined with various forms of discriminant analysis, leading to highly informative visual displays of the respective class separations. It is demonstrated that the concept of distance as applied to discriminant analysis provides a unified approach to a wide variety of discriminant analysis procedures, which can be accommodated simply by changing to an appropriate distance metric. These changes in the distance metric are crucial for the construction of appropriate biplots. Several new types of biplots, namely quadratic discriminant analysis biplots for use with heteroscedastic stratified data, discriminant subspace biplots and flexible discriminant analysis biplots, are derived and their use is illustrated. Advantages of the proposed procedures are pointed out. Although biplot methodology is particularly well suited to complementing discrimination problems with J > 2 classes, its use in two-class problems is also illustrated.

13.
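The formal features of functional law statements admit three kinds of logical reading, each with its own testability analysis, and these readings connect them with causal statements of the "if ... then" form. We conclude that there is no reason to ground the fundamental difference between causal laws and functional laws in the semantic difference between temporally forward and temporally backward causal relations; rather, the difference between the two kinds of law statements is tied to the sufficiency relation between the antecedent and consequent clauses of a conditional.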

14.
A simple proof of the identification of a mixture of two univariate normal distributions is given. The proof is based on the equivalence of local identification with positive definiteness of the information matrix and the equivalence of the latter to a condition on the score vector that is easily checked for this model. Two extensions using the same line of proof are also given. We would like to thank Tom Wansbeek, Michel Wedel, Arie Kapteyn, and two anonymous reviewers for helpful comments on earlier versions of this paper.

15.
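By questioning the cognitive thesis of model-based reasoning, we attempt a normative study of the cognitive-historical method of analysis within the programme of naturalized epistemology. Analogical modelling is one of the main forms of model-based reasoning, and its basic mechanism has two parts: first, generalizing and abstracting from the model source; second, constraining or modifying the model source according to the features of the target domain. These two steps are repeated until a model suited to the target domain has been constructed. The fitness between the model and the target domain is the basic criterion for judging whether this mechanism has been applied appropriately, and the analysis of similarity and difference offered by type-hierarchy theory provides an operational way to measure fitness. Type-hierarchy theory combined with Bayesian methods can explain why analogical modelling increases a model's credibility, as well as the relationship between the creativity of analogy and scientific rationality. This work offers a possible rebuttal of the cognitive thesis of model-based reasoning in science.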

16.
A preliminary study of the "chumeng and fangting problems" in the Zhui Shu (《缀术》) (cited once: 0 self-citations, 1 by others)
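Based on an analysis of the purpose and content of the Jigu Suanjing (《缉古算经》), this paper examines the "chumeng and fangting problems" (刍甍、方亭之问) and the "method of travelling around a square town" (方邑进行之术) in the Zhui Shu. We argue that the former asks for the sides and heights of a chumeng (a wedge-shaped solid) and a fangting (a square frustum) given their volumes and the differences between their sides and heights, so the Zhui Shu must have contained material on cubic equations; the latter is a problem of solving right triangles (gougu), similar to the last six problems of the Jigu Suanjing. On this basis we also question the accounts of Zu Chongzhi's "kai cha mi" (开差幂) and "kai cha li" (开差立) algorithms, arguing that they are unrelated to the method of travelling around a square town and to the chumeng and fangting problems.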

17.
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While there are many approaches to pruning, the alternative of averaging over decision trees has not received as much attention. The basic idea of tree averaging is to produce a weighted sum of decisions. We consider the set of trees used for the averaging process, and how weights should be assigned to each tree in this set. We define the concept of a fanned set for a tree, and examine how the Minimum Message Length paradigm of learning may be used to average over decision trees. We perform an empirical evaluation of two averaging approaches and a Minimum Message Length approach. This work has been carried out with the support of the Defence Research Agency, Malvern.
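A minimal sketch of a weighted sum of decisions, assuming each tree's weight is proportional to 2 raised to minus its message length in bits (a common MML-style choice, stated here as an assumption rather than taken from the paper). How the averaging set is chosen, for example the fanned set, is not modelled; each tree is a hypothetical `(message_length_bits, predict)` pair whose `predict` returns a dict of class probabilities.

```python
def average_predictions(trees, x):
    """Weighted average of class-probability predictions, weight ~ 2 ** -message_length."""
    best = min(length for length, _ in trees)
    # Shift by the shortest message length so the largest weight is 1 (avoids underflow).
    weights = [2.0 ** -(length - best) for length, _ in trees]
    total = sum(weights)
    combined = {}
    for w, (_, predict) in zip(weights, trees):
        for label, p in predict(x).items():
            combined[label] = combined.get(label, 0.0) + (w / total) * p
    return combined
```

A tree whose message is one bit shorter than another's gets twice its weight, so with lengths of 10 and 11 bits the two trees contribute in the ratio 2:1.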

18.
This series of papers is intended to evaluate astrocladistics in reconstructing phylogenies of galaxies. The objective of this second paper is to formalize the concept of galaxy formation and to identify the processes of diversification. We show that galaxy diversity can be expected to organize itself in a hierarchy. In order to better understand the role of mergers, we have selected a sample of 43 galaxies from the GALICS database built from simulations with a hybrid model for galaxy formation studies. These simulated galaxies, described by 119 characters and considered as representing still undefined classes, have experienced different numbers of merger events during evolution. Our cladistic analysis yields a robust tree that proves the existence of a hierarchy. Mergers, like interactions (not taken into account in the GALICS simulations), are probably a strong driver for galaxy diversification. Our result shows that mergers participate in a branching type of evolution, but do not seem to play the role of an evolutionary clock.

19.
In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions. We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following four models: (i) arbitrary covariance matrices, (ii) common principal components, (iii) proportional covariance matrices, and (iv) identical covariance matrices. Using Monte Carlo simulation to estimate expected error rates, we study the performance of the four discrimination procedures for five different parameter setups corresponding to standard situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for two to four groups. Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong. The authors wish to thank the editor and three referees for their helpful comments on the first draft of this article. M. J. Schmid supported by grants no. 2.724-0.85 and 2.038-0.86 of the Swiss National Science Foundation.
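For reference, the quadratic discriminant function in its standard textbook form, with prior probabilities π_k, means μ_k and covariance matrices Σ_k, together with the four covariance models written as constraints on Σ_k. These are the usual parameterisations, given as a sketch and not taken from the paper.

```latex
\delta_k(\mathbf{x}) = \ln \pi_k - \tfrac{1}{2}\ln\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_k)^{\top}\Sigma_k^{-1}(\mathbf{x}-\boldsymbol{\mu}_k),
\qquad \hat{k}(\mathbf{x}) = \arg\max_k \delta_k(\mathbf{x})

\text{(i) } \Sigma_k \text{ arbitrary};\quad
\text{(ii) } \Sigma_k = \boldsymbol{\beta}\Lambda_k\boldsymbol{\beta}^{\top}
  \ (\boldsymbol{\beta} \text{ orthogonal and common to all groups, } \Lambda_k \text{ diagonal});\quad
\text{(iii) } \Sigma_k = \rho_k\Sigma_1\ (\rho_1 = 1);\quad
\text{(iv) } \Sigma_k = \Sigma
```

Under model (iv) the term quadratic in x is the same for every group and cancels when groups are compared, so the rule reduces to a linear discriminant.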

20.
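An analysis of the characters used in military terms can, to some extent, reveal patterns in how Chinese characters are composed and used in the military domain. Taking as its main object the terms collected in the 2011 edition of the Military Terminology of the Chinese People's Liberation Army (《中国人民解放军军语》, hereafter Military Terminology), this quantitative analysis shows that the characters used in military terms are almost all commonly used characters, reflecting that most military terms are named according to the principle of being easy to understand. A comparison with character usage in the entries of the 1972, 1982 and 1997 editions of Military Terminology shows that the range of characters used in military terms is gradually widening, while the set of common characters remains relatively stable and borderline characters are unevenly distributed by position.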

