Found 20 similar articles; search took 31 ms
1.
Single linkage clusters on a set of points are the maximal connected sets in a graph constructed by connecting all points
closer than a given threshold distance. The complete set of single linkage clusters is obtained from all the graphs constructed
using different threshold distances. The set of clusters forms a hierarchical tree, in which each non-singleton cluster divides
into two or more subclusters; the runt size for each single linkage cluster is the number of points in its smallest subcluster.
The maximum runt size over all single linkage clusters is our proposed test statistic for assessing multimodality. We give
significance levels of the test for two null hypotheses, and consider its power against some bimodal alternatives.
Research partially supported by NSF Grant No. DMS-8617919.
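The runt statistic described in this abstract can be illustrated with a minimal sketch. It assumes one-dimensional points whose pairwise distances are all distinct, so that every single linkage merge joins exactly two subclusters; the function name `max_runt_size` and the union-find bookkeeping are illustrative choices, not the authors' implementation.

```python
from itertools import combinations


def max_runt_size(points):
    """Largest runt size over all single linkage merges of 1-D points.

    Assumption: all pairwise distances are distinct, so each cluster
    splits into exactly two subclusters.
    """
    n = len(points)
    parent = list(range(n))   # union-find forest
    size = [1] * n            # cluster size stored at each root

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Raising the threshold distance adds edges in increasing order of length.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    best = 0
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # both points are already in the same cluster
        # The runt size of the new cluster is the size of its smaller part.
        best = max(best, min(size[ri], size[rj]))
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
    return best
```

For two well-separated groups of three points, such as `[0.0, 1.0, 2.1, 10.0, 11.3, 12.9]`, the last merge joins two clusters of size three, so the statistic is 3; for a single chain of evenly growing gaps it stays at 1.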
2.
George W. Furnas 《Journal of Classification》1984,1(1):187-233
Several techniques are given for the uniform generation of trees for use in Monte Carlo studies of clustering and tree representations. First, general strategies are reviewed for random selection from a set of combinatorial objects with special emphasis on two that use random mapping operations. Theorems are given on how the number of such objects in the set (e.g., whether the number is prime) affects which strategies can be used. Based on these results, methods are presented for the random generation of six types of binary unordered trees. Three types of labeling and both rooted and unrooted forms are considered. Presentation of each method includes the theory of the method, the generation algorithm, an analysis of its computational complexity and comments on the distribution of trees over which it samples. Formal proofs and detailed algorithms are in appendices.
3.
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression,
we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great
success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP
output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function.
We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling
from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables,
which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this
task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification
problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets.
We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference,
produces rules that are easy to interpret, and performs well out of sample.
4.
Michael Steel 《Journal of Classification》1992,9(1):91-116
In taxonomy and other branches of classification it is useful to know when tree-like classifications on overlapping sets of
labels can be consistently combined into a parent tree. This paper considers the computational complexity of this problem. Recognizing
when a consistent parent tree exists is shown to be intractable (NP-complete) for sets of unrooted trees, even when each tree
in the set classifies just four labels. Consequently determining the compatibility of qualitative characters and partial binary
characters is, in general, also NP-complete. However for sets of rooted trees an algorithm is described which constructs the
“strict consensus tree” of all consistent parent trees (when they exist) in polynomial time. The related question of recognizing
when a set of subtrees uniquely defines a parent tree is also considered, and a simple necessary and sufficient condition
is described for rooted trees.
This work was supported by the Alexander von Humboldt-Stiftung. I wish to thank Andreas Dress, Hans-Jürgen Bandelt and the
referees for their helpful comments.
5.
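The Ramsey test describes the problem of whether a person who has decided to accept p also accepts q, while making minimal revisions to his or her other beliefs. Since the 1960s, scholars have interpreted this idea in different ways; broadly, there are two main readings: the simple Ramsey test and the refined Ramsey test. After analysing these two views, we argue that both interpretations have certain problems, and in this paper we propose a new line of logical interpretation based on the Ramsey test.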
6.
Patrick Erik Bradley 《Journal of Classification》2008,25(1):27-42
Dendrograms used in data analysis are ultrametric spaces, hence objects of nonarchimedean geometry. It is known that there exist p-adic representations of dendrograms. Completed by a point at infinity, they can be viewed as subtrees of the Bruhat-Tits tree associated to the p-adic projective line. The implications are that certain moduli spaces known in algebraic geometry are in fact p-adic parameter spaces of dendrograms, and stochastic classification can also be handled within this framework. At the end, we calculate the topology of the hidden part of a dendrogram.
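The statement that dendrograms are ultrametric spaces can be checked numerically on a small example. The sketch below tests the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)) on a distance matrix; the function name `is_ultrametric` and the example matrices are illustrative assumptions and are not taken from the paper.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Return True if matrix d satisfies d[x][z] <= max(d[x][y], d[y][z])."""
    n = len(d)
    return all(
        d[x][z] <= max(d[x][y], d[y][z]) + tol
        for x, y, z in permutations(range(n), 3)
    )


# Cophenetic distances of a dendrogram in which leaves 0 and 1 join at
# height 1 and leaf 2 joins them at height 3.
dendrogram = [
    [0, 1, 3],
    [1, 0, 3],
    [3, 3, 0],
]

# An ordinary metric that is not ultrametric: d(0, 2) exceeds both
# d(0, 1) and d(1, 2).
line = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
```

Here `is_ultrametric(dendrogram)` holds while `is_ultrametric(line)` does not, which is the sense in which every triangle in a dendrogram is isosceles with a short base.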
7.
Spectral analysis of phylogenetic data   Total citations: 12, self-citations: 0, citations by others: 12
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences,
the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which
counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation
called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for
unobserved changes in the data, independently from the choice of phylogenetic tree. For any given phylogenetic tree with edge
weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic
tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality
selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches
the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard
conjugation to allow a comparison with the original sequence spectrum.
A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding
spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance
and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa
and perhaps for up to 30 taxa.
The development presented here is self contained, although some mathematical proofs available elsewhere have been omitted.
The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data
are new.
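The fast Hadamard transform mentioned in this abstract can be sketched as follows. This is only the generic in-place butterfly for the (unnormalised) Walsh-Hadamard transform of a vector whose length is a power of two; it is not the authors' full Hadamard conjugation, and the function name `fwht` is an illustrative choice.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two-length list."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine blocks of length h pairwise with the (x + y, x - y) butterfly.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a
```

The transform takes on the order of n log n additions rather than the n² of a direct matrix product, and applying it twice returns the input scaled by n. Since a spectrum indexed by bipartitions grows exponentially with the number of taxa, this saving is what makes analyses of the size quoted in the abstract plausible.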
8.
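Some Chinese characters, such as 醣, 朊 and 甾, have specific scholarly meanings and are simple and clear. They were once in use and are now prohibited, yet they are still retained in the Xinhua Dictionary (《新华字典》). We recommend restoring the use of these characters, which are specific to scholarly terminology. More generally, certain distinctive Chinese characters should have a place in scholarly terminology.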
9.
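English blends (缩合词) are words formed by combining parts of two or more nouns; structurally, the parts may be letters or morphemes. Among them, initial-letter blends are widely used. Chinese has no blends but does have abbreviations (缩略词): a Chinese abbreviation is a compressed form of a phrase in which some characters are omitted, and it is a different expression with the same designation as the source term. Chinese abbreviations have many advantages over English blends, which have certain limitations. In Chinese works and media reports, English blends and abbreviations should be avoided or used sparingly, and should preferably be translated into suitable Chinese abbreviations.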
10.
Bruno Leclerc 《Journal of Classification》1995,12(2):207-241
Two properties of tree metrics are already known in the literature: tree metrics on a set X with n elements have 2n−3 degrees of freedom; a tree metric has Robinson form with regard to its minimum spanning tree (MST), or to any such MST if several of them exist. Starting from these results, we prove that a tree metric t is entirely defined by its restriction to some set B of 2n−3 entries. This set is easily determined from the table of t and includes the n−1 entries of an MST. A fast method for the adjustment of a tree metric to any given metric d is then obtained. This method extends to dissimilarities.
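As background for this abstract, a tree metric can be recognised with the classical four-point condition: for every four elements, the two largest of the three pairwise sums d(x,y)+d(z,w), d(x,z)+d(y,w), d(x,w)+d(y,z) are equal. The sketch below assumes the input is already a symmetric metric with zero diagonal; the function name `is_tree_metric` and the example matrices are illustrative and do not reproduce the paper's adjustment method.

```python
from itertools import combinations


def is_tree_metric(d, tol=1e-9):
    """Four-point condition on a metric given as a square matrix d."""
    n = len(d)
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted([
            d[x][y] + d[z][w],
            d[x][z] + d[y][w],
            d[x][w] + d[y][z],
        ])
        # The two largest sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Quartet tree 01|23: four pendant edges of length 1 and one internal
# edge of length 2, i.e. 2n - 3 = 5 edge lengths for n = 4.
quartet = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Shortest-path metric of a 4-cycle, which is not a tree metric.
cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
```

The quartet example shows the degrees-of-freedom count at work: its six distances are determined by five edge lengths.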
11.
Classifications are generally pictured in the form of hierarchical trees, also called dendrograms. A dendrogram is the graphical
representation of an ultrametric (=cophenetic) matrix; so dendrograms can be compared to one another by comparing their cophenetic
matrices. Three methods used in testing the correlation between matrices corresponding to dendrograms are evaluated. The three
permutational procedures make use of different aspects of the information to compare dendrograms: the Mantel procedure permutes
label positions only; the binary tree methods randomize the topology as well; the double-permutation procedure is based on
all the information included in a dendrogram, that is: topology, label positions, and cluster heights. Theoretical and empirical
investigations of these methods are carried out to evaluate their relative performance. Simulations show that the Mantel test
is too conservative when applied to the comparison of dendrograms; the methods of binary tree comparisons do slightly better;
only the double-permutation test provides unbiased type I error.
The trees used to illustrate groupings are generally drawn as hierarchical classifications, or dendrograms. A dendrogram
is the graphical representation of the information contained in the ultrametric (=cophenetic) matrix corresponding to the
classification. Dendrograms can therefore be compared through their corresponding ultrametric matrices. We compare three
methods for assessing the statistical significance of the correlation coefficient measured between two ultrametric matrices.
These three permutation tests take different aspects into account when comparing dendrograms: the Mantel test permutes the
leaves of the tree; the binary tree methods permute the leaves and the topology; the double-permutation procedure permutes
the leaves, the topology, and the fusion levels of the dendrograms compared. The relative efficiency of the three methods
is evaluated empirically and theoretically. Our results suggest that the double-permutation test should be preferred for
comparing dendrograms: the Mantel test proves too conservative, while the binary tree methods are not always adequate.
This work was supported by NSERC grant no. A7738 to Pierre Legendre and by a NSERC scholarship to F.-J. Lapointe.
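Of the three procedures compared in this abstract, the Mantel procedure is the simplest to sketch: it correlates the off-diagonal entries of two cophenetic matrices and builds a reference distribution by permuting the labels of one of them. The code below is a minimal one-sided version under stated assumptions; the names `pearson` and `mantel_test`, the 999 permutations, and the fixed seed are illustrative choices, and it is not the double-permutation test the authors recommend.

```python
import random
from itertools import combinations


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def mantel_test(a, b, n_perm=999, seed=0):
    """One-sided Mantel test: permute the labels of b, keep a fixed."""
    n = len(a)
    pairs = list(combinations(range(n), 2))
    va = [a[i][j] for i, j in pairs]
    observed = pearson(va, [b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    labels = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(labels)
        vb = [b[labels[i]][labels[j]] for i, j in pairs]
        if pearson(va, vb) >= observed - 1e-12:
            hits += 1
    return observed, hits / (n_perm + 1)


# Cophenetic matrix of the dendrogram ((0,1):1, (2,3):2):3.
coph = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
r, p = mantel_test(coph, coph)
```

Because only labels are permuted, every permuted matrix keeps the same topology and fusion levels, which is the limitation the abstract points to when it calls the Mantel test too conservative for dendrograms.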
12.
In this paper we show how biplot methodology can be combined with
various forms of discriminant analyses leading to highly informative visual displays of
the respective class separations. It is demonstrated that the concept of distance as
applied to discriminant analysis provides a unified approach to a wide variety of
discriminant analysis procedures that can be accommodated by just changing to an
appropriate distance metric. These changes in the distance metric are crucial for the
construction of appropriate biplots. Several new types of biplots are derived and their use
illustrated, viz. quadratic discriminant analysis biplots for use with heteroscedastic
stratified data, discriminant subspace biplots, and flexible discriminant analysis biplots.
Advantages of the proposed procedures are pointed out. Although biplot methodology is
particularly well suited to complementing discrimination problems with J > 2 classes,
its use in 2-class problems is also illustrated.
13.
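Three kinds of logical reading, each with an analysis of testability, can be given for the formal features of statements of functional laws, which connects such statements with causal statements of the "if ... then" form. The conclusion is that there is no reason to ground the fundamental difference between causal laws and functional laws in a semantic difference between time-forward and time-backward causal relations; the difference between the two kinds of law statement is in fact connected with the sufficiency relation between the antecedent clause and the consequent clause of the conditional.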
14.
A simple proof of the identification of a mixture of two univariate normal distributions is given. The proof is based on the
equivalence of local identification with positive definiteness of the information matrix and the equivalence of the latter
to a condition on the score vector that is easily checked for this model. Two extensions using the same line of proof are
also given.
We would like to thank Tom Wansbeek, Michel Wedel, Arie Kapteyn, and two anonymous reviewers for helpful comments on earlier
versions of this paper.
15.
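By questioning the cognitive thesis of model-based reasoning, this paper attempts a normative study of the cognitive-historical method of analysis under the programme of naturalized epistemology. Analogical modeling is one of the main forms of model-based reasoning. Its basic mechanism has two parts: first, a generalizing abstraction of the model source; second, a restriction or revision of the model source according to the features of the target domain. These two steps are applied repeatedly until a model suited to the target object domain is constructed. The fitness between model and object domain is the basic criterion for evaluating the appropriateness of this mechanism, and the analysis of similarity and difference in type-hierarchy theory provides an operational method for measuring fitness. Type-hierarchy theory, combined with Bayesian methods, can explain how analogical modeling raises the credibility of a model, as well as the relationship between the creativity of analogy and scientific rationality. This work offers a possible rebuttal of the scientific-cognition thesis of model-based reasoning.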
16.
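A preliminary study of the "problems of the chumeng and fangting" (刍甍,方亭之问) in the Zhui Shu (《缀术》)   Total citations: 1, self-citations: 0, citations by others: 1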
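On the basis of an analysis of the purpose and specific contents of the Jigu Suanjing (《缉古算经》), this paper examines the "problems of the chumeng and fangting" (刍甍,方亭之问) and the "method of fangyi jinxing" (方邑进行之术) in the Zhui Shu (《缀术》). It argues that the former are problems of finding the sides and height of a chumeng or fangting given its volume and the differences between its sides and height, so that the Zhui Shu contained material on cubic equations; the latter is the solution of right triangles, similar to the last six problems of the Jigu Suanjing. On this basis the paper also questions Zu Chongzhi's "kai cha mi" (开差幂) and "kai cha li" (开差立) algorithms, arguing that they are unrelated to the "method of fangyi jinxing" and the "problems of the chumeng and fangting".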
17.
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While there are many approaches to pruning, the alternative of averaging over decision trees has not received as much attention. The basic idea of tree averaging is to produce a weighted sum of decisions. We consider the set of trees used for the averaging process, and how weights should be assigned to each tree in this set. We define the concept of a fanned set for a tree, and examine how the Minimum Message Length paradigm of learning may be used to average over decision trees. We perform an empirical evaluation of two averaging approaches, and a Minimum Message Length approach. This work has been carried out with the support of the Defence Research Agency, Malvern.
18.
Didier Fraix-Burnet, Philippe Choler, Emmanuel J.P. Douzery, Anne Verhamme 《Journal of Classification》2006,23(1):57-78
This series of papers is intended to evaluate astrocladistics in reconstructing phylogenies of galaxies. The objective of
this second paper is to formalize the concept of galaxy formation and to identify the processes of diversification. We show
that galaxy diversity can be expected to organize itself in a hierarchy. In order to better understand the role of mergers,
we have selected a sample of 43 galaxies from the GALICS database built from simulations with a hybrid model for galaxy formation
studies. These simulated galaxies, described by 119 characters and considered as representing still undefined classes, have
experienced different numbers of merger events during evolution. Our cladistic analysis yields a robust tree that proves the
existence of a hierarchy. Mergers, like interactions (not taken into account in the GALICS simulations), are probably a strong
driver for galaxy diversification. Our result shows that mergers participate in a branching type of evolution, but do not
seem to play the role of an evolutionary clock.
19.
In multivariate discrimination of several normal populations, the optimal classification procedure is based on quadratic discriminant functions. We compare expected error rates of the quadratic classification procedure if the covariance matrices are estimated under the following four models: (i) arbitrary covariance matrices, (ii) common principal components, (iii) proportional covariance matrices, and (iv) identical covariance matrices. Using Monte Carlo simulation to estimate expected error rates, we study the performance of the four discrimination procedures for five different parameter setups corresponding to standard situations that have been used in the literature. The procedures are examined for sample sizes ranging from 10 to 60, and for two to four groups. Our results quantify the extent to which a parsimonious method reduces error rates, and demonstrate that choosing a simple method of discrimination is often beneficial even if the underlying model assumptions are wrong. The authors wish to thank the editor and three referees for their helpful comments on the first draft of this article. M. J. Schmid was supported by grants no. 2.724-0.85 and 2.038-0.86 of the Swiss National Science Foundation.