Similar Articles
1.
Statistical properties of large published classifications   Times cited: 1 (self-citations: 1, citations by others: 0)
Large published classifications typically consist of sets (called taxa) hierarchically arranged according to taxonomic rank. A statistical survey of 23 such classifications reveals the following distinctive properties. The pattern of mandatory and optional taxonomic ranks is similar to a Guttman scale. Mean taxon size (defined as the number of next-lower-rank taxa per higher-rank taxon) is a U-shaped function of mandatory rank, and averages about seven across ranks with no significant differences between classifications. The variability of taxon size is a decreasing function of mandatory rank. The generality of these properties across classifications suggests that they are determined by the psychology of the classification process. In contrast, there are significant differences between classifications in the variability of taxon size and in the prevalence of optional ranks, both of which are greater in biological than in nonbiological classifications. These differences may reflect the nature of the materials classified. This research was supported by a research grant from the UCLA Academic Senate and by computer time from the UCLA Office of Academic Computing.
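Mean taxon size, as defined in this abstract, is simple to compute once a classification is stored as a parent map. The following is a minimal sketch, not the survey's procedure: it assumes a hypothetical input format (a dict mapping each taxon name to a `(rank, parent)` pair) and assumes every taxon's direct subtaxa sit at the next-lower rank, so optional ranks are ignored.

```python
from collections import defaultdict


def mean_taxon_size(taxa):
    """Mean number of direct subtaxa per taxon, keyed by the parent taxon's rank.

    `taxa` maps a taxon name to (rank, parent_name); the root's parent is None.
    Hypothetical format: assumes no optional ranks, so children are next-lower-rank taxa.
    """
    n_children = defaultdict(int)
    for _rank, parent in taxa.values():
        if parent is not None:
            n_children[parent] += 1

    total_children = defaultdict(int)  # rank -> summed subtaxa of taxa at that rank
    n_parents = defaultdict(int)       # rank -> number of taxa at that rank with subtaxa
    for name, (rank, _parent) in taxa.items():
        if n_children[name] > 0:       # lowest-rank taxa have no subtaxa and are skipped
            total_children[rank] += n_children[name]
            n_parents[rank] += 1
    return {rank: total_children[rank] / n_parents[rank] for rank in n_parents}


# Hypothetical toy classification.
taxa = {
    "Root": ("kingdom", None),
    "A": ("phylum", "Root"), "B": ("phylum", "Root"),
    "a1": ("class", "A"), "a2": ("class", "A"), "a3": ("class", "A"), "a4": ("class", "A"),
    "b1": ("class", "B"),
}
print(mean_taxon_size(taxa))  # {'kingdom': 2.0, 'phylum': 2.5}
```

Grouping by the parent's rank is what lets the mean be plotted as a function of rank, which is the form in which the abstract reports the U-shaped pattern.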

2.
Spectral analysis of phylogenetic data   Times cited: 12 (self-citations: 0, citations by others: 12)
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently of the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum. A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance and sequence data. Spectral analysis using the fast Hadamard transform allows optimal trees to be found for at least 20 taxa and perhaps for up to 30 taxa. The development presented here is self-contained, although some mathematical proofs available elsewhere have been omitted. The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data are new.
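The sequence spectrum and the Hadamard conjugation can be sketched in a few lines. This assumes the two-state form of the conjugation, gamma = H^{-1} ln(H s), with H the Sylvester-ordered Hadamard matrix of order 2^(n-1) and the last taxon used as the reference for indexing bipartitions. The function names are illustrative, and the least-squares tree selection step is not reproduced.

```python
import math


def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform: returns H @ values for the
    Sylvester-ordered Hadamard matrix H, whose (i, j) entry is (-1)**popcount(i & j)."""
    v = list(values)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return v


def sequence_spectrum(seqs):
    """Relative frequency of each bipartition over all sites of two-state sequences.

    The last taxon is the reference: bit i of a site's index is set when taxon i
    differs from the reference at that site.
    """
    n_taxa, n_sites = len(seqs), len(seqs[0])
    spectrum = [0.0] * (1 << (n_taxa - 1))
    for k in range(n_sites):
        mask = sum(1 << i for i in range(n_taxa - 1) if seqs[i][k] != seqs[-1][k])
        spectrum[mask] += 1.0 / n_sites
    return spectrum


def hadamard_conjugate(spectrum):
    """gamma = H^{-1} ln(H s), using H^{-1} = H / n; every entry of H s must be positive."""
    n = len(spectrum)
    return [x / n for x in fwht([math.log(r) for r in fwht(spectrum)])]


def inverse_hadamard_conjugate(gamma):
    """s = H^{-1} exp(H gamma), mapping a conjugate (or tree) spectrum back to a sequence spectrum."""
    n = len(gamma)
    return [x / n for x in fwht([math.exp(r) for r in fwht(gamma)])]


print(sequence_spectrum(["0011", "0101", "0000"]))  # [0.25, 0.25, 0.25, 0.25]
```

The fast transform costs O(N log N) for a spectrum of length N = 2^(n-1) instead of O(N^2) for an explicit matrix product. This saving is why the abstract can speak of optimal trees for 20 or more taxa.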

3.
Relative frequency of genera as a function of number of species per genus is plotted for six eighteenth-century classifications: Linnaeus' classifications of animals, plants, minerals, and diseases, and Sauvages' classifications of plants and diseases. The distributions for animals and plants form positively skewed hollow curves similar but not identical to those found in modern biological classifications and predicted by mathematical models of evolution. The distributions for minerals and diseases, however, are more nearly symmetric and convex. The difference between the eighteenth-century and modern classifications of animals and plants probably reflects psychological properties of the taxonomists' judgments; but the difference between the classifications of animals and plants and those of minerals and diseases reflects evolutionary properties of the materials classified, since all six classifications were constructed by the same taxonomists using the same methods. Consequently, the observable effects of evolution are strong enough to be detected in classifications constructed before the acceptance of evolutionary theory, and traditional classifications can contain substantial scientific information despite their reliance on incompletely understood processes of judgment. I thank Mae Ling Hum for assistance in data collection, and Dennis G. Fisher, David M. Raup, Thomas D. Wickens, and J. Arthur Woodward for helpful comments on earlier versions of the paper. Computer time was provided by the UCLA Office of Academic Computing.
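The distributions described here can be tabulated and summarised in a few lines. This is a minimal sketch on made-up data. The moment coefficient of skewness is one simple way, assumed here rather than taken from the paper, to separate a positively skewed hollow curve from a nearly symmetric one.

```python
from collections import Counter


def genus_size_distribution(species_per_genus):
    """Relative frequency of genera for each observed number of species per genus."""
    counts = Counter(species_per_genus)
    n_genera = len(species_per_genus)
    return {size: counts[size] / n_genera for size in sorted(counts)}


def skewness(xs):
    """Population moment coefficient of skewness, g1 = m3 / m2 ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical data: one entry per genus, giving its number of species.
hollow = [1, 1, 1, 1, 2, 2, 3, 7]
print(genus_size_distribution(hollow))  # {1: 0.5, 2: 0.25, 3: 0.125, 7: 0.125}
print(skewness(hollow) > 0)             # True
```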

4.
When clustering asymmetric proximity data, often only the average amounts are considered, on the assumption that the asymmetry is due to noise. But when the asymmetry is structural, as typically happens with exchange flows, migration data, or confusion data, this can strongly affect the search for groups, because the directions of the exchanges are ignored and not integrated into the clustering process. The clustering model proposed here relies on decomposing the asymmetric dissimilarity matrix into symmetric and skew-symmetric effects, each further decomposed into within- and between-cluster effects. The classification structures used here are generally based on two different partitions of the objects, fitted to the symmetric and the skew-symmetric parts of the data, respectively. The restricted case is also presented, in which a single partition fits both parts jointly, yielding clusters of objects that are similar with respect to both the average amounts and the directions of the data. Parsimonious models are presented that allow effective and simple graphical representations of the results.
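The first step of such a model is the standard split of a square matrix D into a symmetric part S = (D + D^T)/2 and a skew-symmetric part K = (D - D^T)/2, with D = S + K. The sketch below shows that split. `between_cluster_skew` is a hypothetical helper that sums K between clusters of a given partition. It only illustrates how directional information survives; it does not reproduce the paper's fitted within/between model.

```python
def decompose(d):
    """Split a square asymmetric matrix D into S = (D + D^T) / 2 and K = (D - D^T) / 2.

    S is symmetric (average amounts); K is skew-symmetric (directions); D = S + K.
    """
    n = len(d)
    sym = [[(d[i][j] + d[j][i]) / 2 for j in range(n)] for i in range(n)]
    skew = [[(d[i][j] - d[j][i]) / 2 for j in range(n)] for i in range(n)]
    return sym, skew


def between_cluster_skew(skew, labels):
    """Sum the skew-symmetric part over every ordered pair of distinct clusters.

    A positive total for (a, b) means entries from cluster a to cluster b exceed
    the entries in the reverse direction.
    """
    totals = {}
    for i, row in enumerate(skew):
        for j, value in enumerate(row):
            a, b = labels[i], labels[j]
            if a != b:
                totals[(a, b)] = totals.get((a, b), 0.0) + value
    return totals


flows = [[0, 3], [1, 0]]
sym, skew = decompose(flows)
print(sym, skew)                           # [[0.0, 2.0], [2.0, 0.0]] [[0.0, 1.0], [-1.0, 0.0]]
print(between_cluster_skew(skew, [0, 1]))  # {(0, 1): 1.0, (1, 0): -1.0}
```

Averaging alone (the symmetric part) would report the same value 2.0 in both directions. The skew-symmetric part keeps the fact that the flow is larger from object 0 to object 1.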

5.
The distribution of lengths of phylogenetic trees under the taxonomic principle of parsimony is compared with the distribution obtained by randomizing the characters of the sequence data. This comparison allows us to define a measure of the extent to which sequence data contain significant hierarchical information. We show how to calculate this measure exactly for up to 10 taxa, and provide a good approximation for larger sets of taxa. The measure is applied to test sequences on 10 and 15 taxa.
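The two ingredients of this comparison can be sketched for a single fixed tree. The first is the parsimony length of a tree, computed here with Fitch's algorithm for rooted binary trees. The second is the same length after each character's states are shuffled across taxa, which keeps state frequencies but destroys hierarchical signal. This is an assumed simplification: the paper's measure compares length distributions over all trees and is not reproduced here. The tree encoding (nested 2-tuples of taxon indices) is hypothetical.

```python
import random


def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree of nested 2-tuples whose leaves are taxon indices.

    Returns (candidate_state_set, number_of_changes) for one character.
    """
    if isinstance(tree, int):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Parsimony length: total Fitch changes over all sites (alignment[i] is taxon i's sequence)."""
    return sum(fitch(tree, [seq[k] for seq in alignment])[1] for k in range(len(alignment[0])))


def randomized_lengths(tree, alignment, n_reps, seed=0):
    """Tree lengths after independently shuffling each character's states across taxa."""
    rng = random.Random(seed)
    n_taxa, n_sites = len(alignment), len(alignment[0])
    lengths = []
    for _ in range(n_reps):
        columns = []
        for k in range(n_sites):
            column = [seq[k] for seq in alignment]
            rng.shuffle(column)
            columns.append(column)
        shuffled = ["".join(col[i] for col in columns) for i in range(n_taxa)]
        lengths.append(tree_length(tree, shuffled))
    return lengths


alignment = ["AA", "AA", "CC", "CC"]
print(tree_length(((0, 1), (2, 3)), alignment))  # 2
print(tree_length(((0, 2), (1, 3)), alignment))  # 4
```

Data with strong hierarchical signal give some trees much shorter lengths than the randomized data typically allow, and that gap is the intuition the measure formalises.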

6.
Interpreting a taxonomic tree as a set of objects leads to natural measures of complexity and similarity, and sets natural lower bounds on a consensus tree. Interpretations that differ as to the kind of objects constituting a tree lead to different measures and different consensus trees. Subset nesting is preferred over the clusters (strict consensus) and even the triads interpretations because it better expresses shared structure. Algorithms for computing the complexity and similarity of trees, as well as a consensus index onto [0,1], are presented for this interpretation. The full consensus is defined as the only tree that includes all the nestings shared in a profile of rival trees and whose clusters reflect only nestings shared in the profile. The full consensus is proved to exist uniquely for each profile, and to equal the Adams consensus. The author is grateful for the many helpful comments on presentation from Frances McA. Adams, William H. E. Day, and Christopher A. Meacham.
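For contrast with the nesting interpretation the abstract favours, the clusters interpretation is easy to sketch. A tree is treated as its set of clusters (the leaf sets below internal nodes), and the strict consensus is the intersection of those sets across the profile. The nested-tuple tree encoding is an assumption, and the full (Adams) consensus is not implemented here.

```python
def clusters(tree):
    """Clusters (leaf sets below each internal node) of a tree given as nested tuples."""
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def strict_consensus(trees):
    """Clusters shared by every tree in the profile (the clusters interpretation)."""
    return set.intersection(*(clusters(t) for t in trees))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "b"), "d"), "c")
print(sorted(sorted(c) for c in strict_consensus([t1, t2])))  # [['a', 'b'], ['a', 'b', 'c', 'd']]
```

Here the strict consensus keeps only {a, b} and the root. Information such as "a and b are nested within a larger group excluding one taxon" is lost, and the nesting interpretation is designed to retain structure of that kind.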

7.
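Fuzziness is an essential characteristic of natural language; even in economic terminology, which demands precise meanings, fuzzy expressions persist. Although shared features of human thought allow speakers of different languages to understand many fuzzy concepts, differences in ways of thinking and linguistic habits mean that the fuzziness of terms can hinder correct understanding in English–Chinese translation. This paper argues that whether one renders fuzziness with fuzziness or turns the fuzzy into the precise, a strategy is viable as long as it makes the translation conform better to the norms of Chinese terminology and is acceptable to target-language readers.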

8.
We present a new distance-based quartet method for phylogenetic tree reconstruction, called Minimum Tree Cost Quartet Puzzling. Starting from a distance matrix computed from natural data, the algorithm incrementally constructs a tree by adding one taxon at a time to the intermediate tree, using a cost function based on the relaxed 4-point condition for weighting quartets. Different input orders of taxa lead to trees with distinct topologies, which can be evaluated using a maximum likelihood or weighted least squares optimality criterion. Using reduced sets of quartets and a simple heuristic tree search strategy, we obtain an overall complexity of O(n^5 log^2 n) for the algorithm. We evaluate the performance of the method through comparative tests and show that it outperforms neighbor joining (NJ) when a weighted least squares optimality criterion is employed. We also discuss the theoretical boundaries of the algorithm.
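The 4-point condition behind the quartet weighting can be illustrated on a single quartet. For a tree metric, of the three pairwise sums d(w,x)+d(y,z), d(w,y)+d(x,z), and d(w,z)+d(x,y), the two largest are equal and the smallest identifies the split. The sketch below picks that split and reports the gap to the runner-up as a crude support value. Both the function and the "support" quantity are hypothetical illustrations, not the paper's relaxed cost function.

```python
def quartet_topology(dist, w, x, y, z):
    """Pick the quartet split favoured by the four-point condition.

    For a tree metric the two largest of the three pair sums are equal, and the
    smallest sum identifies the split. Also returns the gap between the smallest
    and second-smallest sums as a crude support value (0 means unresolved).
    """
    sums = {
        ((w, x), (y, z)): dist[w][x] + dist[y][z],
        ((w, y), (x, z)): dist[w][y] + dist[x][z],
        ((w, z), (x, y)): dist[w][z] + dist[x][y],
    }
    best = min(sums, key=sums.get)
    smallest, second, _largest = sorted(sums.values())
    return best, second - smallest


# Tree metric for the split 01|23: pendant edges of length 1, internal edge of length 2.
dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
print(quartet_topology(dist, 0, 1, 2, 3))  # (((0, 1), (2, 3)), 4)
```

With noisy distances the two largest sums are no longer exactly equal, which is why a relaxed version of the condition is needed for real data.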

9.
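The tree peony stands foremost among China's traditional famous flowers and holds a very high place in the minds of the Chinese people. Over the long history of Chinese flower culture, numerous peony treatises were produced. Building on a review of previous research on ancient Chinese peony treatises, this paper further examines, through a study of ancient texts, how many such treatises existed and which of them survive. The results show that 41 peony treatises are recorded in Chinese history, of which 16 survive; by content they can be divided into two major types, cultivar catalogues and comprehensive treatises. The format of the treatises differed from dynasty to dynasty, but they mainly consist of four parts: preface, main text, appended notes, and postscript. The 16 surviving treatises are rich and varied in content and form, and are an important reference for research on ancient Chinese peony culture and on modern peony breeding and cultivation techniques.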

10.
Rhetorical strategy is relevant in the legal domain, where language is a vital instrument. Textual statistics have much to offer for uncovering such a strategy. We propose a methodology that starts from an unstructured text: first, breakpoints are automatically detected and lexically homogeneous parts are identified; then, the shape of the text, given by the trajectory of these parts, and their hierarchical structure are uncovered; finally, the flow of the argument is tracked. Several methods are combined. Chronological clustering of multidimensional count series detects the breakpoints; the shape of the text is revealed by applying correspondence analysis to the parts×words table, while the progression of the argument is described by labelled time-constrained hierarchical clustering. The methodology is illustrated on a forensic rhetoric application, namely a closing speech delivered by a prosecutor at the Barcelona Criminal Court. This approach could also be useful in politics, communication, and professional writing.
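The key constraint in time-constrained hierarchical clustering is that only parts adjacent in reading order may be merged, so every cluster is a contiguous stretch of the text. The sketch below is a simplified illustration under stated assumptions. It represents each part by hypothetical word counts, compares parts by the Euclidean distance between their relative-frequency profiles, and merges the closest adjacent pair at each step. The paper's own distance and aggregation criterion may differ.

```python
import math


def profile(counts):
    """Relative word frequencies of a part (its lexical profile)."""
    total = sum(counts)
    return [c / total for c in counts]


def time_constrained_clustering(parts):
    """Agglomerate consecutive text parts, only ever merging neighbours in reading order.

    parts: word-count vectors, one per part, in textual order.
    Returns the merge history as (left_span, right_span, distance) tuples, where a span
    (first, last) gives the indices of the original parts a segment covers.
    """
    segments = [((i, i), list(counts)) for i, counts in enumerate(parts)]
    history = []
    while len(segments) > 1:
        # distance between the profiles of each pair of adjacent segments
        dists = [math.dist(profile(segments[k][1]), profile(segments[k + 1][1]))
                 for k in range(len(segments) - 1)]
        k = min(range(len(dists)), key=dists.__getitem__)
        (left_span, left_counts), (right_span, right_counts) = segments[k], segments[k + 1]
        merged_counts = [a + b for a, b in zip(left_counts, right_counts)]
        history.append((left_span, right_span, dists[k]))
        segments[k:k + 2] = [((left_span[0], right_span[1]), merged_counts)]
    return history


parts = [[10, 0], [9, 1], [0, 10], [2, 8]]  # hypothetical counts of two words in four parts
for left, right, dist in time_constrained_clustering(parts):
    print(left, "+", right, round(dist, 3))
```

The merge history is a dendrogram whose clusters are all intervals of the text. Cutting it at a given level yields candidate breakpoints between lexically homogeneous stretches.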

11.
An analysis of several issues concerning naturalism   Times cited: 3 (self-citations: 0, citations by others: 3)
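This article analyzes in detail the origin, concept, and connotations of naturalism, as well as the differences and connections between naturalism and materialism. It points out the close relationship between the current revival of naturalism and science, and examines in detail the shortcomings and limitations of various current studies of naturalism outside China.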

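
The "paradoxes of material implication" are a semantic problem that has long troubled classical logic. Attempts to solve them produced strict implication and relevant implication, giving rise to modal logic and relevant logic. Modal logic still has "paradoxes of strict implication". Relevant logic avoids the "paradoxes of implication", but it excludes some valid forms of inference and is also undecidable. The "paradoxes of implication" arise from formalizing (mathematizing) the relation of inference; therefore, we can look for a way to avoid the "paradoxes" outside formal systems. This paper presents a feasible method for eliminating the "paradoxes of implication": the method of Euler diagrams.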

13.
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While there are many approaches to pruning, the alternative of averaging over decision trees has not received as much attention. The basic idea of tree averaging is to produce a weighted sum of decisions. We consider the set of trees used for the averaging process, and how weights should be assigned to each tree in this set. We define the concept of a fanned set for a tree, and examine how the Minimum Message Length paradigm of learning may be used to average over decision trees. We perform an empirical evaluation of two averaging approaches and a Minimum Message Length approach. This work has been carried out with the support of the Defence Research Agency, Malvern.
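The "weighted sum of decisions" can be sketched as a weighted average of the class-probability vectors predicted by each tree. This sketch assumes, as is common in MML-motivated averaging but not confirmed by the abstract, that a tree with message length L bits receives weight proportional to 2^(-L). How the set of trees (for example, a fanned set) is chosen is left out. The function name and inputs are hypothetical.

```python
def average_predictions(predictions, message_lengths):
    """Weighted average of class-probability vectors predicted by several trees.

    Tree t gets weight proportional to 2 ** -L_t, where L_t is its message length in bits,
    so shorter (more probable) trees count for more.
    """
    shortest = min(message_lengths)
    # shift by the shortest length before exponentiating so long messages do not underflow to 0
    weights = [2.0 ** -(length - shortest) for length in message_lengths]
    total = sum(weights)
    n_classes = len(predictions[0])
    return [sum(w * p[c] for w, p in zip(weights, predictions)) / total for c in range(n_classes)]


# Two hypothetical trees whose message lengths differ by one bit.
print(average_predictions([[1.0, 0.0], [0.0, 1.0]], [10.0, 11.0]))
```

A one-bit difference in message length halves a tree's weight. The longer tree therefore still contributes a third of the averaged prediction here, instead of being discarded as it would be under pruning to a single tree.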

14.
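A study of the dating of the astronomical phenomena in the Xia Xiaozheng   Times cited: 1 (self-citations: 0, citations by others: 1)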
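The Xia Xiaozheng is China's earliest calendar based on astronomical and phenological observations, and also the earliest complete astronomical text to have survived. This article divides the 17 astronomical phenomena that the calendar gives month by month into three groups (culminations at dusk and dawn, heliacal risings and settings, and the pointing of asterisms) and determines the epoch in which each was observed. The conclusion is that the dates of the astronomical phenomena in the Xia Xiaozheng are mutually consistent; the calendar was in general use during the Zhou dynasty, and its origin may be traced back to the Xia dynasty, although confirming the latter requires evidence from other sources.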

15.
The Academic Journal Ranking Problem consists in formulating a formal assessment of scientific journals. An outcome variable must be constructed that allows valid journal comparison, either as a set of tiers (ordered classes) or as a numerical index. But part of the problem is also to devise a procedure for obtaining this outcome, that is, how to obtain and use relevant data coming from expert opinions or from citation databases. We propose a novel approach to the problem that applies fuzzy cluster analysis to peer reviews and opinion surveys. The procedure is composed of two steps. The first is to collect the most relevant qualitative assessments from international organizations (for example, those available in the Harzing database) and, as an inductive analysis, to apply fuzzy clustering to determine homogeneous journal classes. The second, deductive step is to determine the hidden logical rules that underlie the classification, using a classification tree to reproduce the patterns found in the first step.
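The fuzzy clustering step can be illustrated with standard fuzzy c-means. Each journal receives a membership in every class rather than a single hard label, which suits journals that sit between tiers. This is a generic pure-Python sketch with fuzzifier m = 2 and made-up one-dimensional scores. The paper's choice of fuzzy clustering variant, inputs, and number of classes is not stated in the abstract and is assumed here.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, seed=0, eps=1e-9):
    """Standard fuzzy c-means; returns (centres, memberships), membership rows summing to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships, each row normalised to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    for _ in range(n_iter):
        # centres: means of the points weighted by membership ** m
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(dim)])
        # memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)); eps avoids division by 0
        for i in range(n):
            d = [max(math.dist(points[i], centres[j]), eps) for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return centres, u


# Hypothetical one-dimensional expert scores for six journals.
scores = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
centres, memberships = fuzzy_c_means(scores, c=2)
print(sorted(round(x[0], 2) for x in centres))
```

The resulting membership matrix could then serve as input to the deductive step, for example by assigning each journal its highest-membership class before fitting a classification tree.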

16.
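Taking Coxhead's Academic Word List as an example, this paper discusses the principles and methods that corpus-based word-list construction should follow. These fall into five main areas: clarifying the purpose or goal of the list, selecting or building a suitable corpus, determining the unit of frequency counting, setting criteria for word selection, and evaluating and testing the list. The paper also points out that maintaining and updating existing word lists, and researching and developing specialized word lists, will be the main directions and priorities of future research.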
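Selection criteria of the kind discussed here combine a frequency threshold (total occurrences in the corpus) with a range threshold (how many subject areas a word appears in), optionally excluding general-service vocabulary. The sketch below assumes a hypothetical corpus layout (subject area mapped to a token list already reduced to the chosen counting unit). The thresholds are placeholders, not the values used for any published list.

```python
from collections import Counter


def select_words(corpus, min_freq, min_range, exclude=frozenset()):
    """Keep words meeting both a frequency and a range threshold.

    corpus: dict mapping a subject area to its token list (assumed already reduced to the
    chosen counting unit, e.g. lemmas or word families).
    min_freq: minimum total occurrences across the corpus.
    min_range: minimum number of subject areas the word occurs in.
    exclude: words to drop regardless, e.g. a general-service list.
    """
    frequency = Counter()
    range_count = Counter()
    for tokens in corpus.values():
        counts = Counter(tokens)
        frequency.update(counts)           # adds each word's count in this area
        range_count.update(counts.keys())  # adds 1 per area the word appears in
    return sorted(
        word for word in frequency
        if frequency[word] >= min_freq and range_count[word] >= min_range and word not in exclude
    )


corpus = {  # hypothetical, tiny corpus
    "law": ["analyse", "data", "the", "data"],
    "biology": ["data", "analyse", "cell"],
    "economics": ["data", "the", "market"],
}
print(select_words(corpus, min_freq=2, min_range=2, exclude={"the"}))  # ['analyse', 'data']
```

The range criterion is what keeps subject-specific but frequent words (such as "cell" in a large biology subcorpus) out of a general academic list.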

17.
Displacing Epistemology: Being in the Midst of Technoscientific Practice   Times cited: 2 (self-citations: 2, citations by others: 0)
Interest in the Erklären–Verstehen debate is usually interpreted as primarily epistemological. By raising the possibility that there are fundamentally different methods for fundamentally different types of science, the debate puts into play all the standard issues, that is, issues concerning scientific explanation and justification, the unity and diversity of scientific disciplines, the reality of their subject matter, the accessibility of various subject matters to research, and so on. In this paper, however, I do not focus on any of these specific issues. I start instead from the fact that the very existence of the debate is itself an issue; in fact, it poses a philosophical problem that almost everyone but the hardest-line logical empiricists has come to realize cannot be resolved epistemologically. In my view, it cannot be resolved ontologically either. I think the problem is at bottom hermeneutical, and its resolution requires that we focus first not on the objects of science or the methods of studying them, but on the character of the philosophical orientation assumed by those who would try to resolve it. I explain why I think this is so by analyzing (1) Dilthey's contribution to the original debate, (2) Husserl's reaction to Dilthey, and (3) Heidegger's critical evaluation of both. This line of philosophical development, a movement of self-understanding from critiques of objectivism to hermeneutical phenomenology, is of course already a central feature of much work in continental philosophy of science. In my conclusion, however, I argue for the less well-established (even if apparently approved) idea that it ought to be a central feature of technoscience studies as well.

18.
Cognitive diagnostic models provide valuable information on whether a student has mastered each of the attributes a test intends to evaluate. Despite its generality, the generalized DINA (G-DINA) model allows for the possibility that students who master more attributes have lower rates of correct response than students who master fewer. This paper considers an order-constrained parameter space for the G-DINA model to avoid this counter-intuitive phenomenon and proposes two algorithms, the upward and downward methods, for parameter estimation. Through simulation studies, we compare the accuracy of parameter estimation and of attribute-pattern classification obtained from the two proposed algorithms and the current approach when the restricted parameter space is true. Our results show that the upward method performs best among the three, and it is therefore recommended for estimation regardless of the distribution of respondents' attribute patterns, the types of test items, and the sample size.
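The order constraint can be made concrete with the identity-link form of the G-DINA item response function. The probability of a correct answer is an intercept plus main effects and interactions for the required attributes a student has mastered. The constraint requires that mastering a superset of attributes never lowers that probability. The sketch below computes item success probabilities and checks monotonicity. It does not implement the upward or downward estimation algorithms, whose details the abstract does not give, and the coefficient values are made up.

```python
from itertools import product


def g_dina_prob(delta, alpha):
    """Identity-link G-DINA success probability for one item.

    delta maps a frozenset of attribute indices to its coefficient (frozenset() is the
    intercept); the probability sums every coefficient whose attributes are all mastered.
    """
    mastered = {k for k, has in enumerate(alpha) if has}
    return sum(coef for subset, coef in delta.items() if subset <= mastered)


def is_monotone(success_prob):
    """True if mastering a superset of attributes never lowers P(correct)."""
    for a, p_a in success_prob.items():
        for b, p_b in success_prob.items():
            if all(x <= y for x, y in zip(a, b)) and p_a > p_b:
                return False
    return True


def item_probs(delta, n_attrs):
    """P(correct) for every pattern of the item's required attributes."""
    return {alpha: g_dina_prob(delta, alpha) for alpha in product((0, 1), repeat=n_attrs)}


ok = {frozenset(): 0.2, frozenset({0}): 0.3, frozenset({1}): 0.2, frozenset({0, 1}): 0.2}
bad = {**ok, frozenset({0, 1}): -0.5}  # negative interaction: mastering both hurts
print(is_monotone(item_probs(ok, 2)), is_monotone(item_probs(bad, 2)))  # True False
```

In the `bad` item, a student mastering both attributes has success probability 0.2, below the 0.5 of a student mastering only the first. This is exactly the counter-intuitive case the order-constrained parameter space rules out.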

19.
A Thurstonian model for ranks is introduced in which rank-induced dependencies are specified through correlation coefficients among ranked objects that are determined by a vector of rank-induced parameters. The ranking model can be expressed in terms of univariate normal distribution functions, thus simplifying a previously computationally intensive problem. A theorem is proven that shows that the specification given in the paper for the dependencies is the only way that this simplification can be achieved under the process assumptions of the model. The model depends on certain conditional probabilities that arise from item orders considered by subjects as they make ranking decisions. Examples involving a complete set of ranks and a set with missing values are used to illustrate recovery of the objects’ scale values and the rank dependency parameters. Application of the model to ranks for gift items presented singly or as composite items is also discussed.
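The basic Thurstonian building block is the probability that one object's latent utility exceeds another's. With jointly normal utilities of common variance sigma^2 and correlation rho, the difference U_i - U_j is normal with mean mu_i - mu_j and variance 2 sigma^2 (1 - rho). That probability is therefore a single univariate normal distribution function. The sketch below shows only this pairwise case under those assumptions. It is not the paper's rank-induced dependency model, whose parameterisation the abstract does not spell out.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_preferred(mu_i, mu_j, sigma=1.0, rho=0.0):
    """P(U_i > U_j) for jointly normal utilities with common variance sigma**2 and correlation rho < 1.

    U_i - U_j is normal with mean mu_i - mu_j and variance 2 * sigma**2 * (1 - rho).
    """
    sd_diff = sigma * math.sqrt(2.0 * (1.0 - rho))
    return normal_cdf((mu_i - mu_j) / sd_diff)


print(prob_preferred(0.5, 0.5))            # 0.5
print(round(prob_preferred(1.0, 0.0), 3))
```

A positive correlation shrinks the variance of the difference, so the same gap in scale values produces a more decisive preference. This is one way in which dependencies among objects change ranking probabilities.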

20.
Dendrograms based on n objects can contain as many as n – 1 levels (internal nodes) and can prove difficult to interpret. Two methods are described for transforming a dendrogram into a more readily interpretable parsimonious tree. These involve limiting either (i) the number of different values taken by the heights of the internal nodes, or (ii) the number of internal nodes. An illustrative example is presented.
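Option (ii), limiting the number of internal nodes, can be illustrated with a simple greedy heuristic. Repeatedly collapse the internal node whose height is closest to its parent's, attaching its children directly to the parent, until only the desired number of internal nodes remain. This is an assumed heuristic for illustration, not necessarily the criterion used in the paper. The dict-based dendrogram format is hypothetical.

```python
def simplify_dendrogram(nodes, root, max_internal):
    """Greedily collapse internal nodes until at most `max_internal` remain.

    nodes: dict mapping an internal node id to (height, list_of_child_ids); any child id
    that is not a key of `nodes` is a leaf. At each step the non-root internal node whose
    height is closest to its parent's height is removed, and its children are attached
    directly to that parent. Returns a new dict; the input is not modified.
    """
    nodes = {k: (h, list(children)) for k, (h, children) in nodes.items()}
    parent = {c: k for k, (_h, children) in nodes.items() for c in children}
    while len(nodes) > max_internal:
        candidates = [k for k in nodes if k != root]
        if not candidates:
            break
        v = min(candidates, key=lambda k: nodes[parent[k]][0] - nodes[k][0])
        p = parent.pop(v)
        _height, children_v = nodes.pop(v)
        siblings = nodes[p][1]
        i = siblings.index(v)
        siblings[i:i + 1] = children_v  # splice v's children in where v used to be
        for c in children_v:
            parent[c] = p
    return nodes


# Hypothetical dendrogram on leaves a-e; node "x" sits just below the root.
dend = {
    "r": (5.0, ["x", "y"]),
    "x": (4.8, ["u", "c"]),
    "u": (1.0, ["a", "b"]),
    "y": (2.0, ["d", "e"]),
}
print(simplify_dendrogram(dend, "r", 3))
```

Collapsing the node with the smallest height gap first removes the distinctions that the original dendrogram barely supported. Here node "x" at height 4.8, just under the root at 5.0, goes first.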
