Similar literature
1.
Directed binary hierarchies have been introduced in order to give a reduced graphical representation of a family of association rules. This type of structure extends the classical binary hierarchical classification in a very specific way. In this paper an accurate formalization of this new structure is studied. A directed hierarchy is defined as a set of ordered pairs of subsets of the initial individual set satisfying specific conditions. A new notion of directed ultrametricity is studied. The main result consists in establishing a bijective correspondence between a directed ultrametric space and a directed binary hierarchy. Finally, an algorithm is proposed in order to transform a directed ultrametric structure into a graphical representation associated with a directed binary hierarchy.
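
As a point of reference for the ultrametricity notion mentioned above, the following is a minimal sketch of the classical (undirected) ultrametric inequality d(x, z) <= max(d(x, y), d(y, z)). The directed variant studied in the paper is not defined in this abstract and is not reproduced here; the function name `is_ultrametric` and both sample matrices are illustrative assumptions.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Return True if the square matrix d satisfies
    d[x][z] <= max(d[x][y], d[y][z]) for all distinct x, y, z."""
    n = len(d)
    return all(
        d[x][z] <= max(d[x][y], d[y][z]) + tol
        for x, y, z in permutations(range(n), 3)
    )


# Hypothetical distances read off a small dendrogram:
# items 0 and 1 merge at height 1, item 2 joins at height 2.
tree_like = [
    [0, 1, 2],
    [1, 0, 2],
    [2, 2, 0],
]

# Hypothetical points on a line at positions 0, 1, 2:
# a metric, but not an ultrametric.
line_like = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

print(is_ultrametric(tree_like), is_ultrametric(line_like))
```

In the classical setting, a dissimilarity that passes this check corresponds to an indexed hierarchy; the paper's main result is stated as the analogous correspondence for the directed case.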

2.
In this paper we offer a few examples to illustrate the orientation of contemporary research in data analysis, and we investigate the corresponding role of mathematics. We argue that the modus operandi of data analysis is implicitly based on the belief that if we have collected enough and sufficiently diverse data, we will be able to answer most relevant questions concerning the phenomenon itself. This is a methodological paradigm strongly related, but not limited to, biology, and we label it the microarray paradigm. In this new framework, mathematics provides powerful techniques and general ideas which generate new computational tools. What is missing, however, is any explicit isomorphism between a mathematical structure and the phenomenon under consideration. This methodology used in data analysis suggests the possibility of forecasting and analyzing without a structured and general understanding. This is the perspective we propose to call agnostic science, and we argue that, rather than diminishing or flattening the role of mathematics in science, the lack of isomorphisms with phenomena liberates mathematics, paradoxically making more likely the practical use of some of its most sophisticated ideas.

3.
Spectral analysis of phylogenetic data   Total citations: 12, self-citations: 0, citations by others: 12
The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, a bipartition of the taxa set. The vector which counts the relative numbers of each of these bipartitions over all sites is called a sequence spectrum. Applying a transformation called a Hadamard conjugation, the sequence spectrum is transformed to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently of the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. The selection of a weighted phylogenetic tree from the given sequence data is made by matching the conjugate spectrum with a tree spectrum. We develop an optimality selection procedure using a least squares best fit, to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum using the inverse Hadamard conjugation to allow a comparison with the original sequence spectrum. A possible adaptation for the analysis of four-state character sequences with unequal frequencies is considered. A corresponding spectral analysis for distance data is also introduced. These analyses are illustrated with biological examples for both distance and sequence data. Spectral analysis using the Fast Hadamard transform allows optimal trees to be found for at least 20 taxa and perhaps for up to 30 taxa. The development presented here is self-contained, although some mathematical proofs available elsewhere have been omitted. The analysis of sequence data is based on methods reported earlier, but the terminology and the application to distance data are new.
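
The Hadamard conjugation and its inverse described above can be sketched as follows. This is a minimal illustration, assuming the commonly stated form gamma = H^-1 ln(H s) for a sequence spectrum s whose entries are indexed to match a Sylvester-ordered Hadamard matrix H. The function names and the toy spectrum are hypothetical and are not taken from the paper.

```python
import math


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester ordering).
    The input length must be a power of two; cost is O(N log N)."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y  # butterfly step
        h *= 2
    return a


def inverse_fwht(values):
    """Inverse transform: H^-1 = H / N for a Hadamard matrix of order N."""
    n = len(values)
    return [v / n for v in fwht(values)]


def hadamard_conjugation(spectrum):
    """Assumed form gamma = H^-1 ln(H s); requires every entry of H s > 0."""
    return inverse_fwht([math.log(v) for v in fwht(spectrum)])


def inverse_conjugation(gamma):
    """Assumed inverse s = H^-1 exp(H gamma)."""
    return inverse_fwht([math.exp(v) for v in fwht(gamma)])


# Hypothetical 4-entry spectrum: entry 0 is the share of constant sites,
# the other entries are the shares of sites supporting each bipartition.
toy_spectrum = [0.7, 0.1, 0.1, 0.1]
conjugate = hadamard_conjugation(toy_spectrum)
recovered = inverse_conjugation(conjugate)
print(conjugate)
print(recovered)
```

The round trip through `inverse_conjugation` should return the original spectrum up to floating-point error, which mirrors the abstract's comparison between an inferred sequence spectrum and the original one.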

4.
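Russian is one of the working languages of the United Nations and an official language of Russia and several other countries. As the Belt and Road Initiative advances and globalization accelerates, Russian text data has become an important source of information for the management decisions of the organizations concerned, and Russian text mining has accordingly become an important method of management decision support. However, research on Russian text mining methods is still far from mature. In particular, its key foundation, term extraction from Russian text, performs poorly, which limits the accuracy of Russian text modeling. This paper therefore proposes a multi-strategy method for extracting terms from Russian text that combines part-of-speech analysis, grammatical rules, and string-frequency statistics to automatically extract Russian terms, including both single words and phrases. Experiments on the United Nations Parallel Corpus and the Taiga Corpus show that the proposed method achieves precision above 85% while maintaining high recall, significantly outperforming the commonly used n-gram method, and that it can provide an effective lexicon for text mining applications such as Russian topic discovery and text classification and clustering.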
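
For orientation, the n-gram baseline that the abstract compares against can be sketched as plain frequency counting over word n-grams. This is a minimal sketch under stated assumptions: the abstract does not describe the baseline's exact configuration, and the function name `frequent_ngrams`, the threshold, and the tiny token list are hypothetical.

```python
from collections import Counter


def frequent_ngrams(tokens, n, min_count):
    """Count all word n-grams and keep those seen at least min_count times."""
    grams = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Hypothetical, already tokenized and lowercased Russian sample.
sample_tokens = [
    "организация", "объединенных", "наций", "приняла", "резолюцию",
    "организация", "объединенных", "наций",
]

print(frequent_ngrams(sample_tokens, 3, 2))
```

A purely frequency-based filter of this kind has no notion of part of speech or grammar, which is the gap the multi-strategy method in the abstract is reported to address.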

5.
This paper develops a new procedure for simultaneously performing multidimensional scaling and cluster analysis on two-way compositional data of proportions. The objective of the proposed procedure is to delineate patterns of variability in compositions across subjects by simultaneously clustering subjects into latent classes or groups and estimating a joint space of stimulus coordinates and class-specific vectors in a multidimensional space. We use a conditional mixture, maximum likelihood framework with an E-M algorithm for parameter estimation. The proposed procedure is illustrated using a compositional data set reflecting proportions of viewing time across television networks for an area sample of households.

6.
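In linguistics, the term “文字学” (wenzixue) commonly has two senses: one is “the science that takes the writing systems of different peoples as its object and reveals the general laws governing the formation and use of human writing”; the other is “the study of Chinese characters”. At present, whether judged from the standpoint of terminological standardization or from actual current usage, the latter sense is no longer appropriate and should be discarded. In addition, “the study of Chinese characters” currently goes by several names, including “文字学” (wenzixue), “汉字学” (hanzixue), and “中国文字学” (Zhongguo wenzixue). To avoid terminological confusion, “汉字学” (hanzixue), which is widely accepted in the academic community, should be established as the standard name.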

7.
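Lakatos's "Research Programmes" and the Methodology of Economics   Total citations: 6, self-citations: 0, citations by others: 6
Lakatos's “effective” synthesis of the ideas of Popper and Kuhn has made him an attractive figure for economists. The two methodological features of his “methodology of scientific research programmes” that have received the most attention in economics are the structure and the appraisal of particular research programmes. Using Lakatos's methodology as a tool for appraising the various research programmes in economics runs into many disagreements and difficulties, but it does help in understanding the structure of economics or of particular economic theories.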

8.
The Academic Journal Ranking Problem consists in formulating a formal assessment of scientific journals. An outcome variable must be constructed that allows valid journal comparison, either as a set of tiers (ordered classes) or as a numerical index. But part of the problem is also to devise a procedure to obtain this outcome, that is, how to obtain and use relevant data coming from expert opinions or from citation databases. We propose a novel approach to the problem that applies fuzzy cluster analysis to peer reviews and opinion surveys. The procedure is composed of two steps: the first is to collect the most relevant qualitative assessments from international organizations (for example, the ones available in the Harzing database) and, as inductive analysis, to apply fuzzy clustering to determine homogeneous journal classes; the second, deductive step is to determine the hidden logical rules that underlie the classification, using a classification tree to reproduce the same patterns as the first step.

9.
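Building on a study of the linguistic features of Turkish military terminology, this paper proposes a term extraction method that combines rules with statistics. Candidates in monolingual text are filtered successively by keywords, stop words, morphological-analysis sequence patterns, pointwise mutual information, left and right information entropy, and adjacent affixes. Experiments on two monolingual text collections of different sizes, W-data and N-data, show that the method can effectively extract Turkish military terms from the experimental data.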
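
Two of the statistical filters named above, pointwise mutual information and left/right information entropy, can be sketched for a two-word candidate as follows. This is a minimal sketch using the textbook definitions; the paper's exact formulas, thresholds, and data are not given in the abstract, and the function names and the English toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi(tokens, w1, w2):
    """Pointwise mutual information log2(p(w1, w2) / (p(w1) * p(w2)))
    of the adjacent pair (w1, w2); -inf if the pair never occurs."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    p_xy = bigrams[(w1, w2)] / (len(tokens) - 1)
    p_x = unigrams[w1] / len(tokens)
    p_y = unigrams[w2] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


def neighbour_entropy(tokens, w1, w2, side):
    """Shannon entropy in bits of the words directly to the left
    (side="left") or right (side="right") of the pair (w1, w2)."""
    neighbours = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == w1 and tokens[i + 1] == w2:
            j = i - 1 if side == "left" else i + 2
            if 0 <= j < len(tokens):
                neighbours[tokens[j]] += 1
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


# Hypothetical toy corpus; a real run would use tokenized Turkish text.
corpus = (
    "the air force base and the air force academy "
    "train air force pilots"
).split()

print(pmi(corpus, "air", "force"))
print(neighbour_entropy(corpus, "air", "force", "left"))
print(neighbour_entropy(corpus, "air", "force", "right"))
```

A candidate whose parts co-occur far more often than chance (high PMI) and whose left and right contexts vary freely (high entropy on both sides) behaves like a self-contained unit, which is why such scores are used to filter term candidates.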

10.
In this article we seek to lay bare a couple of potential conceptual and methodological issues that, we believe, are implicitly present in contemporary philosophy of technology (PhilTech). At stake are (1) the sustained pertinence of and need for coping strategies as to ‘how to live with technology (in everyday life)’ notwithstanding PhilTech’s advancement in its non-essentialist analysis of ‘technology’ as such; (2) the issue of whether ‘living with technology’ is a technological affair or not (or both); and (3) the tightly related question concerning the status of the methodological bedrock of contemporary PhilTech, the ‘empirical turn.’ These matters are approached from the perspective of the philosophical notion of the ‘art of living,’ and our argumentation is developed both as a context for and on the basis of the contributions to the special issue ‘The Art of Living with Technology.’

11.
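On the basis of the basic definitions and the specific nomenclature rules, it is proposed that the Chinese name for carborane be standardized as “碳硼烷”, and that C2B10H12 and C2B5H7 be named “二碳代-闭式-十二硼烷” (dicarba-closo-dodecaborane, whose isomers are “邻-、间-、对碳硼烷”, that is, ortho-, meta-, and para-carborane) and “二碳代-闭式-庚硼烷” (dicarba-closo-heptaborane), respectively.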

12.
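In recent years, advances in in-situ leaching (ISL) uranium mining technology have greatly stimulated the construction of ISL uranium mines in China, and new technologies and processes continue to emerge. However, various problems exist in how terms related to ISL uranium mining are understood and used. Based on an analysis of the scientific and technical literature and of ISL uranium mining practice, and after years of deliberation, the authors conclude that terms such as resource utilization rate, metal leaching rate, liquid-to-solid ratio, and uranium content per square meter, as used in ISL uranium mining, suffer from unscientific usage such as imprecise definitions and incorrect application.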

13.
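In translating and introducing the term ethnopragmatics, linguists have differed in how they understand and render it. On the basis of its theoretical foundations and methodology, this paper argues that it should be translated as “民族志语用学” (literally, ethnographic pragmatics).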

14.
We propose and discuss improved Bayes rules to discriminate between two populations using ordered predictors. To address the problem we propose an alternative formulation using a latent space that allows the information about the order to be introduced into the theoretical rules. The rules are first defined when the marginal densities are fully known and then under normality when the parameters are unknown and training samples are available. Several numerical examples and simulations in the paper illustrate the methodology and show that the new rules handle the information appropriately. We compare the new rules with the classical Bayes and Fisher rules in these examples and we show that the misclassification probability is smaller for the new rules. The method is also applied to data from a diabetes study where we again show that the new rules improve over the usual Fisher rule. Research partially supported by Spanish DGES and by PAPIJCL. The authors thank the editor and an anonymous reviewer for their detailed reading that resulted in this much improved version of the paper.

15.
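The Problem of Reducibility in Quantum Field Theory   Total citations: 1, self-citations: 0, citations by others: 1
The core language for describing particle physics is renormalizable quantum field theory. Renormalization was originally an effective strategy for dealing with the infinities that arise in quantum field theory calculations, and it is a powerful means of addressing the problem of “emergence” in theoretical physics. Starting from the impact of the renormalization procedure on quantum field theory, this paper reviews the history of the idea of effective field theory and recounts the philosophical debates provoked by the renormalization method, with the aim of examining methodologically how reductionism and anti-reductionism handle issues such as “reduction”, “emergence”, and “hierarchy” in interpreting quantum field theory. Finally, it interprets the renormalization method from a contextual perspective and points out that discussions of fundamental theories and of the relations between theories are situational.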

16.
Traditionally, latent class (LC) analysis is used by applied researchers as a tool for identifying substantively meaningful clusters. More recently, LC models have also been used as a density estimation tool for categorical variables. We introduce a divisive LC (DLC) model as a density estimation tool that may offer several advantages in comparison to a standard LC model. When using an LC model for density estimation, a considerable number of increasingly large LC models may have to be estimated before sufficient model fit is achieved. A DLC model consists of a sequence of small LC models. Therefore, a DLC model can be estimated much faster and can easily utilize multiple processor cores, meaning that this model is more widely applicable and practical. In this study we describe the algorithm for fitting a DLC model, and discuss the various settings that indirectly influence the precision of a DLC model as a density estimation tool. These settings are illustrated using a synthetic data example, and the best performing algorithm is applied to a real-data example. The synthetic data example showed that, using specific decision rules, a DLC model is able to correctly model complex associations amongst categorical variables.

17.
Probabilistic feature models (PFMs) can be used to explain binary rater judgements about the associations between two types of elements (e.g., objects and attributes) on the basis of binary latent features. In particular, to explain observed object-attribute associations PFMs assume that respondents classify both objects and attributes with respect to a, usually small, number of binary latent features, and that the observed object-attribute association is derived as a specific mapping of these classifications. Standard PFMs assume that the object-attribute association probability is the same according to all respondents, and that all observations are statistically independent. As both assumptions may be unrealistic, a multilevel latent class extension of PFMs is proposed which allows object and/or attribute parameters to differ across latent rater classes, and which allows dependencies between associations with a common object (attribute) to be modeled by assuming that the link between features and objects (attributes) is fixed across judgements. Formal relationships with existing multilevel latent class models for binary three-way data are described. As an illustration, the models are used to study rater differences in product perception and to investigate individual differences in the situational determinants of anger-related behavior.

18.
In this paper, we present empirical and theoretical results on classification trees for randomized response data. We considered a dichotomous sensitive response variable with the true status intentionally misclassified by the respondents using rules prescribed by a randomized response method. We assumed that classification trees are grown using the Pearson chi-square test as a splitting criterion, and that the randomized response data are analyzed using classification trees as if they were not perturbed. We proved that classification trees analyzing observed randomized response data and estimated true data have a one-to-one correspondence in terms of ranking the splitting variables. This is illustrated using two real data sets.
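
To make the idea of intentionally misclassified responses concrete, here is a minimal sketch that assumes Warner's classic randomized response design; the abstract does not say which design the paper uses. Under this assumption, each respondent answers the sensitive question with probability p and its negation otherwise, so the observed "yes" rate is lambda = p * pi + (1 - p) * (1 - pi), where pi is the true prevalence. The function names and numbers are hypothetical.

```python
def expected_yes_rate(true_prevalence, p):
    """Observed 'yes' rate under the assumed Warner design."""
    return p * true_prevalence + (1 - p) * (1 - true_prevalence)


def estimate_prevalence(observed_yes_rate, p):
    """Invert the relation above: pi = (lambda - (1 - p)) / (2p - 1).
    The design is not identifiable when p == 0.5."""
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    return (observed_yes_rate - (1 - p)) / (2 * p - 1)


# Hypothetical numbers: true prevalence 0.2, design probability 0.7.
observed = expected_yes_rate(0.2, 0.7)
print(observed, estimate_prevalence(observed, 0.7))
```

Because the estimate is an increasing linear function of the observed rate whenever p > 0.5, orderings based on observed rates carry over to estimated true rates in this simple setting, which is in the spirit of the ranking correspondence the abstract reports for splitting variables.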

19.
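A Study of Symbols and Institutionalization in Models of Technological Development   Total citations: 1, self-citations: 0, citations by others: 1
Structure is a feature of existence, symbols are a basic element in the formation of human capacities for survival, and the science, technology, and industry complex is a system of productive capacity that humanity has developed over a long period. This complex is in essence a structural and symbolic undertaking. Realizing its productive function requires constructing, through the understanding and use of structures and symbols, an ordered, organized, and institutionalized dumbbell model linked by symbols, in which technology is the key element and itself requires continual institutionalization.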

20.
We consider processes of emergence within the conceptual framework of the Information Loss principle and the concepts of (1) systems conserving information; (2) systems compressing information; and (3) systems amplifying information. We deal with the supposed incompatibility between emergence and computability tout-court. We distinguish between computational emergence, when computation acquires properties, and emergent computation, when computation emerges as a property. The focus is on emergence processes occurring within computational processes. Violations of Turing-computability such as non-explicitness and incompleteness are intended to represent partially the properties of phenomenological emergence, such as logical openness, given by the observer’s cognitive role; structural dynamics where change regards rules rather than only values; and multi-modelling where multiple non-equivalent models are required to model such structural dynamics. In this way, we validate, from an epistemological viewpoint, models and simulations of phenomenological emergence where the sequence of events constitutes the natural, analogical non-Turing computation which a cognitive complex system can reproduce through learning. Reproducibility through learning is different from Turing-like computational iteration. This paper aims to open a new, non-reductionist understanding of the conceptual relationship between emergence and computability.

