This paper studies the problem of estimating the number of clusters in the context of logistic regression clustering. The classification likelihood approach is employed to tackle this problem. A model-selection-based criterion for selecting the number of logistic curves is proposed, and its asymptotic property is also considered. The small-sample performance of the proposed criterion is studied by Monte Carlo simulation. In addition, a real data example is presented. The authors would like to thank the editor, Prof. Willem J. Heiser, and the anonymous referees for their valuable comments and suggestions, which have led to the improvement of this paper.

The primary method for validating cluster analysis techniques is through Monte Carlo simulations that rely on generating data with known cluster structure (e.g., Milligan 1996). This paper defines two kinds of data generation mechanisms with cluster overlap, marginal and joint; current cluster generation methods are framed within these definitions. An algorithm generating overlapping clusters based on shared densities from several different multivariate distributions is proposed and shown to lead to an easily understandable notion of cluster overlap. Besides outlining the advantages of generating clusters within this framework, a discussion is given of how the proposed data generation technique can be used to augment research into current classification techniques such as finite mixture modeling, classification algorithm robustness, and latent profile analysis.

The issue of determining “the right number of clusters” in K-Means has attracted considerable interest, especially in recent years. Cluster intermix appears to be the factor that most affects the clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters with controlled parameters of between- and within-cluster spread to model cluster intermix. The setting allows for evaluating centroid recovery on a par with the conventional evaluation of cluster recovery. The subjects of our interest are two versions of the “intelligent” K-Means method, iK-Means, that find the “right” number of clusters by extracting “anomalous patterns” from the data one by one. We compare them with seven other methods, including Hartigan’s rule, averaged Silhouette width and Gap statistic, under different between- and within-cluster spread-shape conditions. There are several consistent patterns in the results of our experiments, such as that the right K is reproduced best by Hartigan’s rule – but not clusters or their centroids. This leads us to propose an adjusted version of iK-Means, which performs well in the current experimental setting.
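As a rough illustration of one of the compared stopping rules, the sketch below applies Hartigan's rule in its commonly cited form: compute H(K) = (W_K / W_{K+1} - 1)(N - K - 1) for increasing K and take the first K at which H(K) drops to 10 or less. The function names, the input layout (a list of within-cluster sums of squares obtained elsewhere, for example from K-Means runs) and the numbers are assumptions made for illustration; they are not taken from the paper.

```python
def hartigan_statistic(w_k, w_k_plus_1, n, k):
    """H(K) = (W_K / W_{K+1} - 1) * (N - K - 1), with W_K the within-cluster
    sum of squares of the K-cluster solution and N the number of entities."""
    return (w_k / w_k_plus_1 - 1.0) * (n - k - 1)


def choose_k_hartigan(within_ss, n, threshold=10.0):
    """Return the first K with H(K) <= threshold, or None if no K qualifies.

    within_ss[i] is assumed to hold W_K for K = i + 1 (hypothetical layout).
    """
    for i in range(len(within_ss) - 1):
        k = i + 1
        if hartigan_statistic(within_ss[i], within_ss[i + 1], n, k) <= threshold:
            return k
    return None


# Illustrative (made-up) within-cluster sums of squares for K = 1..5, N = 100.
example_w = [1000.0, 400.0, 100.0, 95.0, 92.0]
print(choose_k_hartigan(example_w, n=100))
```

The rule only needs the sequence of W_K values, so it can be layered on top of any K-Means implementation; as the abstract notes, recovering K well does not by itself guarantee good recovery of the clusters or their centroids.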

Separability of clusters is an issue that arises in many different areas, and is often used in a rather vague and subjective manner. We introduce a combinatorial notion of interiority to derive a global view on separability of a set of entities. We develop this approach further to evaluate the overall separability of a partition in the context of cluster analysis. Our approach captures combinatorial and geometrical aspects of data and provides, in addition to numerical evaluations, graphical representations particularly useful when data are not easily visualized. We illustrate the methodology on some real and simulated datasets.

In this paper an algorithm is developed that aims to find all FPCs of a dataset corresponding to well-separated linear regression subpopulations. Its ability to find such subpopulations in the presence of outliers is compared, by means of a simulation study, to that of methods based on ML estimation of mixture models. Furthermore, FPC analysis is applied to a real dataset.

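Research on Large-Scale Mineralization and the Prediction of Large Ore-Concentration Areas
"Large-Scale Mineralization and the Prediction of Large Ore-Concentration Areas" is the first research project targeting solid mineral resources since the launch of the National Key Basic Research Development Program (973 Program). Over five years of research (October 1999 to September 2004), important progress was made in several areas of basic geology and metallogenic theory. A theoretical framework for Meso-Cenozoic continental mineralization in China was preliminarily proposed, laying a theoretical foundation for predicting large deposits and large ore-concentration areas; four new prospecting techniques were developed and two new prospecting approaches were proposed; and, in the trial stage, five prospecting targets at the scale of ore-concentration areas were delineated and a number of mineralization anomaly zones were discovered. In addition, the project gave rise to three national-level and three ministry-level outstanding research groups, and trained nine outstanding young and middle-aged researchers as well as many postdoctoral fellows, doctoral students and master's students; quite a few of these young and middle-aged scientists now hold posts in international academic organizations. During the project 772 scientific papers were published, of which 227 are SCI-indexed (125 of them published abroad).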

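A reliable method, the word-segmentation method (分词法), is proposed for checking the "systematicness" of naming in terminology review work. With this method, all Chinese or English terms that share identical or similar character strings can be found accurately. The method is also of great help in compiling specialized bilingual dictionaries.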

Ever wonder if it is possible to construct a numeric scale for environmental variables, as one does for temperature? This paper is an attempt to construct one. There are two main parts: section “Statistical Analysis of Variations” presents a general statistical strategy for environmental factor selection. Section “Nonlinear Analytical Geometric Model of Variations” develops an analytical geometric representation of system variations in response to environmental changes. The model is used to quantify the effects of environmental interactions. The paper treats only the one-dimensional case; however, the derivation for the case of multiple independent factors follows immediately. The general method developed in this paper may prove applicable to many different fields, such as extensions beyond classical physics, economics, and other sciences. Section “Conclusion” provides an illustration of applications, examples and implications of the results.

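Exploring the Essence and Iterative Logic of Public Perception
Hans Reichenbach, the renowned German pioneer of the philosophy of science and technology, held that scientific philosophy is the logical analysis of all forms of human thought, and proposed that analytic philosophy, including emotivist theory, helps to advance scientific research, decision-making and the like. Philosophy itself consists of conceptual inquiry; progress in science and technology can improve people's cognitive abilities, while the development of public opinion in society makes people eager to explore the essence of public perception. On the basis of emotion, behavior and cognition, this paper analyzes the sources, functions and hierarchical effects of the essence of public perception, and uses the progressive relationship among emotions to analyze the relational structure of the components of public perception and their multi-element iterative logic. The study of the essence and iterative logic of public perception thereby opens up a new path.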

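The metaphysical ideas behind Newton's scientific work, together with their methodological framework, constitute one of the meta-questions in the history of science and a blind spot that earlier scholarship has seldom addressed. Following an approach of returning to the things themselves, this paper starts from Newton's own texts and extracts three theoretical focal points: the instrumental rationality of the differential law, the distinctive meaning of the concept of force, and the natural character of the mechanism of motion. The aim is to bring out the distinctive meaning of Newton's scientific programme and to clarify certain misunderstandings and misreadings of Newton in earlier scholarship.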

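The Ethical Matrix: A Tool for Technology Assessment
The application of modern technology has given rise to many ethical controversies, and how to resolve them has become an issue of growing scholarly concern. The ethical matrix, a pluralistic and comprehensive assessment method that takes both individuals and groups into account, has already shown itself to be more reasonable than traditional technology assessment methods in the ethical assessment of genetic modification technology. A full understanding of the ideas behind this method and a command of its systematic operating procedure help us to assess technology reasonably and contribute positively to scientific decision-making.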

The importance of mathematics in the context of the scientific and technological development of humanity is determined by the possibility of creating mathematical models of the objects studied under the different branches of Science and Technology. The arithmetisation process that took place during the nineteenth century consisted of the quest to discover a new mathematical reality in which the validity of logic would stand as something essential and central. Nevertheless, in contrast to this process, the development of mathematical analysis within a framework that largely involves intuition and geometry is a fact that cannot go unnoticed amongst the mathematics community, as we shall show in this paper through the research made by Bernhard Riemann on complex variables.

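On the basis of the basic definitions and the specific nomenclature rules, it is proposed that the Chinese name of carborane be unified as “碳硼烷”, and that C2B10H12 and C2B5H7 be named “二碳代-闭式-十二硼烷” (dicarba-closo-dodecaborane, whose isomers are “邻-、间-、对碳硼烷”, that is, ortho-, meta- and para-carborane) and “二碳代-闭式-庚硼烷” (dicarba-closo-heptaborane), respectively.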

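Dai Shu (戴黍). Studies in Dialectics of Nature (《自然辩证法研究》), 2005, 21(10): 108-112
In ancient China the concept of number was very closely bound up with the tradition of statecraft (治道). Under the influence of traditional thinking on statecraft, "rationality" was suppressed by "divinity" during the origin and development of the concept of number. In the stage of theory formation, people stressed adherence to "convention" and did not take the road of "axiomatization". In applying and studying number, they overemphasized practicality and utility, never managed to transcend the feudal political and cultural system, and were content with obtaining particular methods and techniques. They thus became trapped in the framework of "shushu" (数术, the numerological arts) and failed to create an independent theoretical system of "mathematics" as the modern West did.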

A new projection-pursuit index is used to identify clusters and other structures in multivariate data. It is obtained from the variance decompositions of the data’s one-dimensional projections, without assuming a model for the data or that the number of clusters is known. The index is affine invariant and successful with real and simulated data. A general result is obtained indicating that clusters’ separation increases with the data’s dimension. In simulations it is thus confirmed, as expected, that the performance of the index either improves or does not deteriorate when the data’s dimension increases, making it especially useful for “large dimension-small sample size” data. The efficiency of this index will increase with continuing improvements in computer technology. Several applications are presented.

Certain enterprises at the fringes of science, such as intelligent design creationism, claim to identify phenomena that go beyond not just our present physics but any possible physical explanation. Asking what it would take for such a claim to succeed, we introduce a version of physicalism that formulates the proposition that all available data sets are best explained by combinations of “chance and necessity”—algorithmic rules and randomness. Physicalism would then be violated by the existence of oracles that produce certain kinds of noncomputable functions. Examining how a candidate for such an oracle would be evaluated leads to questions that do not admit an easy resolution. Since we lack any plausible candidate for any such oracle, however, chance-and-necessity physicalism appears very likely to be correct.

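I. Introduction

For a long time, and especially over the past hundred years, many Chinese scholars have created and translated a large number of scientific and technical terms. In particular, since the founding of New China, and with the care of the Party and the government, many scientists, translators, educators, linguists, editors and terminologists devoted to China's scientific and technical terminology have carried forward their predecessors' spirit of "pondering for weeks and months over the establishment of a single term" (“一名之立,旬月踟蹰”). They have painstakingly continued to create, translate and revise a large number of terms in order to establish and improve China's scientific and technical terminology, and have explored Chinese scientific and technical terminology from many angles, laying the foundation for a terminology science based on the Chinese character system [1-46].

The total number of Chinese scientific and technical terms is hard to determine exactly. It depends on the scope and depth of the count; moreover, as science and technology keep developing, new disciplines and new terms keep emerging while some old terms need to be retired or updated. According to the extensive collection and careful study carried out by some scholars, there are now at least 1.4 million Chinese scientific and technical terms [47]. How such a huge body of terms has been named, and how far it has been unified and standardized, is a matter of widespread concern, because terminology is one of the basic conditions a country needs in order to develop science and technology, and it bears on domestic and international academic exchange, education, research, trade, production and national defense. The state of a country's terminology is also an important indicator of its scientific and technological development.

On the whole, although China's scientific and technical terminology has its own independent system and plays a great role, it is keenly felt that some terms remain non-uniform or even chaotic [7,12,13,15-18], and the naming of quite a few terms needs further improvement. Many scholars have studied Chinese scientific and technical terms from various angles, such as the content of scientific concepts, terminology science, linguistics, and etymology and history. No systematic study, however, has examined and sought to improve Chinese scientific and technical terms from the angle of the number of characters per term, even though this aspect is closely related to reviewing terms so as to standardize and improve them. On the basis of modern terminology science, the author therefore explores the number of characters in Chinese scientific and technical terms and related questions. As a first step, this paper takes 《物理学名词》 (Physics Terms, basic physics part) [60] and 《电子学名词》 (Electronics Terms, revised version of the draft for comment) [61] as examples for a preliminary study.

II. Statistics and analysis results

1. Tables 1 and 2 list the number of characters per term, and the corresponding percentages, for 《物理学名词》 (basic physics part) and 《电子学名词》 (revised version of the draft for comment), respectively. Figure 1 shows their distribution curves.

2. Figure 1 shows that the distribution of term length in both disciplines has a maximum, and in both cases it lies at four characters; that is, four-character (four-syllable) terms are the most numerous. Their share of all entries is also very similar in the two disciplines, about 30% (close to 31%).

3. The distribution curve for basic physics terms shows that the percentages on the short side are clearly higher than those on the long side: terms made up of 3, 2 or 1 characters clearly outnumber terms made up of 5, 6, 7 or more characters. In other words, the terms of the basic part of physics, a basic discipline, are mostly short. There are many 3-character and 2-character terms, and 1-character terms are not especially rare. The percentage of terms with 7, 8 or more characters falls off quickly, to the point where such terms are only isolated cases. Terms of 1 to 7 characters together account for 97.72% of all terms.

Figure 1. Distribution curves of the number of characters per term.

4. The distribution curve for electronics terms is almost symmetric on either side of the 4-character terms: the number of terms with 3 or 2 characters is about equal to the number of terms with 5 or 6 characters. There are very few 1-character terms, quite a few terms with 7, 8 or 9 characters, and a certain number of terms with 10 or more characters. Terms of 1 to 7 characters together account for 95.09% of all terms.

5. Comparing the "physics" and "electronics" curves shows that on the short side (fewer than 4 characters) the percentages for physics terms are higher than those for electronics terms, while on the long side (more than 4 characters) the percentages for electronics terms are higher than those for physics terms. That is, the terms of physics (basic physics part), a basic discipline, are shorter than the terms of electronics, a technical discipline, and are notable for their conciseness. As new technical terms are added, the number of characters per term increases markedly; this may be a characteristic of current new-technology terms (the terms are longer).

III. Discussion

1. A term is the name of a concept, its linguistic symbol. People have long exchanged scientific and technical information by means of terms. Although many scientists, translators and other scholars have made great contributions to the naming of scientific and technical concepts, among the large number of terms now in use in China's scientific community some are still not entirely satisfactory. In some cases one concept has several terms; in others one term corresponds to several concepts; and some terms do not match their meaning. Many publications have discussed this [20, 21, 48-55, 59]. The National Committee for the Examination and Approval of Natural Science Terms (全国自然科学名词审定委员会) is organizing experts from all scientific disciplines, together with terminologists, to review scientific and technical terms comprehensively so that they become unified and standardized [2-4, 8, 30]. A good Chinese scientific or technical term should be scientific, systematic, monosemous and concise, should suit the character of the Chinese language, and should be international and in keeping with the scientific register; when existing terms are reviewed, their established usage (convention) and their overall consistency should also be considered [25, 29]. In surveying 《物理学名词》 (basic physics part) and 《电子学名词》, I found that many terms carefully worked out by earlier and contemporary scholars fit these properties very well, for example 衍射 (diffraction), 阻尼 (damping), 激光 (laser) and 电视 (television). I believe, however, that some terms should still be improved.

2. Tables 1 and 2 and Figure 1 show that quite a few of the basic terms in the two disciplines still have more than 7 characters: about 2% of the physics terms (basic physics part) and about 5% of the electronics terms. Some terms run to 14, 15, 16 or even more characters (some electronics terms have as many as 20). Terms of this length run counter to conciseness, are inconvenient in speech and writing, and have an adverse effect on China's terminology system. It is desirable to simplify them appropriately, provided this does not compromise their scientific accuracy, systematicness, monosemy, Chinese character or established usage. The reasons why terms in the two disciplines have so many characters include the following:

(1) The wording is not concise enough. For example, 自适应雷达 (adaptive radar) can be simplified to “自适雷达”, and 直接检波式接收机 (direct-detection receiver) can be simplified to “直检式接收机”.

(2) The term contains several personal names, all of which are transliterated in full. Examples are 亥姆霍兹—拉格朗日定理 (Helmholtz-Lagrange theorem) and 麦克斯韦—玻耳兹曼分布 (Maxwell-Boltzmann distribution). If they were simplified to “亥—拉定理” (or “HL定理”) and “麦—玻分布” (or “MB分布”) respectively, then once people had become used to them they would express the concept equally well, would be more concise, and would be more convenient in both spoken and written use.

(3) The expression is overly detailed. Examples are 场效应晶体管 (field effect transistor), 金属—氧化物—半导体场效应晶体管 (metal-oxide-semiconductor transistor) and 低压化学汽相淀积 (low pressure chemical vapor deposition). If they were named “场效晶体管”, “MOS场效晶体管” and “低压CVD” respectively, they would become concise. Once people had become used to them, they would serve the same purpose as the more complicated expressions above. Using these simplified expressions in academic exchange and in publications would be very beneficial (appropriate use of some internationally accepted abbreviations can also improve the international character of Chinese scientific and technical terms).

3. The simplification method proposed in points (2) and (3) above is, in my view, not merely a way of achieving verbal conciseness; it is a step up to a new level in the naming of terms. This naming method may be called the synthesis method (合成法). It forms a term from a small number of key Chinese characters that represent the content of the concept, or from an abbreviation plus key Chinese characters. The well-known term “激光” (laser) was of course coined on the basis of its conceptual content, but in a sense it can also be seen as a typical example of a term formed by the synthesis method. The full English name of “激光” is light amplification by the stimulated emission of radiation, which was translated into Chinese as “光受激辐射放大” or “激射光辐射放大”. Qian Xuesen (钱学森) later named it “激光”. This name is both a brilliant indication of the conceptual content and a clever synthesis of the full name. A similar case is “崩越二极管”. Its English name is impact avalanche transit time diode, which was once translated as “碰撞雪崩渡越时间二极管”. It was later named “崩越二极管”. This is not just a simple shortening of the full name, because it omits many characters that also carry some meaning and keeps only a few key characters to express the concept, forming a monosemous and distinctive term (different from ordinary vocabulary). The formation of this term in fact also used the synthesis method.

IV. Conclusion

Because a term is the linguistic symbol of a concept, its nature as a symbol dictates that it should be concise rather than cumbersome. Reducing the number of characters in Chinese scientific and technical terms to a minimum, after scientific accuracy, systematicness, monosemy, Chinese character and established usage have all been taken into account, is therefore an important issue in improving these terms, making them conform better to terminology science, making them easier to use, and making them standardized and unified. In naming a term, the term must not be conflated with its definition; for some terms, a definition can be used alongside a concise name to give people a clear understanding. Studying the number of characters in Chinese scientific and technical terms helps us to understand China's Chinese-language terminology as a whole and helps to improve and perfect the terminology system. Naming Chinese scientific and technical terms by the synthesis method can be applied, as appropriate, in every discipline and can help to improve the conciseness of these terms.

Comrade Wu Fengming (吴凤鸣) offered valuable comments on the manuscript, for which I express my deep gratitude. Because my ability is limited, the paper may contain many errors and inadequacies; criticism and correction are most welcome.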
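The kind of term-length statistic discussed in this article can be sketched in a few lines. The following is a minimal illustration, not the author's procedure: the function name is hypothetical, and the sample list contains only a handful of terms quoted in the article, so its percentages say nothing about the real glossaries. It assumes that each Chinese character counts as one unit of length.

```python
from collections import Counter


def length_distribution(terms):
    """Map each term length (in characters) to (count, percentage of all terms)."""
    counts = Counter(len(term) for term in terms)
    total = len(terms)
    return {n: (counts[n], 100.0 * counts[n] / total) for n in sorted(counts)}


# A few terms quoted in the article; an illustrative sample only.
sample_terms = [
    "衍射", "阻尼", "激光", "电视",
    "自适应雷达", "崩越二极管",
    "场效应晶体管",
    "直接检波式接收机",
]

for length, (count, percent) in length_distribution(sample_terms).items():
    print(f"{length} characters: {count} terms ({percent:.1f}%)")
```

Applied to a full glossary, the same tally would give the per-length counts and percentages of the kind reported in the tables and distribution curves.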

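Skopos theory examines translation within the framework of action theory and intercultural communication, stressing that the purpose of the target text determines the methods, standards and strategies of translation.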

Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques incorporating principal component analysis and K-means into a unified methodology to obtain a reduced set of components for variables and an optimal partition for objects. RKM finds clusters in a reduced space by maximizing the between-clusters deviance without imposing any condition on the within-clusters deviance, so that clusters are isolated but they might be heterogeneous. On the other hand, FKM identifies clusters in a reduced space by minimizing the within-clusters deviance without imposing any condition on the between-clusters deviance. Thus, clusters are homogeneous, but they might not be isolated. The two techniques give different results because the total deviance in the reduced space for the two methodologies is not constant; hence the minimization of the within-clusters deviance is not equivalent to the maximization of the between-clusters deviance. In this paper a modification of the two techniques is introduced to avoid the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology. It is called Factor Discriminant K-means (FDKM), because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and its performance in a simulation study. An application on real-world data is presented to show the features of FDKM.
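The argument about non-constant total deviance can be made concrete with a small numerical check. The sketch below is an illustration under simple assumptions, not the FDKM algorithm: it uses a made-up four-point data set, a fixed partition, and projections onto single coordinate axes as stand-ins for a reduced space. All function names are hypothetical. It verifies that total deviance equals within-clusters plus between-clusters deviance, and that the total changes with the chosen projection.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deviances(points, labels):
    """Return (total, within, between) deviance for a hard partition."""
    grand = mean(points)
    total = sum(sq_dist(p, grand) for p in points)
    clusters = {}
    for p, g in zip(points, labels):
        clusters.setdefault(g, []).append(p)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = mean(members)
        within += sum(sq_dist(p, centroid) for p in members)
        between += len(members) * sq_dist(centroid, grand)
    return total, within, between


def project(points, axis):
    """Keep a single coordinate: a crude one-dimensional 'reduced space'."""
    return [[p[axis]] for p in points]


# Made-up data: two clusters separated along the second coordinate only.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 4.0], [1.0, 4.0]]
labels = [0, 0, 1, 1]

print(deviances(points, labels))               # full space
print(deviances(project(points, 0), labels))   # reduced space, axis 0
print(deviances(project(points, 1), labels))   # reduced space, axis 1
```

For a fixed space the total is constant, so minimizing the within-clusters deviance and maximizing the between-clusters deviance select the same partition. Once the projection itself is being optimized, the total varies with it, which is why the two criteria can lead to different solutions.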

T clusters, based on J distinct, contributory partitions (or, equivalently, J polytomous attributes). We describe a new model/algorithm for implementing this objective. The method's objective function incorporates a modified Rand measure, both in initial cluster selection and in subsequent refinement of the starting partition. The method is applied to both synthetic and real data. The performance of the proposed model is compared to latent class analysis of the same data set.
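For orientation, the plain (unmodified) Rand index on which such measures build can be sketched as follows. This is the classical pair-counting definition, not the modified measure used in the paper, whose details are not given in the abstract; the function name and the example labelings are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree, i.e. pairs that
    are placed together in both partitions or apart in both partitions."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Identical partitions up to relabeling agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Crossed partitions agree only on some of the pairs.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

A consensus partition could, for instance, be scored by its agreement with each of the J contributory partitions; how the paper combines and modifies these quantities is not specified here.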

