Similar Documents
1.
NP-hard Approximation Problems in Overlapping Clustering   Total citations: 1 | Self-citations: 1 | Citations by others: 0
… Lp-norm (p < ∞). These problems also correspond to approximation by a strongly Robinson dissimilarity or by a dissimilarity satisfying the four-point inequality (Bandelt 1992; Diatta and Fichet 1994). The results are extended to circular strongly Robinson dissimilarities, to indexed k-hierarchies (Jardine and Sibson 1971, pp. 65-71), and to proper dissimilarities satisfying the Bertrand and Janowitz (k + 2)-point inequality (Bertrand and Janowitz 1999). Unidimensional scaling (linear or circular) is reinterpreted as a clustering problem, and its NP-hardness is established, but only for the L1 norm.
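The approximation targets above are ordered dissimilarity structures. As background only, here is a minimal sketch of the ordinary Robinson condition for a fixed ordering of the objects: for all i < j < k, d(i,k) ≥ max(d(i,j), d(j,k)), so values never decrease when moving away from the diagonal. It does not test the stronger "strongly Robinson" condition or the four-point inequality treated in the paper, and the function name `is_robinson` is hypothetical. Checking a given matrix is easy; the hardness results concern finding the best approximating structure.

```python
from itertools import combinations
from typing import Sequence


def is_robinson(d: Sequence[Sequence[float]]) -> bool:
    """Return True if the symmetric dissimilarity matrix d is Robinson for
    its current row/column order: d[i][k] >= max(d[i][j], d[j][k]) for all
    i < j < k."""
    for i, j, k in combinations(range(len(d)), 3):  # yields i < j < k
        if d[i][k] < max(d[i][j], d[j][k]):
            return False
    return True


# Values never decrease when moving away from the diagonal.
robinson = [
    [0, 1, 3, 4],
    [1, 0, 2, 3],
    [3, 2, 0, 1],
    [4, 3, 1, 0],
]
# The same dissimilarity with objects 1 and 2 swapped is no longer Robinson.
swapped = [
    [0, 3, 1, 4],
    [3, 0, 2, 1],
    [1, 2, 0, 3],
    [4, 1, 3, 0],
]

print(is_robinson(robinson))  # True
print(is_robinson(swapped))   # False
```

Whether a dissimilarity can be reordered to become Robinson is a separate seriation question; this check only inspects the order it is given.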

2.
3.
4.
Traditional techniques of perceptual mapping hypothesize that stimuli are differentiated in a common perceptual space of quantitative attributes. This paper extends traditional perceptual mapping techniques such as multidimensional scaling (MDS), which assume only continuously valued dimensions, by presenting a model and methodology called CLUSCALE that captures stimulus differentiation due to qualitative perceptions in addition to quantitative (continuously varying) perceptual attributes or dimensions. It provides models and OLS parameter estimation procedures for both a two-way and a three-way version of this general model. Since the two-way version of the model and method has already been discussed by Chaturvedi and Carroll (2000), and a stochastic variant by Navarro and Lee (2003), this paper deals almost entirely with the three-way version. We recommend the three-way approach over the two-way approach because it both accounts for and takes advantage of the heterogeneity in subjects' perceptions of stimuli to provide maximal information; that is, it explicitly deals with individual differences among subjects.

5.
K-modes Clustering   Total citations: 2 | Self-citations: 0 | Citations by others: 2
… L0 norm (defined as the limit of an Lp norm as p approaches zero). In Monte Carlo simulations, K-modes and the latent class procedures (e.g., Goodman 1974) recovered a known underlying cluster structure equally well. However, K-modes is an order of magnitude faster than the latent class procedure and suffers from fewer problems with local optima. For data sets involving a large number of categorical variables, latent class procedures become extremely slow computationally and hence infeasible. We conjecture that, although latent class procedures might perform better than K-modes in some cases, K-modes could outperform them in others. Hence, we recommend that the two approaches be used as "complementary" procedures in cluster analysis. We also present an empirical comparison of K-modes and latent class analysis, in which the former method prevails.
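Under the L0 criterion, the dissimilarity between two categorical objects can be read as the number of attributes on which they disagree, and the best center of a cluster is its attribute-wise mode. The sketch below is a generic K-modes loop in the style of Huang (1998), not the authors' code; it assumes user-supplied initial modes and first-occurrence tie-breaking, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def mismatches(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple-matching dissimilarity: number of attributes on which a and b differ
    (the count of nonzero components, i.e. the L0 criterion)."""
    return sum(x != y for x, y in zip(a, b))


def column_mode(rows: List[Row]) -> Row:
    """Most frequent category per attribute; ties go to the first one seen."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data: List[Row], init_modes: List[Row], max_iter: int = 100):
    """Alternate assignment and mode-update steps until labels stop changing."""
    modes = list(init_modes)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment: each object joins the mode with the fewest mismatches.
        new_labels = [
            min(range(len(modes)), key=lambda c: mismatches(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update: recompute each non-empty cluster's mode.
        for c in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = column_mode(members)
    return labels, modes


data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "small", "square"),
]
labels, modes = k_modes(data, init_modes=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Each iteration costs only a count of mismatches per object and cluster, which is one reason K-modes scales well compared with fitting a latent class model.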

6.
The notion of defining a cluster as a component in a mixture model was put forth by Tiedeman in 1955; since then, the use of mixture models for clustering has grown into an important subfield of classification. Because the volume of work in this field over the past decade seems to equal all the work that went before, a review of work to date is timely. First, the definition of a cluster is discussed and some historical context for model-based clustering is provided. Then, starting with Gaussian mixtures, the evolution of model-based clustering is traced, from the famous paper by Wolfe in 1965 to work that is currently available only in preprint form. The review ends with a look ahead to the next decade or so.
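The idea of a cluster as a mixture component can be illustrated with a minimal EM fit of a one-dimensional, two-component Gaussian mixture. Each observation is then assigned to the component with the largest posterior probability. This is a generic textbook EM sketch, assuming a known number of components and user-supplied starting values; it is not code from the review, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def log_normal_pdf(x: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def gmm_em_1d(xs: Sequence[float], mus: List[float], variances: List[float],
              weights: List[float], n_iter: int = 50, var_floor: float = 1e-6):
    """Fit a 1-D Gaussian mixture by EM; return MAP cluster labels and parameters."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component, computed in log space.
        resp = []
        for x in xs:
            logp = [math.log(weights[c]) + log_normal_pdf(x, mus[c], variances[c])
                    for c in range(k)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: responsibility-weighted proportions, means, and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            mus[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(var_floor,
                               sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, xs)) / nc)
    # Cluster = component with the largest posterior from the final E-step.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, mus, variances, weights


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
labels, mus, variances, weights = gmm_em_1d(
    xs, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(labels)                      # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print([round(m, 3) for m in mus])  # [1.0, 5.0]
```

The variance floor guards against a component collapsing onto a single point, a known degeneracy of maximum-likelihood mixture fitting.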

7.
Clustering Functional Data   Total citations: 1 | Self-citations: 0 | Citations by others: 1

8.
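Principle 2.3.2 of the principles and methods for term examination and approval issued by the National Committee for the Examination and Approval of Natural Science Terms (全国自然科学名词审定委员会) requires that "for cross-disciplinary terms whose principal and subsidiary disciplines are hard to distinguish, the disciplines concerned should coordinate with one another and agree on a single name." Some terms shared among the branches of the earth sciences have long remained un-unified. Taking the opportunity of the term review in each discipline, and under the leadership of the National Committee, the subcommittees of seven earth-science branches (geography, geology, geophysics, oceanography, soil science, meteorology, and surveying and mapping) held several inter-disciplinary coordination meetings in 1987 and 1988. In a spirit of airing all views, consulting democratically, and taking the long view, and with the aim of unifying as far as possible the terms that needed to be unified and revised, they coordinated a number of terms used in common by the disciplines: some terms were unified, some were corrected, and a small number, for various reasons, could not be agreed upon, so the disciplines concerned retain their customary names.

(1) Through consultation, most of the cross-disciplinary terms were unified. These include 喀斯特 (karst), 判读 (interpretation), 大陆架 (continental shelf), 海拔 (elevation above sea level), 地貌学 (geomorphology), and some 20 others. For example, 喀斯特 was originally a transliteration of "Karst", a type of landform formed by the dissolution of limestone and similar rocks; it takes its name from the Karst mountains in Yugoslavia, has been accepted by the international geographical community, and has always been used in transliterated form in China. The Chinese geological community preferred the semantic rendering 岩溶 ("rock dissolution"), which was widely accepted at a karst symposium held in 1966, and a research institute later established in Guangxi was likewise named 岩溶研究所 (Karst Research Institute). During this review, the geographers proposed keeping the original transliteration, which both follows international usage and also covers similar phenomena in loess and glacial regions; for example, 热喀斯特 (thermokarst) is more apt than 热岩溶. Most members at the coordination meeting agreed. However, the review of geological terms is still in progress; if a majority of its members ultimately insist on 岩溶, the entry will read "喀斯特, also called 岩溶". Similarly, for the interpretation of remote-sensing images, several renderings have been used at different times, including 判读, 解译, 解释, and 识别; after consultation, the geography and the surveying and mapping subcommittees propose adopting 判读.

(2) Past errors were corrected. A typical example is that many people had long miswritten 潟湖 (lagoon; 潟 is pronounced xì) as 泻湖. The traditional character 瀉 (simplified 泻) resembles 潟 in form and is close in sound, but the two differ in meaning. On examination, 泻 means the rapid flow of a liquid, as in 一泻千里 ("pouring a thousand li"), whereas 潟 denotes land soaked by salt water. A coastal lake flooded by seawater should therefore be called 潟湖, and primary and secondary school geography textbooks already use this term correctly. In this review the geography subcommittee insisted on abandoning the error in favor of the correct form, and this was adopted unanimously at the coordination meeting.

(3) Where usage differs among the branch disciplines but has been established too long to be unified for now, the differences are retained for the time being. A typical example is 亚热带 in geographical terminology and 副热带 in meteorological terminology, both corresponding to the English "subtropical zone". The prefixes 亚 and 副 differ in meaning in Chinese: 亚 implies a difference in rank, as 冠军 (champion) ranks above 亚军 (runner-up), whereas 副 implies a principal/subordinate relation, as in 正业 and 副业 (main occupation and sideline) or 正主席 and 副主席 (chairman and vice-chairman). Most participants at the coordination meeting agreed on 亚热带, but the meteorological community has long used 副热带, and in weather forecasting 副热带高压 (subtropical high) is commonly shortened to 副高; changing it to 亚高 would be hard to accept. The two disciplines therefore each adopt their own customary term, with an "also called" note. Likewise, geology and geography customarily use 大气圈 and 岩石圈 (atmosphere, lithosphere; literally "-sphere"), whereas meteorology and geophysics use 大气层 and 岩石层 (literally "-layer"); geology uses 亚粘土 (sub-clay), whereas geography and soil science use 壤土 (loam); and geography uses 地图学 (cartography), whereas surveying and mapping uses 地图制图学 (or, for short, 制图学). Despite consultation, these terms cannot yet be unified; agreement can only be reached gradually through longer use.

In sum, inter-disciplinary coordination unified many synonymous terms and corrected a few misused ones. A number of terms remain on which the disciplines concerned could not agree, and these await gradual unification. Through this work we found coordination meetings to be a very good format: they give related disciplines the chance to exchange views, increase mutual understanding, and solve many problems, and the National Committee is the most appropriate organizer and leader of such meetings. At the same time, we believe that coordination still needs to be strengthened, aiming for full unification with no differences left standing. This requires, on the one hand, that the disciplines concerned overcome the difficulties and give up names they have used for many years and, on the other, stronger leadership: cross-disciplinary terms on which opinions still differ could be ruled on by the National Committee and unified by mandate. Colleagues in some disciplines may face difficulties for a few years, but this will be a great convenience for those who come after. Indeed, the State Council's "Order on the Uniform Implementation of Legal Units of Measurement in China", issued on 27 February 1984, abolished many units we had long been used to, such as the dyne, the erg, the ångström (Å), the bar, the calorie, and the gram-equivalent; although our generation was not accustomed to the change, we have complied with it in writing and publishing. Can making minor changes to the earth-science terms above really be harder than abolishing these units outright? If each discipline keeps its own usage now, the terms will still have to be unified many years from now; rather than leaving the difficulty to more people in the future, it is better for our generation to overcome it. We therefore recommend that, if disagreements over cross-disciplinary terms remain after thorough discussion and consultation, the National Committee should make a ruling and the disciplines concerned should use the names so decided, so that unification is achieved now rather than deferred to the future.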

9.
10.
A Note on K-modes Clustering   Total citations: 2 | Self-citations: 0 | Citations by others: 2
Recently, Chaturvedi, Green and Carroll (2001) presented a nonparametric approach to deriving clusters from categorical data using a new clustering procedure called K-modes. Earlier, Huang (1998) had proposed a K-modes clustering algorithm. In this note, we demonstrate the equivalence of the two K-modes procedures.
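A property that makes the mode the natural cluster center under the simple-matching (L0) criterion, and which K-modes procedures exploit, is that the attribute-wise most frequent categories minimize the total number of mismatches to a set of categorical objects, because the objective decomposes attribute by attribute. The sketch below checks this by brute force on made-up data with hypothetical helper names; it is an illustration, not the note's proof of equivalence.

```python
from collections import Counter
from itertools import product


def total_mismatch(rows, q):
    """Sum over objects of the number of attributes on which the object differs from q."""
    return sum(sum(a != b for a, b in zip(row, q)) for row in rows)


def columnwise_mode(rows):
    """Most frequent category in each attribute (ties go to the first one seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


rows = [("a", "x", "p"), ("a", "y", "p"), ("b", "y", "q"), ("a", "y", "r")]

# Brute force over every combination of observed categories; an unobserved
# category can never do better, since it mismatches every object.
candidates = product(*(sorted(set(col)) for col in zip(*rows)))
best = min(total_mismatch(rows, q) for q in candidates)

mode = columnwise_mode(rows)
print(mode, total_mismatch(rows, mode), best)  # ('a', 'y', 'p') 4 4
```

The mode need not be unique (ties in any attribute give several minimizers), which is why tie-breaking conventions matter when comparing implementations.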

11.
12.
13.
Functional data sets appear in many areas of science. Although each data point may be seen as a large finite-dimensional vector, it is preferable to think of the data points as functions, and many classical multivariate techniques have been generalized to this kind of data. A widely used technique for dealing with functional data is to choose a finite-dimensional basis and find the best projection of each curve onto this basis. Given such a basis, one approach to curve clustering is to apply the k-means methodology to the fitted basis coefficients of all the curves in the data set. Unfortunately, k-means is not robust, which is a serious drawback. Trimmed k-means clustering (Cuesta-Albertos, Gordaliza, and Matrán 1997) provides a robust alternative to k-means and, consequently, can be used successfully in this functional framework. The proposed approach is illustrated with cubic B-spline bases, but other bases can be used analogously, depending on the application at hand.
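The pipeline above (fit basis coefficients, then cluster them robustly) can be sketched as follows, assuming each curve's coefficient vector has already been computed, for example by a least-squares fit onto a cubic B-spline basis (not shown). Trimmed k-means is computed here with simple "concentration" steps: assign points to the nearest center, keep the floor(n(1 - alpha)) closest ones, and recompute centers from the retained points. This is a common heuristic, not necessarily the authors' algorithm; alpha is the trimming proportion, trimmed points get label -1, and the names and coefficient vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trimmed_kmeans(points: List[Point], init_centers: List[Point],
                   alpha: float = 0.1, max_iter: int = 100):
    """Trimmed k-means via concentration steps. Each iteration assigns points
    to their nearest center, keeps the floor(n * (1 - alpha)) points closest
    to their center (the rest get label -1), and recomputes centers from the
    retained points only."""
    n = len(points)
    keep = math.floor(n * (1 - alpha))  # number of points retained
    centers = [tuple(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        nearest = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                   for p in points]
        dists = [sq_dist(p, centers[c]) for p, c in zip(points, nearest)]
        kept = set(sorted(range(n), key=lambda i: dists[i])[:keep])
        new_labels = [nearest[i] if i in kept else -1 for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


# Hypothetical 2-D basis-coefficient vectors, one per curve.
coefs = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.1),  # curves of one shape
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),    # curves of another shape
    (20.0, -20.0),                                     # an outlying curve
]
labels, centers = trimmed_kmeans(coefs, init_centers=[(0.0, 0.0), (5.0, 5.0)], alpha=0.1)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With plain k-means (no trimming), the outlying coefficient vector would be assigned to the first cluster and drag its center toward (20, -20).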

14.
Given a set of pairwise distances on a set of n points, constructing an edge-weighted tree whose leaves are these n points, such that the tree distances mimic the original distances under some criterion, is a fundamental problem. One such criterion is to preserve the ordinal relation between the pairwise distances. The ordinal relation can be a total order on the distances or a partial order specified on the pairwise distances. We show that the problem of finding a weighted tree (if one exists) that preserves the total order on pairwise distances is NP-hard. We also show the NP-hardness of finding a weighted tree that preserves a particular kind of partial order called a triangle order, one of the most fundamental partial orders considered in computational biology.
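Although finding such a tree is NP-hard, verifying a candidate tree is easy. The sketch below computes leaf-to-leaf path lengths in an edge-weighted tree by depth-first search and checks that every comparison (<, =, >) between two input distances is reproduced by the corresponding tree distances, i.e., that the total order is preserved. It does not handle triangle orders or search for a tree; the helper names and the quartet tree are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str, float]
Pair = FrozenSet[str]


def tree_distances(edges: List[Edge], leaves: List[str]) -> Dict[Pair, float]:
    """Path length between every pair of leaves in an edge-weighted tree."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist: Dict[Pair, float] = {}
    for src in leaves:
        # Depth-first search; in a tree each node is reached by a unique path.
        reach = {src: 0.0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in reach:
                    reach[v] = reach[u] + w
                    stack.append(v)
        for dst in leaves:
            if dst != src:
                dist[frozenset((src, dst))] = reach[dst]
    return dist


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def preserves_total_order(d: Dict[Pair, float], t: Dict[Pair, float]) -> bool:
    """True if every comparison between two input distances also holds
    between the corresponding tree distances."""
    return all(sign(d[p] - d[q]) == sign(t[p] - t[q])
               for p, q in combinations(list(d), 2))


def pair(x: str, y: str) -> Pair:
    return frozenset((x, y))


# Quartet tree ab|cd with internal nodes u and v (integer weights avoid
# floating-point ties).
edges = [("a", "u", 1), ("b", "u", 2), ("u", "v", 4), ("c", "v", 3), ("d", "v", 7)]
t = tree_distances(edges, ["a", "b", "c", "d"])
# Tree distances: ab=3, ac=8, bc=9, cd=10, ad=12, bd=13.

same_order = {pair("a", "b"): 1, pair("a", "c"): 2, pair("b", "c"): 3,
              pair("c", "d"): 4, pair("a", "d"): 5, pair("b", "d"): 6}
swapped = dict(same_order)
swapped[pair("a", "b")], swapped[pair("b", "d")] = 6, 1

print(preserves_total_order(same_order, t))  # True
print(preserves_total_order(swapped, t))     # False
```

The check compares all pairs of distances, so it is quadratic in the number of leaf pairs; the hard part the paper addresses is choosing the tree topology and weights.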

15.
16.
Variable Selection for Clustering and Classification   Total citations: 2 | Self-citations: 2 | Citations by others: 0
As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques commonly used alongside clustering algorithms determine the best variable subspace by stepwise model fitting. These techniques are often computationally intensive and can take a long time to run; some are prohibitively expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering and classification methods. Our approach is illustrated on both simulated and real data, and its performance on the real data sets is contrasted with that of other comparable variable selection techniques.

17.
Traditional procedures for clustering time series are based mostly on crisp hierarchical or partitioning methods. Because the dynamics of a time series may change over time, a series might display patterns that place it in one cluster over one period, while over another period its pattern may be more consistent with those in another cluster. Traditional clustering procedures are unable to identify such changing patterns over time. Clustering based on fuzzy logic, by contrast, can detect switching patterns from one time period to another, allowing some time series to belong to more than one cluster simultaneously. In particular, this paper proposes a fuzzy approach to clustering time series based on their variances through wavelet decomposition. We show that this approach distinguishes between time series with different patterns in variability, as well as identifying time series with switching patterns in variability.
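A sketch under simplifying assumptions, not the paper's exact procedure: each series is summarized by the mean squared Haar detail coefficient at each level of a plain discrete wavelet transform, as a rough per-scale variance. The resulting feature vectors are clustered with standard fuzzy c-means (fuzzifier m = 2) started from user-supplied centers. Function names and toy series are hypothetical.

```python
import math
from typing import List, Sequence


def haar_level_energies(x: Sequence[float], levels: int) -> List[float]:
    """Mean squared Haar detail coefficient at each decomposition level, used
    here as a crude per-scale variance feature. len(x) must be divisible by
    2 ** levels."""
    approx = list(x)
    feats = []
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        feats.append(sum(d * d for d in detail) / half)
    return feats


def fuzzy_c_means(X: List[List[float]], init_centers: List[List[float]],
                  m: float = 2.0, n_iter: int = 100):
    """Standard fuzzy c-means with fuzzifier m, started from given centers."""
    centers = [list(c) for c in init_centers]
    U: List[List[float]] = []
    for _ in range(n_iter):
        # Membership step: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        U = []
        for x in X:
            d = [math.dist(x, ck) for ck in centers]
            if min(d) == 0.0:  # point coincides with a center
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                U.append([h / sum(hits) for h in hits])
            else:
                U.append([1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d) for dk in d])
        # Center step: means weighted by u_ik ** m.
        for k in range(len(centers)):
            w = [row[k] ** m for row in U]
            centers[k] = [sum(wi * x[j] for wi, x in zip(w, X)) / sum(w)
                          for j in range(len(X[0]))]
    return U, centers


series = [
    [0.1, -0.1] * 4,                               # calm
    [0.2, -0.2] * 4,                               # calm
    [2.0, -2.0] * 4,                               # volatile
    [2.2, -2.2] * 4,                               # volatile
    [0.1, -0.1, 0.1, -0.1, 2.0, -2.0, 2.0, -2.0],  # calm, then volatile
]
X = [haar_level_energies(s, levels=2) for s in series]
U, centers = fuzzy_c_means(X, init_centers=[[0.0, 0.0], [10.0, 0.0]])
for row in U:
    print([round(u, 2) for u in row])
```

The last toy series is calm in its first half and volatile in its second, so its level-1 energy lies between the two groups; fuzzy c-means gives it a split membership rather than forcing a single crisp label, while the four pure series receive memberships close to 0 or 1.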

18.
The additive clustering approach is applied to the problem of two-mode clustering and compared with the recent error-variance approach of Eckes and Orlik (1993). Although the computational algorithms of the two approaches look very similar, additive clustering is shown to have several advantages. Specifically, two technical limitations of the error-variance approach (see Eckes and Orlik 1993, p. 71) are overcome within the additive clustering framework. The research was supported by the Office of Naval Research under grant number N0014-93-1-0222 to Rutgers University. The authors are indebted both to Fionn Murtagh, who served as Acting Editor, and to anonymous Referees for thoughtful and constructive reviews.
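In its basic form (assumed here, with no overall constant), additive two-mode clustering models a data matrix as a sum of possibly overlapping "boxes". Each box is a subset of rows crossed with a subset of columns and carries a constant intensity. Once the box memberships are fixed, the least-squares intensities can be found by cycling through the boxes and setting each intensity to the mean of the partial residuals inside its box. The sketch below assumes the boxes are already known (searching for them, the hard part, is not shown); the names and data are hypothetical, and this is not the authors' algorithm.

```python
from typing import List, Sequence, Set, Tuple

Box = Tuple[Set[int], Set[int]]  # (row indices, column indices)


def fit_box_intensities(X: Sequence[Sequence[float]], boxes: List[Box],
                        n_sweeps: int = 50) -> List[float]:
    """Least-squares intensities for fixed, possibly overlapping boxes, found by
    cycling through the boxes: each intensity is set to the mean, over its box,
    of the data minus the contributions of all other boxes."""
    lams = [0.0] * len(boxes)
    for _ in range(n_sweeps):
        for t, (rows, cols) in enumerate(boxes):
            total = 0.0
            for i in rows:
                for j in cols:
                    others = sum(lams[s] for s, (r2, c2) in enumerate(boxes)
                                 if s != t and i in r2 and j in c2)
                    total += X[i][j] - others
            lams[t] = total / (len(rows) * len(cols))
    return lams


def residual_ss(X: Sequence[Sequence[float]], boxes: List[Box],
                lams: List[float]) -> float:
    """Sum of squared differences between X and the additive box model."""
    ss = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            fit = sum(lam for lam, (rows, cols) in zip(lams, boxes)
                      if i in rows and j in cols)
            ss += (x - fit) ** 2
    return ss


# Two overlapping boxes: rows {0,1} x cols {0,1} with intensity 3 and
# rows {1,2} x cols {1,2} with intensity 2; cell (1,1) carries both.
X = [[3, 3, 0],
     [3, 5, 2],
     [0, 2, 2]]
boxes = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]
lams = fit_box_intensities(X, boxes)
print([round(lam, 6) for lam in lams])        # [3.0, 2.0]
print(round(residual_ss(X, boxes, lams), 9))  # 0.0
```

Because the boxes overlap, fitting each box once in isolation would give biased intensities (3.5 for the first box here); cycling until the intensities settle recovers the joint least-squares solution.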

19.
20.

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP registration No. 09084417)