Similar Articles
20 similar articles found (search time: 31 ms)
1.
O(n^4), where n is the number of objects. We describe the application of the MVR method to two data models: (i) the weighted least-squares (WLS) model (V is diagonal), for which the MVR method can be reduced to O(n^3) time complexity; and (ii) a model arising from the study of biological sequences, which involves a complex non-diagonal V matrix that is estimated from the dissimilarity matrix Δ. For both models, we provide simulation results that show a significant error reduction in the reconstruction of T relative to classical agglomerative algorithms.

2.
Reduced K-means (RKM) and Factorial K-means (FKM) are two data reduction techniques that incorporate principal component analysis and K-means into a unified methodology to obtain a reduced set of components for the variables and an optimal partition of the objects. RKM finds clusters in a reduced space by maximizing the between-cluster deviance without imposing any condition on the within-cluster deviance, so the clusters are isolated but may be heterogeneous. FKM, on the other hand, identifies clusters in a reduced space by minimizing the within-cluster deviance without imposing any condition on the between-cluster deviance; thus the clusters are homogeneous but may not be isolated. The two techniques give different results because the total deviance in the reduced space is not constant across the two methodologies; hence minimizing the within-cluster deviance is not equivalent to maximizing the between-cluster deviance. This paper introduces a modification of the two techniques that avoids the aforementioned weaknesses. It is shown that the two modified methods give the same results, thus merging RKM and FKM into a new methodology, called Factor Discriminant K-means (FDKM) because it combines Linear Discriminant Analysis and K-means. The paper examines several theoretical properties of FDKM and evaluates its performance in a simulation study. An application to real-world data is presented to illustrate the features of FDKM.
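Both methods hinge on splitting the total deviance into within-cluster and between-cluster parts. The minimal pure-Python sketch below (function names are hypothetical; this is not the FDKM algorithm) computes the three quantities for a fixed hard partition in a fixed space. There, total = within + between holds exactly, so minimizing one term is the same as maximizing the other; in RKM and FKM the reduced space is itself being optimized, so the total is no longer fixed and the two objectives can diverge.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deviance_split(data, labels):
    """Return (total, within, between) deviance for a hard partition.

    data:   list of observations (lists of floats)
    labels: cluster label of each observation
    """
    grand = centroid(data)
    total = sum(sq_dist(x, grand) for x in data)

    clusters = {}
    for x, k in zip(data, labels):
        clusters.setdefault(k, []).append(x)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        within += sum(sq_dist(x, c) for x in members)
        # Each cluster contributes size * squared distance of its centroid to the grand mean.
        between += len(members) * sq_dist(c, grand)
    return total, within, between


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
    t, w, b = deviance_split(data, [0, 0, 1, 1])
    print(t, w, b)  # total equals within + between
```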

3.
In many application fields, multivariate approaches that simultaneously account for the correlation between responses are needed. The tree method can be extended to multivariate responses, such as repeated measures and longitudinal data, by modifying the split function to accommodate multiple responses. Recently, researchers have constructed decision trees for multiple continuous longitudinal responses and for multiple binary responses using the Mahalanobis distance and a generalized entropy index. However, these methods are restricted to a particular type of response, namely responses that are only continuous or only binary. In this paper, we modify the tree procedure for a univariate response and propose a new tree-based method that can analyze any type of multiple responses by using GEE (generalized estimating equations) techniques. To compare the performance of the trees, simulation studies on the probability of selecting the true split variable are presented. Finally, applications to epileptic seizure data and WWW data are described.

4.
Mokken scale analysis uses an automated, bottom-up, stepwise item selection procedure that suffers from two problems. First, items satisfy the scaling conditions at the moment they are selected during the procedure, but they may fail to do so once the scale has been completed. Second, the procedure is approximate and thus may not produce the optimal partitioning of the items. This study investigates a variation on Mokken's item selection procedure that alleviates the first problem, and proposes a genetic algorithm that alleviates both problems. The genetic algorithm approximates checking all possible partitionings. A simulation study shows that the genetic algorithm leads to better scaling results than the other two procedures.
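The scaling conditions are usually stated in terms of Loevinger's scalability coefficients: every inter-item covariance must be positive, and every item coefficient H_i must reach a lower bound c (0.3 is a common default). A minimal sketch for dichotomous (0/1) items follows, assuming population covariances and that no item is answered identically by every respondent (otherwise a denominator is zero). The function names are hypothetical, and the sketch implements neither Mokken's selection procedure nor the genetic algorithm; it only checks the conditions for a given item set.

```python
def _cov(x, y):
    """Population covariance of two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def _cov_max(x, y):
    """Largest covariance attainable by two 0/1 items with these marginals."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    return min(px, py) - px * py


def item_h(data, items):
    """Loevinger H_i for each item index in `items`.

    data: one list of 0/1 scores per respondent.
    """
    cols = {i: [row[i] for row in data] for i in items}
    h = {}
    for i in items:
        others = [j for j in items if j != i]
        num = sum(_cov(cols[i], cols[j]) for j in others)
        den = sum(_cov_max(cols[i], cols[j]) for j in others)
        h[i] = num / den
    return h


def satisfies_scaling_conditions(data, items, c=0.3):
    """True if all inter-item covariances are positive and every H_i >= c."""
    cols = {i: [row[i] for row in data] for i in items}
    pairs_ok = all(_cov(cols[i], cols[j]) > 0
                   for i in items for j in items if i < j)
    return pairs_ok and all(v >= c for v in item_h(data, items).values())
```

A perfect Guttman pattern such as [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]] gives H_i = 1 for every item, while two items that are never endorsed together have a negative covariance and fail the first condition.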

5.
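Since the late 1990s, the discipline of the history of science and technology in China has continually expanded its research directions and actively adapted to social needs, undergoing a period of transition in research directions, academic questions, research paradigms, and international cooperation. The research field has expanded "from tradition to modernity, and from China to the world", opening up applied and interdisciplinary directions such as traditional crafts and scientific archaeology, science and technology development strategy and related theory, the integration of science and the humanities, the history of research institutions, and comparative studies of Chinese and foreign science and technology development. Chinese scholars pay more attention to new academic questions, draw more on the theories and methods of philosophy, sociology, science and technology studies, archaeology, anthropology, and folklore studies, adopt advanced information technologies and experimental methods, and attempt cross-disciplinary, cross-cultural, team-based international collaborative research.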

6.
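Science and Technology Communication and Knowledge Innovation from the Perspective of the Sociology of Science   Total citations: 2 (self-citations: 1; citations by others: 2)
For a long time, science and technology communication has been regarded merely as the transmission and diffusion of existing knowledge, without being considered together with knowledge innovation. Taking the sociology of science as its theoretical perspective, this paper argues that science and technology communication is an indispensable component of knowledge innovation, and that knowledge innovation is in turn an important goal of science and technology communication, in an attempt to offer new theoretical thinking for research on science and technology communication.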

7.
A key issue in cluster analysis is choosing a similarity or dissimilarity measure between data objects. When working with time series, the concept of similarity can be established in different ways. In this paper, several nonparametric statistics originally designed to test the equality of the log-spectra of two stochastic processes are proposed as dissimilarity measures between time series. Their behavior in time series clustering is analyzed through a simulation study and compared with the performance of several model-free and model-based dissimilarity measures. Three different classification settings were considered: (i) distinguishing between stationary and non-stationary time series, (ii) classifying different ARMA processes, and (iii) classifying several non-linear time series models. As expected, the performance of a particular dissimilarity measure depended strongly on the type of processes being clustered. Among all the measures studied, the nonparametric distances showed the most robust behavior.
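The statistics studied in the paper test whether two log-spectra are equal. A much simpler, hypothetical stand-in, shown only to make the idea of a spectral dissimilarity concrete, estimates each log-spectrum by the log of the raw periodogram at the Fourier frequencies and takes the Euclidean distance between the two estimates. The sketch assumes equal-length series and uses a naive O(n^2) DFT from the standard library; it is not one of the paper's nonparametric statistics.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram I(w_k) = |sum_t x_t exp(-i w_k t)|^2 / (2 pi n)
    at the Fourier frequencies w_k = 2 pi k / n, k = 1 .. floor((n - 1) / 2).
    The series is mean-centred first."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    values = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2 * math.pi * k / n
        s = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(xc))
        values.append(abs(s) ** 2 / (2 * math.pi * n))
    return values


def log_periodogram_distance(x, y, eps=1e-12):
    """Euclidean distance between the log periodograms of two equal-length series.
    eps guards against log(0) at frequencies with (numerically) no power."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    px, py = periodogram(x), periodogram(y)
    return math.sqrt(sum((math.log(a + eps) - math.log(b + eps)) ** 2
                         for a, b in zip(px, py)))
```

Feeding the resulting pairwise distances to any hierarchical or partitioning clustering routine gives a spectrum-based clustering of the series.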

8.
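Research on Cavitation and Cavitation Erosion   Total citations: 2 (self-citations: 0; citations by others: 2)
Cavitation is a natural phenomenon. Ever since the recognition that "dripping water wears through stone", attention has focused on the various damage processes that originate from cavitation. Because the mechanisms by which cavitation bubbles form, collapse, and implode, eventually producing micro-shock waves and micro-jets, are still not understood, more than a century of research has yet to produce key technologies that effectively solve problems such as cavitation erosion damage and cavitation noise. On the other hand, the extreme physical, chemical, and mechanical environments created during bubble collapse and implosion, together with the special physicochemical states of the matter inside the bubbles and their transformation processes, may open new avenues for scientific research into deeper laws of nature; the key technologies developed will contribute greatly to the national economy and national security and will ultimately benefit humanity.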

9.
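Positron annihilation, metallographic microscopy, and scanning electron microscopy with energy-dispersive spectroscopy were used to study Yongtong Quanhuo iron coins of the Southern Tang and several iron coins of the Northern and Southern Song dynasties. The results show that as early as the Southern Tang period, iron coin casting technology in China had reached a certain level, and that it was improved in the Song dynasty. The tests indicate that positron annihilation, combined with other testing methods, provides a scientific means for studying the structure of ancient iron coins and for determining their authenticity.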

10.
Incremental Classification with Generalized Eigenvalues   Total citations: 2 (self-citations: 0; citations by others: 2)
Supervised learning techniques are widely accepted methods for analyzing data in scientific and real-world problems. Most of these problems involve fast and continuous acquisition of data that must be used to train the learning system, so keeping such systems up to date can become cumbersome. Various techniques have been devised in machine learning to address this problem. In this study, we propose an algorithm that reduces the training data to a substantially smaller subset of the original training data for training a generalized eigenvalue classifier. The proposed method provides a constructive way to understand the influence of new training data on an existing classification function. Numerical experiments show that this technique prevents the overfitting problem of earlier generalized eigenvalue classifiers while offering classification performance comparable to that of state-of-the-art methods.

11.
Optimization Strategies for Two-Mode Partitioning   Total citations: 2 (self-citations: 2; citations by others: 0)
Two-mode partitioning is a relatively new form of clustering that clusters both the rows and the columns of a data matrix. In this paper, we consider deterministic two-mode partitioning methods in which a criterion similar to that of k-means is optimized. A variety of optimization methods have been proposed for this type of problem; however, it is still unclear which method should be used, as the various methods may end in non-global optima. This paper reviews and compares several optimization methods for two-mode partitioning. Several known methods are discussed, and a new fuzzy steps method is introduced. The fuzzy steps method is based on the fuzzy c-means algorithm of Bezdek (1981) and the fuzzy steps approach of Heiser and Groenen (1997) and Groenen and Jajuga (2001). The performance of all methods is compared in a large simulation study. In our simulations, a two-mode k-means optimization method most often gives the best results. Finally, an empirical data set is used to give a practical example of two-mode partitioning. We would like to thank two anonymous referees whose comments have improved the quality of this paper. We are also grateful to Peter Verhoef for providing the data set used in this paper.
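The k-means-like criterion for two-mode partitioning is commonly written as the sum, over all cells, of the squared difference between x_ij and the mean of the block formed by the row cluster of i and the column cluster of j. The minimal sketch below computes that loss and runs a simple alternating scheme (reassign rows, recompute block means, reassign columns) until no label changes. It is a hypothetical illustration of the criterion, not any of the specific optimization methods or the fuzzy steps method compared in the paper, and like them it can stop at a local optimum.

```python
def block_means(X, rows, cols, K, L):
    """Mean of each (row cluster, column cluster) block; empty blocks get 0.0."""
    sums = [[0.0] * L for _ in range(K)]
    counts = [[0] * L for _ in range(K)]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            sums[rows[i]][cols[j]] += v
            counts[rows[i]][cols[j]] += 1
    return [[sums[k][q] / counts[k][q] if counts[k][q] else 0.0
             for q in range(L)] for k in range(K)]


def two_mode_loss(X, rows, cols, K, L):
    """Sum over all cells of (x_ij - mean of its row/column cluster block)^2."""
    c = block_means(X, rows, cols, K, L)
    return sum((v - c[rows[i]][cols[j]]) ** 2
               for i, row in enumerate(X) for j, v in enumerate(row))


def two_mode_kmeans(X, rows, cols, K, L, max_iter=100):
    """Alternate row and column reassignment, recomputing block means in
    between, until no label changes. Returns (rows, cols, loss)."""
    rows, cols = list(rows), list(cols)
    n_rows, n_cols = len(X), len(X[0])
    for _ in range(max_iter):
        c = block_means(X, rows, cols, K, L)
        # Each row moves to the row cluster whose block means fit it best.
        new_rows = [min(range(K), key=lambda k: sum(
                        (v - c[k][cols[j]]) ** 2 for j, v in enumerate(X[i])))
                    for i in range(n_rows)]
        c = block_means(X, new_rows, cols, K, L)
        # Each column moves to the column cluster whose block means fit it best.
        new_cols = [min(range(L), key=lambda q: sum(
                        (X[i][j] - c[new_rows[i]][q]) ** 2 for i in range(n_rows)))
                    for j in range(n_cols)]
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols, two_mode_loss(X, rows, cols, K, L)
```

On the 4 x 4 matrix [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 2, 2], [9, 9, 2, 2]], starting from rows = [0, 1, 1, 1] and cols = [0, 0, 1, 0], the loop recovers the two row blocks and two column blocks with zero loss. A poor start (for example, rows and columns labelled alternately) instead collapses into a single cluster, one reason to use several starts.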

12.
Multiple imputation is one of the most highly recommended procedures for dealing with missing data. However, little attention has so far been paid to methods for combining the results of principal component analyses (PCA) applied to a multiply imputed data set. In this paper we propose Generalized Procrustes analysis for this purpose; its centroid solution can be used as the final estimate of the component loadings. Convex hulls based on the loadings of the imputed data sets can be used to represent the uncertainty due to the missing data. In two simulation studies, the performance of the Generalized Procrustes approach is evaluated and compared with that of other methods. More specifically, we study how these methods behave when the order of components changes and the signs of component loadings are reversed, as can happen with near-equal eigenvalues or with data having almost as many counterindicative items as indicative items. The simulations show that the other proposed methods either may run into serious problems or cannot adequately assess the accuracy due to the presence of missing data. However, when the above situations do not occur, all methods provide adequate estimates of the PCA loadings.

13.
Recent Advances in Predictive (Machine) Learning   Total citations: 1 (self-citations: 0; citations by others: 1)
Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In predictive (machine) learning, the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning originated many years ago, at the dawn of the computer age. Recently, two new techniques have emerged that have revitalized the field: support vector machines and boosted decision trees. This paper provides an introduction to these two methods, tracing their respective roots to standard kernel methods and ordinary decision trees.
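Boosted decision trees, in their simplest form, combine many one-split trees (stumps). The minimal sketch below is discrete AdaBoost with threshold stumps on one-dimensional inputs and labels in {-1, +1}. This classic scheme is chosen here for brevity; it is an assumption that it conveys the general idea, not necessarily the boosting variant the paper discusses, and the function names are hypothetical.

```python
import math


def fit_stump(xs, ys, w):
    """Best stump h(x) = s if x > t else -s under sample weights w.
    Returns (weighted_error, threshold, sign)."""
    best = None
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    for t in thresholds:
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (s if x > t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, threshold, sign)."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, t, s = fit_stump(xs, ys, w)
        err = max(err, 1e-10)  # avoid division by zero for an error-free stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, s))
        # Up-weight misclassified points, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * y * (s if x > t else -s))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x > t else -s) for a, t, s in model)
    return 1 if score >= 0 else -1
```

On the toy data xs = [1, 2, 3, 4, 5, 6] with labels [1, 1, -1, -1, 1, 1], no single stump is error-free, but adaboost(xs, ys, rounds=3) classifies all six training points correctly.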

14.
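Based on Chinese patent data in the full-text patent database of the United States Patent and Trademark Office (USPTO), this paper studies the temporal distribution of the scientific journal articles cited in patents. A lognormal distribution model is chosen to describe quantitatively the age distribution of the journal citations in patents, and it fits well. Two parameters, the maximum citation age and the mean citation age, are introduced to reflect the temporal relationship between patents and the journal articles they cite.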
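Under a lognormal model, the maximum-likelihood estimates of mu and sigma are the mean and the population standard deviation of the log-ages. A minimal sketch follows, assuming every citation age is strictly positive (ages of zero, e.g. same-year citations, would need a shift or separate treatment). The function names are hypothetical, and the paper's exact definitions of maximum and mean citation age may differ from the empirical max and mean used here.

```python
import math


def fit_lognormal(ages):
    """MLE for a lognormal: mean and population std. deviation of log(age)."""
    logs = [math.log(a) for a in ages]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def citation_age_summary(ages):
    """Fitted lognormal parameters plus simple empirical age statistics."""
    mu, sigma = fit_lognormal(ages)
    return {
        "mu": mu,
        "sigma": sigma,
        "fitted_mean": math.exp(mu + sigma ** 2 / 2),  # mean of the fitted lognormal
        "fitted_median": math.exp(mu),                 # median of the fitted lognormal
        "mean_age": sum(ages) / len(ages),             # empirical mean citation age
        "max_age": max(ages),                          # empirical maximum citation age
    }
```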

15.
We propose a new nonparametric family of oscillation heuristics for improving linear classifiers in the two-group discriminant problem. The heuristics are motivated by the intuition that the classification accuracy of a separating hyperplane can be improved through small perturbations to its slope and position, accomplished by substituting training observations near the hyperplane for those used to generate it. In an extensive simulation study, using data generated from multivariate normal distributions under a variety of conditions, the oscillation heuristics consistently improve upon the classical linear and logistic discriminant functions, as well as two published linear programming-based heuristics and a linear Support Vector Machine. Added to any of the methods above, they approach, and frequently attain, the best possible accuracy on the training samples, as determined by a mixed-integer programming (MIP) model, at a much smaller computational cost. They also improve expected accuracy on the overall populations when the populations overlap significantly and the heuristics are trained with large samples, at least in situations where the data conditions do not explicitly favor a particular classifier.

16.
Recent convergence results for the fuzzy c-means clustering algorithms   Total citations: 1 (self-citations: 0; citations by others: 1)
One of the main techniques embodied in many pattern recognition systems is cluster analysis, the identification of substructure in unlabeled data sets. The fuzzy c-means algorithms (FCM) have often been used to solve certain types of clustering problems. During the last two years, several new local results concerning both the numerical and the stochastic convergence of FCM have been found. Numerical results describe how the algorithms behave when evaluated as optimization algorithms for finding minima of the corresponding family of fuzzy c-means functionals. Stochastic properties refer to the accuracy of minima of FCM functionals as approximations to parameters of statistical populations that are sometimes assumed to be associated with the data. The purpose of this paper is to collect the main global and local, numerical and stochastic convergence results for FCM in a brief and unified way.
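For reference, the standard fuzzy c-means iteration alternates two closed-form updates for a fuzzifier m > 1: memberships u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), where d_ik is the distance from point k to centre i, and centres v_i = sum_k u_ik^m x_k / sum_k u_ik^m. The minimal pure-Python sketch below applies them to one-dimensional data. The initial centres, tolerance, and the convention for a point that coincides exactly with a centre are assumptions made for brevity, and the function name is hypothetical.

```python
def fcm_1d(xs, centers, m=2.0, tol=1e-9, max_iter=300):
    """Fuzzy c-means for scalar data, started from distinct initial centres.
    Returns (centers, memberships), where memberships[k][i] is the degree to
    which xs[k] belongs to cluster i (from the last membership update)."""
    centers = list(centers)
    c = len(centers)
    expo = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update.
        u = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # Point sits exactly on a centre: full membership there (one convention).
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** expo for j in range(c))
                       for i in range(c)]
            u.append(row)
        # Centre update: weighted means with weights u_ik^m.
        new = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(len(xs))]
            new.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, u
```

With xs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2] and initial centres [1.0, 9.0], the centres settle close to 0.1 and 10.1, and each point's memberships sum to 1.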

17.
The additive biclustering model for two-way two-mode object-by-variable data implies overlapping clusterings of both the objects and the variables, together with a weight for each bicluster (i.e., a pair of an object cluster and a variable cluster). In the data analysis, an additive biclustering model is fitted to given data by minimizing a least squares loss function. To this end, two alternating least squares (ALS) algorithms may be used: (1) PENCLUS and (2) Baier's ALS approach. However, both algorithms suffer from inherent limitations that may hamper their performance. As a way out, this paper presents a new ALS algorithm based on theoretical results regarding the optimal design of ALS algorithms. A simulation study shows that this algorithm outperforms the existing ALS approaches.
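One common way to write such a model, assumed here, is m_ij = sum_r a_ir b_jr w_r, where A (objects by R) and B (variables by R) are binary membership matrices that may contain several 1s per row (overlapping clusters) and w_r is the weight of bicluster r. The minimal sketch below only evaluates the model and its least-squares loss; the function names are hypothetical, and neither PENCLUS, Baier's approach, nor the paper's new ALS algorithm (which alternately update A, B and the weights) is reproduced.

```python
def reconstruct(A, B, w):
    """Model values m_ij = sum_r A[i][r] * B[j][r] * w[r].

    A: objects x R binary matrix (an object may belong to several biclusters)
    B: variables x R binary matrix
    w: weight of each of the R biclusters
    """
    R = len(w)
    return [[sum(a_row[r] * b_row[r] * w[r] for r in range(R)) for b_row in B]
            for a_row in A]


def ls_loss(X, A, B, w):
    """Least-squares loss: sum over all cells of (x_ij - m_ij)^2."""
    M = reconstruct(A, B, w)
    return sum((x - m) ** 2
               for x_row, m_row in zip(X, M) for x, m in zip(x_row, m_row))
```

For example, with A = [[1, 0], [1, 1], [0, 1]], B = [[1, 0], [0, 1]] and w = [2.0, 3.0], the middle object belongs to both biclusters and its modelled row is [2.0, 3.0].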

18.
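As a new type of membrane separation technology, nanofiltration membrane technology has been widely applied in water treatment and process separation, and has become a designated key technology for China's future science and technology development. Taking SCI papers and patents, two important forms of scientific and technological output, as its main objects, this study performs a bibliometric analysis of these documents to examine the development of nanofiltration membrane technology in basic research and technological innovation and, through international comparison, identifies the characteristics of China's development in this field. The results can provide policy support for related science and technology development planning.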

19.
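Extremophiles are a rich treasure trove of resources with great prospects for biotechnological development. This paper discusses the development and utilization of extremophile resources from two aspects: the discovery and characterization, structure and function, and potential applications of proteins with special functions; and technologies for the large-scale production of such proteins. It focuses on research on the molecular chaperone system of the hyperthermophilic archaeon Pyrococcus furiosus and, taking the extracellular α-amylase PFA of P. furiosus as an example, discusses technologies for the large-scale production of recombinant proteins in industrial biotechnology.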

20.
A validation study of a variable weighting algorithm for cluster analysis   Total citations: 1 (self-citations: 0; citations by others: 1)
De Soete (1986, 1988) proposed a variable weighting procedure for use when Euclidean distance is the dissimilarity measure in an ultrametric hierarchical clustering method. The algorithm produces weighted distances that approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for an application for which it was not originally intended, namely to determine whether the algorithm can be used to reduce the influence of variables that are irrelevant to the clustering present in the data. The simulation study examined the ability of the procedure to recover a variety of known underlying cluster structures. The results indicate that the algorithm is effective in identifying extraneous variables that do not contribute information about the true cluster structure; weights near 0.0 were typically assigned to such variables. Furthermore, the variable weighting procedure was not adversely affected by the presence of other forms of error in the data. In general, it is recommended that the variable weighting procedure be used in applied analyses when Euclidean distance is employed with ultrametric hierarchical clustering methods.
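The weighted distances in question have the form d_ij(w) = sqrt(sum_k w_k (x_ik - x_jk)^2), with a non-negative weight w_k per variable. The minimal sketch below evaluates that distance and a full distance matrix for given weights, showing how a weight of 0.0 removes a variable's influence. The function names are hypothetical, and the step that actually fits the weights so that the distances approximate an ultrametric in a least-squares sense is not reproduced.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum_k w_k * (x_k - y_k)^2); weights must be non-negative."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def distance_matrix(data, weights):
    """Symmetric matrix of weighted Euclidean distances between all rows of data."""
    n = len(data)
    return [[weighted_euclidean(data[i], data[j], weights) for j in range(n)]
            for i in range(n)]
```

For instance, weighted_euclidean([0, 0, 100], [3, 4, -50], [1, 1, 0]) ignores the large difference on the third variable and returns 5.0.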


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号