1.

A rough granular computing in discovery of process models from data and domain knowledge

NGUYEN Hung Son SKOWRON Andrzej 《重庆邮电大学学报(自然科学版)》2008,20(3):341-347

The rapid expansion of the Internet has resulted not only in an ever-growing amount of stored data, but also in the burgeoning complexity of the concepts and phenomena pertaining to those data. This issue has been vividly compared by the renowned statistician Prof. Friedman of Stanford University to the advances in human mobility from the period of walking to the era of jet travel. These essential changes in data have brought new challenges to the development of new data mining methods, especially since the treatment of these data increasingly involves complex processes that elude classic modeling paradigms. "Hot" datasets such as biomedical, financial, or net user behavior data are just a few examples. Mining such temporal or stream data is on the agenda of many research centers and companies worldwide. In the data mining community, there is a rapidly growing interest in developing methods for process mining, e.g., for discovery of structures of temporal processes from data. Works on process mining have recently been undertaken by many renowned centers worldwide. This research is also related to functional data analysis, cognitive networks, and dynamical system modeling, e.g., in biology. In the lecture, we outline an approach to discovery of processes from data and domain knowledge which is based on rough-granular computing.

2.

Integration of Sensor Network Synthesis and Steady-State Online Data Reconciliation

李博陈丙珍《清华大学学报》2002,7(1)

Introduction. We need to analyze, simulate, optimize, control, and upgrade chemical processes to increase profit. The basis of these efforts is the process data, and the validity of process data has a direct influence on the efficiency of these efforts. Generally, the validity of process data refers to: (1) the observability of variables, which means that a variable can be measured directly or be estimated through the process constraints [1]; (2) the reliability of estimating a variable, which is defined as…

3.

Incremental frequent tree-structured pattern mining from semi-structured data

Chen Enhong Lin Le Wu Gongqing 《高技术通讯(英文版)》2005,11(1):6-8

The paper studies the problem of incremental pattern mining from semi-structured data. When a new dataset is added to the original dataset, it is difficult for existing pattern mining algorithms to update the mined results incrementally. To solve the problem, an incremental pattern mining algorithm based on the rightmost expansion technique is proposed; it improves mining performance by reusing the original mining results and the information obtained in the previous mining process. To improve efficiency further, the algorithm adopts a pruning technique that uses the frequent pattern expansion forest obtained during mining. Comparative experiments with different volumes of initial datasets, different incremental datasets, and different minimum support thresholds demonstrate that the algorithm is considerably more efficient than non-incremental pattern mining algorithms.

4.

Advances in G-protein coupled receptor research and related bioinformatics study Total citations: 1, self-citations: 0, citations by others: 1

YIN Yanbin LUO Jingchu 《科学通报(英文版)》2003,48(6):511-516

G-protein coupled receptor (GPCR) is one of the most important protein families for drug targets. GPCR agonists and antagonists occupy approximately one third of the world small-molecule drug market. Much effort has been invested in GPCR study by both academic institutions and pharmaceutical industries. With seven transmembrane domains, GPCR plays significant roles in intercellular signal transduction and is involved in a variety of biological pathways. With the availability of sequence data of human and other mammalian genomes, as well as their expressed sequence tag (EST) data, bioinformatics and genomics approaches can be applied to identifying novel GPCRs in the post-genomic era. Deorphanizing GPCRs, or matching ligands with GPCRs, greatly facilitates the target validation process and automatically provides a possible compound screening assay. Similarly, bioinformatics data mining approaches could also be applied to the identification of GPCR peptide or protein ligands. Here we give a general review of recent advances in the study of GPCR structure and function, as well as GPCR and ligand identification, with emphasis on bioinformatics database mining of GPCRs and their peptide or protein ligands.

5.

Mining Rules from Electrical Load Time Series Data Set

郑斌祥 Xi Yugen Du Xiuhua Li Shaoyuan 《高技术通讯(英文版)》2002,8(1):41-45

The mining of rules from electrical load time series data collected from the EMS (Energy Management System) is discussed. The data from the EMS are too large and complex to be understood and used by the power system engineer, yet useful information is hidden in the electrical load data. The authors discuss the use of fuzzy linguistic summaries as a data mining method to induce rules from the electrical load time series. Data preprocessing techniques are also discussed in the paper.

6.

Reliability Allocation of Large Mining Excavator Electrical System Based on the Entropy Method with Failure and Maintenance Data

胡天松黄洪钟王晓明李光高素荷《东华大学学报(英文版)》2014,(6):779-781

In the design of a large mining excavator electrical system, a practical reliability allocation method was introduced to allocate system-level reliability requirements to the subsystem and component levels. During the reliability allocation process, only factors derived from the failure and maintenance data were considered in the allocation scheme, which avoids the disturbance introduced by expert experience. The entropy method was used to obtain the weights of the reliability allocation indexes of large mining excavators under the different factors. The failure rate allocation of subsystems and components could then be completed.
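The entropy-based weighting mentioned in this abstract can be illustrated with the generic entropy weight method. The abstract does not give the authors' formulas, so the sketch below is only the textbook version, and the index values in `example` are hypothetical placeholders rather than data from the paper.

```python
import math


def entropy_weights(matrix):
    """Generic entropy weight method (not the paper's exact procedure).

    matrix: list of rows (one per subsystem), each a list of non-negative
    index values (one per allocation index). Needs at least two rows.
    Returns one weight per index; the weights sum to 1.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        # Shannon entropy normalised to [0, 1]; 0 * log(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        # An index whose values barely differ carries little information.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread == 0:
        # Every index is uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / spread for d in divergences]


# Hypothetical data: 3 subsystems, 2 indexes (e.g. failure count, repair hours).
example = [[1.0, 1.0], [1.0, 3.0], [1.0, 5.0]]
weights = entropy_weights(example)
```

In this toy input the first index is identical for every subsystem, so it receives essentially zero weight and the second index receives essentially all of it.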

7.

Data Mining for Quality Prediction in Textile Engineering

杨建国李蓓智赵亚梅《东华大学学报(英文版)》2006,23(2):88-91

A data mining method for quality prediction using association rules (DMAR) is presented in this paper. Association rules are used to mine valuable relations among items in large amounts of textile process data for an ANN prediction model. DMAR consists of three main steps: setting up the knowledge data set; cleaning and converting the data; and finding the item sets with large support and generating the expected rules. DMAR effectively improves the precision of yarn-breakage prediction, and it rapidly removes the negative influence of training parameters on the prediction model, so a more satisfactory quality prediction result can be reached.
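The third step described above (finding item sets with large support and generating rules) rests on the standard support and confidence measures. The following is a minimal sketch of those two measures only, not the authors' DMAR implementation; the process attributes in `records` are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical textile process records, one set of observed items per record.
records = [
    {"high_speed", "low_humidity", "yarn_break"},
    {"high_speed", "low_humidity", "yarn_break"},
    {"high_speed", "normal_humidity"},
    {"low_speed", "low_humidity"},
]

s = support(records, {"high_speed", "low_humidity"})
c = confidence(records, {"high_speed", "low_humidity"}, {"yarn_break"})
```

Rules whose support and confidence exceed chosen thresholds would be the candidates passed on to a prediction model.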

8.

Image Post-Processing Method for Visual Data Mining

REN Yong-gong YU Ge 《武汉大学学报:自然科学英文版》2006,11(1):15-20

Visual data mining is one of the important approaches among data mining techniques. Most visual data mining methods are based on computer graphics techniques, but few of them exploit image-processing techniques. This paper proposes an image-processing method, named RNAM (resemble neighborhood averaging method), to facilitate visual data mining. It is used to post-process the result image of data mining and helps users discover significant features and useful patterns effectively. The experiments show that the method is intuitive, easy to understand, and effective. It provides a new approach for visual data mining.

9.

Parallel Frequent Pattern Discovery: Challenges and Methodology

张宇宙王建勇周立柱《清华大学学报》2007,12(6):719-728

Parallel frequent pattern discovery algorithms exploit parallel and distributed computing resources to relieve the sequential bottlenecks of current frequent pattern mining (FPM) algorithms. Thus, parallel FPM algorithms achieve better scalability and performance, so they are attracting much attention in the data mining research community. This paper presents a comprehensive survey of the state-of-the-art parallel and distributed frequent pattern mining algorithms, with more emphasis on pattern discovery from complex data (e.g., sequences and graphs) on various platforms. A review of typical parallel FPM algorithms uncovers the major challenges, methodologies, and research problems in the field of parallel frequent pattern discovery, such as workload balancing, finding good data layouts, and data decomposition. This survey also indicates a dramatic shift of research interest in the field from simple parallel frequent itemset mining on traditional parallel and distributed platforms to parallel pattern mining of more complex data on emerging architectures, such as multi-core systems and the increasingly mature grid infrastructure.
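Data decomposition, one of the issues named in this survey, can be illustrated with a count-distribution style sketch: each partition of the transactions is counted independently and the local counts are then merged. This is a generic illustration, not an algorithm taken from the survey; the partitions are processed sequentially here for simplicity, and the function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-item combination inside one data partition."""
    counts = Counter()
    for transaction in partition:
        for items in combinations(sorted(set(transaction)), k):
            counts[items] += 1
    return counts


def frequent_itemsets(partitions, k, min_count):
    """Merge per-partition counts and keep itemsets seen at least min_count times."""
    total = Counter()
    for partition in partitions:
        # Each call is independent, so it could run on a separate worker.
        total.update(local_counts(partition, k))
    return {items: n for items, n in total.items() if n >= min_count}


# Hypothetical transactions split into two partitions.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b"]],
]
pairs = frequent_itemsets(partitions, k=2, min_count=2)
```

Only the merged counts cross partition boundaries, which is why this style of decomposition keeps communication small but requires every worker to count the same candidate set.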

10.

MICkNN: Multi-Instance Covering kNN Algorithm Total citations: 1, self-citations: 0, citations by others: 1

Shu Zhao Chen Rui Yanping Zhang 《清华大学学报》2013,18(4):360-368

Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks of mining from ambiguous data, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all instances in it are negative. A bag is positive if it has at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The aim of mining is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and obtain the labels of test bags by predicting the labels of unknown instances. In this paper, we mine multi-instance data from another point of view, i.e., by excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, a constructive covering algorithm is first used to restructure the original multi-instance data. Then the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and the sphere neighbors obtained from the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
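The bag-labeling rule stated in this abstract can be written down directly. The sketch below shows only that standard multi-instance assumption, not the MICkNN algorithm itself, and it assumes the instance labels are known, which in practice they are not for positive bags.

```python
def bag_label(instance_labels):
    """Standard multi-instance rule: a bag is positive (1) if at least one
    instance is positive, and negative (0) if all instances are negative."""
    return 1 if any(label == 1 for label in instance_labels) else 0


# Hypothetical bags; a learner sees only the bag label, not these instance labels.
negative_bag = [0, 0, 0]
positive_bag = [0, 1, 0]  # two of its three instances are "false positives"

labels = [bag_label(negative_bag), bag_label(positive_bag)]
```

The second bag shows where the ambiguity comes from: the bag is positive, yet most of its instances are negative, and the learner is not told which ones.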

11.

A Survey of Distributed Data Mining

刘滨《河北科技大学学报》2014,35(1):80-90

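With continual breakthroughs in networking and communication technologies, many kinds of modern networks (the Internet, mobile networks, broadcast and television networks) and the services derived from them have expanded rapidly, forming a distributed computing environment that is ubiquitous in cyberspace. To maximize the value of the data in this environment, data mining techniques are needed to discover hidden patterns or rules, which guide and support management decisions in production or operations and thereby improve the quality and payoff of those decisions. However, because of pervasive heterogeneity, data privacy, and platform compatibility constraints, together with factors such as industry competition and legal restrictions (for example, protection of personal or corporate data privacy), networked data sources are difficult to mine centrally, and distributed data mining (DDM) has emerged in response. This paper introduces the definition and framework of DDM, its applicable scenarios, and its research challenges. According to the high-level DDM architecture given in the paper, the quality of the final result depends closely on the type and availability of the local data sources, the quality of the local results, and the integration method. DDM need not always be carried out as purely independent mining at each site; moreover, DDM can also be used when the data are centralized but the system has many distributed sites. The main current challenges in DDM research are heterogeneous versus homogeneous mining, data variability in dynamic environments, communication overhead, knowledge integration, and semantic heterogeneity. Current DDM systems are divided into four categories: 1) multi-agent systems, which use agent autonomy to mine locally and so protect data privacy, agent proactiveness to reduce user involvement and raise the level of automation, and agent cooperation to coordinate mining across multiple algorithms; 2) grid-based systems, which exploit the grid's strengths in resource sharing, open services, and collaborative work to improve the reliability and coordination of mining; 3) meta-learning systems, which use meta-learning to optimize the selection and combination of mining algorithms and learn repeatedly from acquired knowledge to improve result quality; 4) systems based on the CDM (collective data mining) framework, which represent the function to be learned in distributed form over a set of basis functions, allow each data source to choose a different learning algorithm, and reduce network traffic on the premise that the global result remains correct. The paper then summarizes problems common to current DDM research: 1) result quality: the intrinsic semantic relations among the sites' data sources are ignored, and each site mines its local data independently with no semantic-level data exchange or fusion with other sites, producing purely "partitioned" mining that ultimately degrades the quality of the global result; 2) mining efficiency: how to schedule resources so as to balance the mining load and reduce communication overhead in collaborative mining. To address result quality, the paper explores combining ontologies with data mining. As the foundation of the Semantic Web, ontologies can effectively support measuring the semantic distance between objects. Researchers have already done exploratory work both on using ontologies to describe the domain background of a mining task and on using ontologies to describe the data mining process itself: for the problem of selecting useful rules from a massive rule set in association rule mining, an interactive post-processing method for pruning redundant rules has been proposed; and for the problem of automatically constructing a knowledge discovery workflow given the input and output types of the knowledge discovery process, a solution has been proposed. The discussion shows that one strategy for improving the quality of local and final results in distributed mining is to merge DDM theory with ontology theory: taking the measurement of semantic distance between data sources as the entry point, establish a composite quantification scheme for semantic distance, and achieve the goal by building and solving a new type of DDM model.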

12.

Privacy-Preserving Semantic Data Integration

李玉华卢正鼎孙小林李瑞轩《华中科技大学学报(自然科学版)》2005,33(Z1):128-130

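A privacy-protection policy ontology is designed, and an agent- and ontology-based data integration architecture is proposed. The architecture comprises a knowledge browser, a global ontology, local ontologies, mappings and contexts, a privacy-protection knowledge base, a privacy policy ontology, a data mining ontology, data mining agents, and integration agents, among other components. It achieves semantic data integration in distributed heterogeneous environments effectively while protecting user privacy. An example of privacy-preserving data integration in the anti-money-laundering domain is also presented.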

13.

Ontology-Based Relational Database Integration and Its Application

梁晔鲍泓刘宏哲《北京联合大学学报(自然科学版)》2008,22(2):19-24

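Ontologies are increasingly used to integrate heterogeneous information. Ontology integration broadly follows three directions: the single-ontology approach, the multiple-ontology approach, and the hybrid-ontology approach. This paper proposes an information integration framework based on hybrid ontologies. The framework supports two kinds of query, queries against the global ontology and queries against local data sources, and the implementation principles and procedures of both are discussed in detail. The framework is also applied to a distributed digital museum, which demonstrates the effectiveness of the approach.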

14.

Data Mining Ontology Development for High User Usability Total citations: 1, self-citations: 0, citations by others: 1

LI Yu-hua LU Zheng-ding SUN Xiao-lin WEN Kun-mei LI Rui-xuan 《武汉大学学报:自然科学英文版》2006,11(1):51-56

This paper mainly introduces the development and implementation of the user-centered data mining service ontology on the Universal Knowledge Grid (UKG). UKG is an ontology-based grid architecture model for building large-scale distributed knowledge discovery systems on the grid. The data mining ontology services are the main services offered by UKG. They can meet user requirements for knowledge discovery in different domains and at different hierarchies, and they make the system open, extensible, and highly usable. A data mining solution for money laundering is introduced.

15.

Ontology-Supported Construction of Enterprise Data Models Total citations: 1, self-citations: 0, citations by others: 1

苗虹葛世伦《清华大学学报(自然科学版)》2006,46(Z1):1131-1137

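To address the lack of a standard, stable data environment for enterprise-level modeling, this paper proposes an ontology-supported method for constructing enterprise data models, based on the consistency among ontological thinking, the Unified Modeling Language (UML), and the relational data model. Three classes of ontology are distinguished, forming a new framework for requirements analysis. The six-tuple definition of an ontology and the situation calculus are first used to build static and dynamic conceptual models of the enterprise domain at the conceptual-semantic level, laying the logical foundation. UML and its modeling tools then give an intuitive graphical representation that yields a business logic model, which is easily converted into a data model for relational databases. The paper also attempts to use the ontology's solving process to evaluate and maintain the model.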

16.

An Ontology Application Framework Based on Data Mining

陈锋郭禾代莉王宇新杨宏戟《大连理工大学学报》2003,43(Z1):142-145

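A general data mining system framework (GDMF) model is proposed. Its purpose is to extract the core functions from data mining applications and apply them in a reusable, extensible prototype system so that data mining application systems can be built quickly. In GDMF, an ontology serves as the semantic data model. With an ontology-driven data mining query language, users can express complex queries easily. Finally, a method is given for using GDMF as a modeling tool to design data mining systems.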

17.

Distributed Data Mining in a Grid Computing Environment Total citations: 4, self-citations: 0, citations by others: 4

江舞山俞集辉《重庆大学学报(自然科学版)》2006,29(11):49-52

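To improve the performance of distributed mining systems, the shortcomings of existing distributed data mining systems are analyzed, an architecture for distributed data mining in a grid computing environment is proposed, and how to carry out data mining under this architecture is discussed. The architecture is service-oriented and cross-platform. In it, mining algorithms and target data sources are defined as web service resources; when data mining is needed, these web service resources are connected dynamically and in a loosely coupled manner to complete a data mining task together. Finally, a local-area-network grid computing environment is built with the grid tool Globus Toolkit 4.0, and an association rule mining example is used to illustrate the mining process under the architecture in detail.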

18.

A Framework of Semantic Information Representation in Distributed Environments

ZHANG Lin CHEN He-ping 《武汉大学学报:自然科学英文版》2006,11(1):57-62

0 Introduction. On the global information infrastructure, the great expansion of the communication networks has made available to users a large number of autonomous data repositories; however, these repositories present different structures and semantics, because distinct data sources may use different modeling methods, which makes it very difficult to share and exchange information. Recently there have been some works, such as OntoBroker in Ref. [1] and the system in Ref. [2], that focus on the uniform representation …

19.

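Design and Application of an Ontology-Based Medical Information Integration Framework Total citations: 1, self-citations: 0, citations by others: 1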

白杰英李万龙郑山红《科技信息》2009,(29):I0085-I0086

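As ontology technology has developed, ontologies have been applied more and more to the integration of heterogeneous information. This paper proposes an ontology-based framework for medical information integration, presents the structure of the framework and the ontology integration process, and verifies the effectiveness of ontology integration with an example.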

20.

A Hybrid Similarity Measure for Complex Multi-Source Coal Mine Data

周勇夏士雄李文超张磊牛强《江南大学学报(自然科学版)》2007,6(6):665-668

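As the coal mining industry becomes more automated, electronic, and information-based, existing data description methods cannot effectively characterize the multi-source, multi-dimensional, dynamic, massive data found in coal mines. To address this problem, ontology, a notion originating in philosophy, is first introduced into the coal mine production domain: an ontology of mine-area monitoring locations serves as the semantic model and is combined with numerical descriptions, forming a tree-like hierarchy that describes complex multi-source coal mine data. The similarity of coal mine data is then computed with a hybrid similarity measure based on both semantics and numerical values. Finally, experiments verify the feasibility and effectiveness of the method.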