Similar articles
 Found 20 similar articles; search took 93 ms
1.
In this paper, we study the skyline group problem over a data stream. An object dominates another object if it is no worse than the other object on all attributes and better on at least one attribute. An object that is not dominated by any other object is a skyline object. The skyline group problem involves finding k-item groups that are not dominated by any other k-item group. Existing algorithms for finding skyline groups can only process static data. In many applications, however, data arrives as a stream that changes over time, and algorithms should be designed to support skyline group queries on dynamic data. We propose new algorithms to find skyline groups over a data stream. We use three data structures, namely a hash table, a dominance graph, and a matrix, to store dominance information and update results incrementally. We conduct experiments on synthetic datasets to evaluate the performance of the proposed algorithms. The experimental results show that our algorithms can efficiently find skyline groups over a data stream.
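
The dominance definition in this abstract can be illustrated with a small sketch. It covers only the static, single-object skyline, not the streaming k-item group algorithms the paper proposes, and it assumes that smaller attribute values are better; the names `dominates` and `skyline` are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every attribute and
    strictly better on at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, `skyline([(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)])` keeps `(1, 4)`, `(2, 2)`, and `(4, 1)`, because `(2, 2)` dominates both `(3, 3)` and `(4, 4)`.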

2.
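Most existing entity alignment methods rely on ontology schema matching, which limits their ability to handle alignment between heterogeneous linked datasets and leaves many entity links missing. Based on an analysis of linked data semantics, this paper proposes a schema-independent entity alignment method based on the semantic features of attributes. Entity attributes in linked datasets are modeled by their semantic label features and statistical features, and the classifier is optimized with VS-Adaboost, a supervised algorithm with a variable sample set. Experimental results show that the method achieves high time efficiency, precision, and recall, and a good F-measure.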

3.
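Entity relation extraction is one of the key steps in knowledge graph technology. Research on English entity relation extraction is relatively mature, whereas progress on Chinese entity relation extraction has been less satisfactory, limited in part by the scarcity of relevant corpora. To address this, COAE2016 introduced a Chinese entity relation extraction task as its Task 3. This paper tackles the task with template-based, SVM-based, and CNN-based entity relation extraction algorithms, and compares the strengths and weaknesses of the three according to their results on the COAE2016 Task 3 evaluation dataset. Experiments show that both the SVM-based and the CNN-based algorithms perform well on the evaluation dataset.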

4.
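Unlike existing entity alignment methods that match similarity using the structural information of knowledge graphs or the attribute features of entities, this paper proposes a knowledge graph entity alignment method based on representation learning. First, semantic representations of entities and relations are learned in a low-dimensional vector space by machine learning; these representations capture the intrinsic structural information of the knowledge graph and the attribute features of entities. Second, manually labeled entity pairs are used as prior knowledge to learn the mapping between entity pairs across knowledge graphs. Experiments show that, compared with the feature-matching method SiGMa, the proposed method effectively improves the precision of knowledge graph entity alignment while maintaining a high F1 score.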

5.
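Representation learning methods for knowledge graphs embed entities and relations into a low-dimensional continuous space in order to uncover implicit connections between entities. Traditional representation learning methods rely mostly on the structured information of the knowledge graph and make little use of entity description text. Current text-based methods mostly vectorize the text and ignore the semantic associations between entities in it. To address these shortcomings, this paper proposes a method that uses entity description text to enhance learning: associated entities are mined from the text, the associations are graded, and the associations are fused into knowledge graph representation learning as an auxiliary constraint. Experimental results show that the auxiliary constraint clearly improves inference, outperforming traditional structure-based learning models as well as deep-learning-based joint text-and-structure representation models.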

6.
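The goal of entity linking is to link entity mentions in text to their corresponding unambiguous entities in a knowledge base. For this task, an entity linking method based on topic-sensitive random walk with restart is proposed. The method first uses the context of an entity mention to expand the mention into its full name and searches the Wikipedia knowledge base for candidate entities, yielding a candidate set. A graph is then built from these intermediate results, the candidates are ranked by the stationary distribution of a topic-sensitive random walk with restart on the graph, and the top-1 candidate is selected as the target entity. Experimental results show that the method achieves an F-score of 0.623 on the KBP2014 entity linking dataset, higher than that of the other systems, and effectively improves the overall performance of an entity linking system.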
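
The ranking step can be pictured with a generic random walk with restart computed by power iteration. This is only a sketch under assumptions: the abstract does not say how the graph, the edge weights, or the topic-sensitive restart vector are built, so `graph`, `restart`, and `alpha` below are hypothetical inputs, and every neighbor is assumed to appear as a key of `graph`.

```python
from typing import Dict


def random_walk_with_restart(
    graph: Dict[str, Dict[str, float]],
    restart: Dict[str, float],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Approximate the stationary distribution of a walk that, at each step,
    jumps back to the restart distribution with probability alpha."""
    nodes = list(graph)
    total = sum(restart.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("restart distribution must have positive mass")
    r = {n: restart.get(n, 0.0) / total for n in nodes}  # normalized restart vector
    p = dict(r)  # start the walk from the restart distribution
    for _ in range(max_iter):
        nxt = {n: alpha * r[n] for n in nodes}
        for u in nodes:
            out_weight = sum(graph[u].values())
            if out_weight == 0:
                # Dangling node: return its mass to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - alpha) * p[u] * r[n]
            else:
                for v, w in graph[u].items():
                    nxt[v] += (1 - alpha) * p[u] * w / out_weight
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p
```

With a toy graph in which a mention node `m` is connected to candidate `c1` by weight 3 and to candidate `c2` by weight 1, and the restart mass placed on `m`, `c1` receives the higher score and would be chosen as the top-1 candidate.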

7.
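To help learners find suitable personalized learning resources, and an order in which to study them, among the large volume of online learning resources, a label propagation algorithm based on directed edge weights (LPADEW) is proposed for discovering clusters of micro-learning unit sequences that suit a particular learner and belong to the same learning cycle. The algorithm improves label propagation in two ways. First, the label update order is determined by the utilization degree of each unit node, which reduces randomness in the node update order. Second, labels are updated using the accumulated directed edge weights of the current node's predecessor and successor neighbors, and label weights are introduced into the update strategy, which both reduces randomness in label updates and avoids the formation of giant clusters. Experimental results show that LPADEW achieves good results on both real micro-learning datasets and synthetic datasets.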

8.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the Needleman-Wunsch global sequence alignment algorithm using the quality information of each DNA fragment. However, errors may arise when calculating alignment scores if the quality of the DNA fragment tips is low, because only overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved despite low-quality DNA fragment tips by improving the conventional quality-based algorithms. The mapping score parameters used to calculate DNA sequence alignment scores are adjusted dynamically by the fuzzy logic system, using the lengths of DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.
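
For orientation, the baseline that this abstract builds on can be sketched as plain Needleman-Wunsch global alignment scoring. The sketch uses fixed `match`, `mismatch`, and `gap` scores chosen for illustration; it does not use base quality information or the fuzzy adjustment of score parameters described in the abstract.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b
    under a linear gap penalty (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]
```

In the quality-aware setting the abstract describes, the fixed scores would presumably be replaced by values that depend on fragment length and base quality, but how that is done is not specified here.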

9.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the Needleman-Wunsch global sequence alignment algorithm using the quality information of each DNA fragment. However, errors may arise when calculating alignment scores if the quality of the DNA fragment tips is low, because only the overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved despite the low quality of DNA fragment tips by improving the conventional quality-based algorithms. The mapping score parameters used to calculate DNA sequence alignment scores are adjusted dynamically by the fuzzy logic system, using the lengths of DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.

10.
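Overlapping community detection is one of the important goals of complex network analysis. Traditional multi-label propagation algorithms produce random and unstable community detection results and ignore the effect of node influence on label propagation. To address these problems, an overlapping community detection algorithm based on node influence and multi-label propagation is proposed that generates stable communities. Building on the computation and ranking of node influence and the identification of core nodes, the algorithm reprocesses the initial labels of neighbor nodes and applies an asynchronous node label update strategy based on a balance coefficient to identify overlapping communities in complex networks effectively. Experiments on real and synthetic datasets show that the algorithm outperforms the compared algorithms and is suitable for large-scale complex networks.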

11.
Instance-specific algorithm selection technologies have been successfully used in many research fields, such as constraint satisfaction and planning. Researchers have increasingly tried to model the potential relations between different candidate algorithms for algorithm selection. In this study, we propose an instance-specific algorithm selection method based on multi-output learning, which can manage these relations more directly. Three kinds of multi-output learning methods are used to predict the performances of the candidate algorithms: (1) multi-output regressor stacking; (2) multi-output extremely randomized trees; and (3) hybrid single-output and multi-output trees. The experimental results obtained using 11 SAT datasets and 5 MaxSAT datasets indicate that our proposed methods perform better than state-of-the-art algorithm selection methods.

12.
Much data, such as geometric image data and drawings, has a graph structure. Such data is called graph structured data. To manage graph structured data efficiently, we need to analyze and abstract its graph structures. The purpose of this paper is to find knowledge representations that indicate plural abstractions of graph structured data. Firstly, we introduce a term graph as a graph pattern having structural variables, and a substitution over term graphs. We also define a multiple layer for S as a pair (D, O) of a set D of term graphs and a list O of substitutions. Secondly, for a graph G and a set S of graphs, we present effective algorithms for extracting minimal multiple layers of G and S, which give us stratifying abstractions of G and S, respectively. Finally, we report experimental results obtained by applying our algorithms to both artificial data and drawings of power plants, which are real-world data.

13.
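Entity resolution is the process of identifying different descriptions of the same entity. It aims to safeguard data quality and is a key technique in data cleaning, data integration, and data mining. As e-commerce develops and matures, the diversity of products and the flexible ways in which consumers buy them make accurate identification and matching of online products a pressing problem in the big data era. Whereas traditional entity resolution mainly targets structured data, web data is unstructured, heterogeneous, and massive. A synthesized similarity method (SSM) is therefore designed to compute the similarity between online product records, and an agglomerative hierarchical clustering framework is introduced to match heterogeneous products from different data sources. In addition, to meet the efficiency requirements of big data environments, SSM is optimized in three respects: string similarity caching, a constraint knowledge base, and a blocking strategy. Experimental results on real datasets verify the efficiency and effectiveness of SSM.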

14.
In this paper we propose four-dimensional (4D) operators, which we call 4D development operators, for dealing with sequential changes of topological relationships between 4D moving objects. In contrast to existing operators, they can be applied to real applications on 4D moving objects. We also propose a new approach to defining them. The approach is based on a dimension-separated method, which considers x-y coordinates and z coordinates separately. To show the applicability of our operators, we present the algorithms for the proposed operators and the development graph between 4D moving objects.

15.
In this paper we propose four-dimensional (4D) operators, which we call 4D development operators, for dealing with sequential changes of topological relationships between 4D moving objects. In contrast to existing operators, they can be applied to real applications on 4D moving objects. We also propose a new approach to defining them. The approach is based on a dimension-separated method, which considers x-y coordinates and z coordinates separately. To show the applicability of our operators, we present the algorithms for the proposed operators and the development graph between 4D moving objects.

16.
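The quality of a dataset greatly affects the accuracy of classification algorithms. A consistency-based classification method is proposed for a class of implicitly mutually exclusive numerical data. Drawing on the idea of continuous functions, a definition of classification consistency for continuous numerical data is given, and the computation of the SOM algorithm is modified so that it satisfies the optimality condition for classification consistency proposed in the paper. The modified SOM method produces a new clustered dataset that reduces the implicit classification inconsistencies common in the original dataset, thereby improving the efficiency and accuracy of classification. A comparison on a real dataset shows that the prediction accuracy of the proposed algorithm is clearly better than that of other algorithms. The advantages of the proposed algorithm are also analyzed from the perspective of the VC dimension.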

17.
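A differential co-expression framework and a differential co-expression scoring function are proposed to quantify the quality of gene biclusters objectively, based on the observation that genes in a bicluster are co-expressed under the conditions of that bicluster and not co-expressed under other conditions. A further scoring function is proposed that stratifies biclusters into three types of co-expression. To produce a unified ranking of bicluster outputs, the proposed scoring functions are used to test the performance and behavior of four well-established biclustering algorithms on six real datasets from different domains. Experimental results show that, in identifying co-expressed biclusters, the differential co-expression framework effectively improves the quality of co-expressed gene biclusters and the performance of biclustering algorithms.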

18.
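Estimating the importance of the elements of a given data collection is an important application in data mining. Existing techniques find important elements by either ranking or selection. Their main drawback is that they do not account for the fact that highly ranked objects may be very similar or even identical, and so ignore redundancy among highly ranked objects. Their performance is therefore limited where diversity matters. By combining ranking and selection, this paper proposes a set-cover-based algorithm for estimating element importance. In addition to examining the solution of a single set cover, the algorithm counts the number of high-quality set covers in which an element participates and assigns each element an importance score accordingly. Experiments on real data and the results of a user study show that the algorithm is efficient and that its importance estimates are highly useful and consistent with human perception.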

19.
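Discussion on data updating in urban geographic information systems   Total citations: 4, self-citations: 0, citations by others: 4
To address problems in data updating for urban geographic information systems, such as reliance on a single, outdated technique and the lack of an effective update mechanism, this paper starts from the needs of urban geographic information updating and uses comparative analysis and system design to discuss the data sources, update modes, and data security of urban GIS data updating, and gives update strategies for different situations. The work offers a reference for the data production and management of relevant organizations and supports keeping information current and sustainably usable.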

20.
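Named entity recognition (NER), a natural language processing (NLP) task, suffers from imbalanced samples across entity classes. To address this, an entity class balancing optimization algorithm based on an improved loss function is proposed. The new algorithm optimizes the loss function of the neural network model: by analyzing the characteristics of NER data, it introduces a smoothing coefficient and a weight coefficient on top of balancing positive and negative samples, so that during gradient propagation the model pays more attention to hard-to-recognize samples from rare entity classes and with nesting, and less attention to easy samples from well-represented classes. Comparative experiments on the public datasets ACE05 and MSRA show that the improved loss function raises the F1 score by 1.53% and 0.91%, respectively. These results indicate that the improved loss function mitigates the imbalance between positive and negative samples and between hard and easy samples.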
