
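Application of Markov Logic Networks in De-duplication
Cite this article: ZHANG Yu-fang, HUANG Tao, AI Dong-mei, XIONG Zhong-yang, TANG Rong-jun. Application of Markov Logic Networks in De-duplication[J]. Journal of Chongqing University (Natural Science Edition), 2010, 33(8): 36-41.
Authors: ZHANG Yu-fang  HUANG Tao  AI Dong-mei  XIONG Zhong-yang  TANG Rong-jun
Affiliations: College of Computer Science, Chongqing University, Chongqing 400044; Network Center, Chongqing University, Chongqing 400044
Funding: Natural Science Foundation of Chongqing; China Postdoctoral Science Foundation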
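Abstract: Most current de-duplication methods target a specific domain and address only one aspect of the problem in isolation. To overcome these limitations, a statistical relational learning method based on Markov logic networks is proposed. The method computes a probability distribution over possible worlds to support inference, which allows the de-duplication problem to be formalized. A discriminative training algorithm is used for learning and the MC-SAT algorithm for inference. The paper details how a small number of predicate formulas can describe the essential features of different aspects of de-duplication, and how the aspects expressed in Markov logic can be combined to form various models. Experimental results show that the Markov logic network approach not only subsumes the classical Fellegi-Sunter model but also outperforms traditional methods based on clustering algorithms and on similarity computation, thereby providing an effective route for applying Markov logic networks to practical problems.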

Keywords: de-duplication  Markov logic networks  Markov networks  statistical relational learning  machine learning
Received: 1/2/2010

Markov Logic Networks with its application in De-duplication
ZHANG Yu-fang, HUANG Tao, AI Dong-mei, XIONG Zhong-yang and TANG Rong-jun. Markov Logic Networks with its application in De-duplication[J]. Journal of Chongqing University (Natural Science Edition), 2010, 33(8): 36-41.
Authors: ZHANG Yu-fang  HUANG Tao  AI Dong-mei  XIONG Zhong-yang and TANG Rong-jun
Institution: College of Computer Science, Chongqing University, Chongqing 400044, P.R. China; Center of Information and Network, Chongqing University, Chongqing 400044, P.R. China
Abstract: To overcome the limitation that traditional de-duplication methods are mostly designed for a specific field and address only one aspect of the problem, a scheme based on Markov Logic Networks (MLNs), a new Statistical Relational Learning (SRL) model, is proposed. Because MLNs compute a probability distribution over worlds to support inference, the de-duplication problem can be formalized. A discriminative learning algorithm is adopted for learning the MLN weights, and the MC-SAT algorithm is adopted for inference. The paper shows how to capture the essential features of different aspects of de-duplication with a small number of predicate rules, and how to combine these rules into a variety of models. The experimental results show that the MLN-based method not only covers the original Fellegi-Sunter model but also achieves better results than traditional de-duplication methods based on clustering algorithms and similarity measures. This suggests that Markov Logic Networks can play an important part in practical applications.
Keywords: de-duplication  Markov logic networks  Markov networks  statistical relational learning  machine learning
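
Illustrative note: the abstract's idea of "computing the probability distribution of worlds" can be sketched on a toy example. In a Markov logic network, a world's probability is proportional to the exponential of the summed weights of the ground formulas it satisfies. The sketch below is not the paper's model: the predicate names (`SimilarName`, `SameRecord`), the two formulas, and their weights are hypothetical, and it uses exact enumeration over a single record pair rather than the MC-SAT sampler named in the abstract.

```python
import math
from itertools import product

# Hypothetical weighted formulas for one record pair (weights are illustrative only).
FORMULAS = [
    # SimilarName(a, b) => SameRecord(a, b)
    (1.5, lambda w: (not w["SimilarName"]) or w["SameRecord"]),
    # Prior that most pairs are not duplicates: !SameRecord(a, b)
    (0.5, lambda w: not w["SameRecord"]),
]


def score(world):
    """Unnormalized log-probability: sum of weights of the satisfied formulas."""
    return sum(weight for weight, formula in FORMULAS if formula(world))


def world_distribution(atoms):
    """Exact distribution over all truth assignments to the given ground atoms."""
    worlds = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    z = sum(math.exp(score(w)) for w in worlds)  # partition function
    return [(w, math.exp(score(w)) / z) for w in worlds]


def prob_same_given(similar):
    """P(SameRecord = True | SimilarName = similar), by exact enumeration."""
    num = math.exp(score({"SimilarName": similar, "SameRecord": True}))
    den = num + math.exp(score({"SimilarName": similar, "SameRecord": False}))
    return num / den


if __name__ == "__main__":
    for world, p in world_distribution(["SimilarName", "SameRecord"]):
        print(world, round(p, 4))
    print("P(same | similar)     =", round(prob_same_given(True), 4))
    print("P(same | not similar) =", round(prob_same_given(False), 4))
```

With these assumed weights, observing a similar name raises the match probability above one half, while a dissimilar name leaves it below one half. Exact enumeration is only feasible for tiny domains like this one, which is why real systems rely on approximate inference.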
This article is indexed in CNKI, Wanfang Data, and other databases.