
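Personalized Heterogeneous Data Publishing Based on Clustering Analysis
Cite this article:Nie Jing, Chang Tao, Liu Wei, Lv Xiaohong, Wang Chen, Yang Zhifang. Personalized Heterogeneous Data Publishing Based on Clustering Analysis[J]. Science Technology and Engineering, 2021, 21(14): 5813-5821.
Authors:Nie Jing  Chang Tao  Liu Wei  Lv Xiaohong  Wang Chen  Yang Zhifang
Affiliations:State Grid Chongqing Electric Power Company, Chongqing 400014; State Key Laboratory of Transmission and Distribution Equipment and System Safety and New Technology (Chongqing University), Chongqing 400014
Funding:National Natural Science Foundation of China, Young Scientists Fund (51807014)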
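Abstract:Because the publishing of heterogeneous data lacks flexibility and practicality, a personalized heterogeneous data publishing method based on clustering analysis is proposed. First, the various attributes of the data are considered together, and the cluster structure of the data is encoded with cluster labels. In addition, the original data is updated iteratively in a way that always preserves its cluster structure, and noise is then added to the original data to satisfy ε-differential privacy. Subject to the differential privacy principle, an uncertainty algorithm is proposed that handles relational data and set-valued data simultaneously, so that different types of data are anonymized in a similar way. Experiments verify that the method effectively improves the generalization ability of heterogeneous data publishing, as well as its security and practicality.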

Keywords:privacy; heterogeneous data; personalization; generalization
Received:2020/7/29
Revised:2021/6/28

Personalized Heterogeneous Data Publishing Based on Clustering Analysis
Nie Jing,Chang Tao,Liu Wei,Lv Xiaohong,Wang Chen,Yang Zhifang.Personalized Heterogeneous Data Publishing Based on Clustering Analysis[J].Science Technology and Engineering,2021,21(14):5813-5821.
Authors:Nie Jing  Chang Tao  Liu Wei  Lv Xiaohong  Wang Chen  Yang Zhifang
Institution:State Grid Chongqing Electric Power Company; State Key Laboratory of Transmission and Distribution Equipment and System Safety and New Technology, Chongqing University
Abstract:To address privacy protection in heterogeneous data publishing while keeping the published data useful for data mining, a differential privacy publishing scheme for heterogeneous data based on clustering analysis is proposed. To compensate for the lack of correct guidance once private information has been processed, the original data is grouped into clusters, and the cluster structure of the data is encoded using cluster labels. In addition, a clustering distance measure that considers both relational attributes and set-valued attributes is customized for heterogeneous data. The original data is then summarized iteratively while retaining the cluster structure, and noise is added to the original data to meet the requirement of ε-differential privacy. Subject to the differential privacy principle, an uncertainty algorithm is proposed to process relational data and set-valued data simultaneously, so that different types of data are anonymized in a similar way. Experiments show that this method can effectively solve the problem of heterogeneous data publishing.
Keywords:data publishing; heterogeneous data; differential privacy; cluster analysis
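
The abstract names two ingredients without giving their formulas: a distance measure covering both relational and set-valued attributes, and noise addition for ε-differential privacy. The following is a minimal illustrative sketch of one common way to realize each, not the authors' method. Everything in it is an assumption: the function names, the equal default weighting, the use of Jaccard distance for the set-valued part, and the choice to apply the Laplace mechanism to per-cluster record counts (the abstract only says that noise is added to the data).

```python
import random


def mixed_distance(a, b, ranges, weight=0.5):
    """Hypothetical distance between two heterogeneous records.

    a, b   : (numeric_tuple, item_set) pairs; the tuple holds relational
             attributes, the set holds the set-valued attribute.
    ranges : per-attribute value ranges used to normalise the numeric part.
    weight : assumed trade-off between the relational and set-valued parts.
    """
    nums_a, items_a = a
    nums_b, items_b = b
    # Relational part: mean range-normalised absolute difference, in [0, 1].
    rel = sum(abs(x - y) / r for x, y, r in zip(nums_a, nums_b, ranges)) / len(ranges)
    # Set-valued part: Jaccard distance, in [0, 1]; two empty sets count as identical.
    union = items_a | items_b
    jac = 1.0 - len(items_a & items_b) / len(union) if union else 0.0
    return weight * rel + (1.0 - weight) * jac


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cluster_counts(labels, epsilon, rng):
    """Laplace mechanism applied to per-cluster record counts (assumed target).

    Adding or removing one record changes exactly one count by 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    scale = 1.0 / epsilon
    return {label: count + laplace_noise(scale, rng) for label, count in counts.items()}


if __name__ == "__main__":
    r1 = ((30.0, 5000.0), {"milk", "bread"})
    r2 = ((45.0, 5200.0), {"milk", "eggs"})
    print(mixed_distance(r1, r2, ranges=(60.0, 10000.0)))
    print(noisy_cluster_counts([0, 0, 1, 2, 2, 2], epsilon=1.0, rng=random.Random(7)))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts. The `weight` parameter is where a personalized scheme could, under these assumptions, shift emphasis between relational and set-valued attributes.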
This article is indexed in CNKI, Wanfang Data, and other databases.