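Differential privacy protection algorithms based on tree models
Cite this article: DENG Wei, CHEN Xiuting, ZHANG Qinghua, WANG Guoyin. Differential privacy protection algorithms based on tree models[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2020, 32(5): 848-849.
Authors: DENG Wei  CHEN Xiuting  ZHANG Qinghua  WANG Guoyin
Institution: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China; School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, P. R. China
Funding: National Key Research and Development Program of China (2016YFB1000905)
Abstract: Most existing differentially private classification algorithms are built on tree models such as decision trees or random forests. When a dataset contains both continuous and discrete data, these algorithms often invoke the exponential mechanism twice and split the privacy budget evenly. Both choices leave each step with too small a privacy budget and too much noise, which increases time cost and lowers classification accuracy. Preserving as much data utility as possible while guaranteeing data privacy, and improving algorithm performance, has therefore become a focus of current research on differential privacy. This paper proposes differentially private data mining algorithms for decision trees and random forests. The Laplace mechanism handles discrete features and the exponential mechanism handles continuous features when selecting the best split feature and split point, combined with an optimal feature selection strategy and a noise-adding strategy based on arithmetic-progression budget allocation. Tests on financial datasets show that both proposed tree-based algorithms achieve high classification accuracy while protecting data privacy, make full use of the privacy budget, and save time cost.

Keywords: privacy protection  differential privacy  arithmetic-progression budget allocation  decision tree  random forest
Received: 2020/6/28
Revised: 2020/9/18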
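
The abstract contrasts splitting the privacy budget evenly with an arithmetic-progression (等差) allocation, but it does not give the formula. The sketch below is one possible reading, not the authors' scheme: it assumes the budget is divided across tree levels and that deeper levels receive linearly larger shares. The function names and the direction of the progression are assumptions made for illustration. Under sequential composition the per-level budgets must sum to the total, which both functions guarantee.

```python
def uniform_budget(total_epsilon, levels):
    """Baseline: every tree level receives the same share of the budget."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    return [total_epsilon / levels] * levels


def arithmetic_budget(total_epsilon, levels):
    """Assumed scheme: level i (1-based) gets a share proportional to i.

    The shares form an arithmetic progression with common difference
    total_epsilon / (1 + 2 + ... + levels) and they sum to total_epsilon.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    weight_sum = levels * (levels + 1) / 2
    return [total_epsilon * i / weight_sum for i in range(1, levels + 1)]


if __name__ == "__main__":
    # Compare the two allocations for a tree with 4 levels and epsilon = 1.
    print(uniform_budget(1.0, 4))
    print(arithmetic_budget(1.0, 4))
```

A smaller per-level budget means more noise at that level, so the choice of progression decides which levels of the tree are perturbed most.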

Differential privacy protection algorithms based on tree models
DENG Wei, CHEN Xiuting, ZHANG Qinghua, WANG Guoyin. Differential privacy protection algorithms based on tree models[J]. Journal of Chongqing University of Posts and Telecommunications, 2020, 32(5): 848-849.
Authors: DENG Wei  CHEN Xiuting  ZHANG Qinghua  WANG Guoyin
Institution: Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China; School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, P. R. China
Abstract: Most existing differential privacy protection algorithms for classification are based on tree models such as decision trees or random forests. When a dataset contains both continuous and discrete data, these algorithms often call the exponential mechanism twice and allocate the privacy budget evenly. As a result, each step receives too small a privacy budget and too much noise, which increases time cost and decreases classification accuracy. Ensuring data availability and improving performance while guaranteeing data privacy has therefore become the focus of current research. In this paper, two differential privacy protection data mining algorithms, based on decision trees and random forests, are proposed. The Laplace mechanism is used to handle discrete features and the exponential mechanism is used to handle continuous features when selecting the best split features and split points. An optimal feature selection strategy is adopted, and noise is added under an arithmetic-progression budget allocation. Experimental results on financial datasets show that both tree-based algorithms protect data privacy while achieving high classification accuracy, make full use of the privacy protection budget, and save time cost.
Keywords: privacy protection  differential privacy  arithmetic-progression budget allocation  decision tree  random forest
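
The two mechanisms named in the abstract can be sketched in their standard textbook form. This is a minimal illustration, not the authors' algorithm: the Laplace mechanism adds noise of scale sensitivity/ε to a count, and the exponential mechanism samples a candidate with probability proportional to exp(ε·u/(2·sensitivity)). The function names, the toy data, and the utility function (the number of records that fall in the majority class on each side of a threshold, which has sensitivity 1) are assumptions; the abstract does not state which scoring function the paper uses.

```python
import math
import random
from collections import Counter


def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count perturbed with Laplace(0, sensitivity / epsilon) noise.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def exponential_mechanism(candidates, utility, epsilon, rng, sensitivity=1.0):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    best = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity))
               for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def split_utility(values, labels, threshold):
    """Hypothetical utility: size of the majority class on each side of the
    threshold, summed. Adding or removing one record changes it by at most 1."""
    left = Counter(y for x, y in zip(values, labels) if x <= threshold)
    right = Counter(y for x, y in zip(values, labels) if x > threshold)
    return sum(max(side.values(), default=0) for side in (left, right))


if __name__ == "__main__":
    rng = random.Random(0)
    values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]   # toy continuous feature
    labels = [0, 0, 0, 1, 1, 1]
    thresholds = [1.5, 2.5, 5.0, 7.5, 8.5]    # candidate split points
    chosen = exponential_mechanism(
        thresholds, lambda t: split_utility(values, labels, t), 1.0, rng)
    print("noisy class-0 count:", laplace_count(3, 1.0, rng))
    print("chosen split point:", chosen)
```

With a small ε the exponential mechanism picks split points almost uniformly; as ε grows, the choice concentrates on the highest-utility threshold, which is the privacy and accuracy trade-off the abstract refers to.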