The big-data revolution announced ten years ago does not seem to have fully happened at the expected scale. One of the main obstacles has been the lack of data circulation. And one of the many reasons people and organizations did not share as much data as expected is the privacy risk associated with data sharing operations. There has been much work on practical systems for computing statistical queries with Differential Privacy (DP). There have also been practical implementations of systems to train neural networks with DP, but relatively little effort has been dedicated to designing scalable classical Machine Learning (ML) models that provide DP guarantees. In this work we describe and implement a DP fork of a battle-tested ML model: XGBoost. Our approach beats previous attempts at the task by a large margin in terms of accuracy achieved for a given privacy budget. It is also the only DP implementation of boosted trees that scales to big data and can run in distributed environments such as Kubernetes, Dask, or Apache Spark.
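To make the notion of a DP statistical query concrete (this is not the paper's boosted-tree mechanism, just the standard Laplace mechanism applied to a counting query, with all names below chosen for illustration):

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng=None):
    """Release a differentially private count of records matching `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the count by at most 1), so adding Laplace noise with
    scale 1/epsilon yields an epsilon-DP answer.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: privately count entries with age >= 30 under budget epsilon = 1.
ages = [23, 35, 41, 29, 52, 38]
noisy = laplace_count(ages, lambda a: a >= 30, epsilon=1.0)
```

Smaller values of epsilon mean larger noise and stronger privacy; answering several queries consumes a share of the total privacy budget each time.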