Recently, a number of backdoor attacks against Federated Learning (FL) have been proposed. In such attacks, an adversary injects poisoned model updates into the federated model aggregation process with the goal of manipulating the aggregated model to provide false predictions on specific adversary-chosen inputs. A number of defenses have been proposed, but none of them can effectively protect the FL process against so-called multi-backdoor attacks, in which the adversary injects multiple different backdoors simultaneously, without severely impacting the benign performance of the aggregated model. To overcome this challenge, we introduce FLGUARD, a poisoning defense framework that defends FL against state-of-the-art backdoor attacks while maintaining the benign performance of the aggregated model. Moreover, FL is also vulnerable to inference attacks, in which a malicious aggregator can infer information about clients' training data from their model updates. To thwart such attacks, we augment FLGUARD with state-of-the-art secure computation techniques that evaluate the FLGUARD algorithm securely. We provide formal arguments for the effectiveness of FLGUARD and extensively evaluate it against known backdoor attacks on several datasets and applications (including image classification, word prediction, and IoT intrusion detection), demonstrating that FLGUARD can entirely remove backdoors with a negligible effect on accuracy. We also show that private FLGUARD achieves practical runtimes.
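
The abstract does not spell out FLGUARD's algorithm; below is a minimal sketch of one generic robust-aggregation recipe of the kind such backdoor defenses use (similarity-based filtering, norm clipping, noising). All names are illustrative, and this is not FLGUARD's exact method:

    import numpy as np

    def filter_and_aggregate(updates, clip_norm=1.0, noise_sigma=0.01, seed=0):
        # Illustrative robust aggregation: keep the majority of updates
        # aligned with the median direction, clip norms, average, add noise.
        rng = np.random.default_rng(seed)
        U = np.stack(updates)                               # (n_clients, dim)
        med = np.median(U, axis=0)
        sims = U @ med / (np.linalg.norm(U, axis=1) * np.linalg.norm(med) + 1e-12)
        keep = np.argsort(sims)[-(len(updates) // 2 + 1):]  # majority cluster
        clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
                   for u in U[keep]]
        agg = np.mean(clipped, axis=0)
        # Small Gaussian noise dampens residual backdoor contributions.
        return agg + rng.normal(0.0, noise_sigma, size=agg.shape)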

Related content

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties, where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and the labels of the training dataset. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the parties' datasets are combined, and the labels are available to only a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate its applicability to multiple types of ML models and show a reduction of 10%-70% in training time and of 80%-90% in data transfer with respect to state-of-the-art approaches.
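
For intuition, the sketch below shows the plaintext gradient computation that a vertical-FL scheme for linear models must realize: each party holds only a block of feature columns, and one party holds the labels. FedV evaluates the corresponding inner products under functional encryption rather than in the clear; all names here are illustrative:

    import numpy as np

    def vertical_linear_gradient(feature_blocks, weight_blocks, y):
        # Each party contributes its partial prediction X_k @ w_k; the sum
        # of the partials minus the labels is the shared residual.
        partial = [X @ w for X, w in zip(feature_blocks, weight_blocks)]
        residual = sum(partial) - y
        # The gradient block for party k involves only its own features.
        return [X.T @ residual / len(y) for X in feature_blocks]

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(8, 4)), rng.normal(size=8)
    blocks = [X[:, :2], X[:, 2:]]          # two parties hold two feature blocks
    grads = vertical_linear_gradient(blocks, [np.zeros(2), np.zeros(2)], y)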

We propose a novel method for protecting trained models with a secret key so that unauthorized users without the correct key cannot obtain correct inference results. By taking advantage of transfer learning, the proposed method enables us to train a large protected model, such as one trained on ImageNet, by using only a small subset of a training dataset. It utilizes a learnable encryption step with a secret key to generate learnable transformed images. Models with pre-trained weights are fine-tuned by using such transformed images. In experiments with the ImageNet dataset, it is shown that the performance of a protected model was close to that of a non-protected model when the correct key was given, while the accuracy dropped drastically when an incorrect key was used. The protected model was also demonstrated to be robust against key estimation attacks.
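
As an illustration of a key-based learnable input transformation, the sketch below applies block-wise pixel shuffling seeded by a secret key; the paper's exact encryption step may differ, and all names are illustrative:

    import numpy as np

    def shuffle_blocks(img, key, block=4):
        # The secret key seeds one fixed pixel permutation, reused per block.
        rng = np.random.default_rng(key)
        perm = rng.permutation(block * block)
        out = img.copy()
        h, w, c = img.shape                # h and w assumed divisible by block
        for i in range(0, h, block):
            for j in range(0, w, block):
                patch = out[i:i+block, j:j+block].reshape(-1, c)
                out[i:i+block, j:j+block] = patch[perm].reshape(block, block, c)
        return out

    img = np.arange(32 * 32 * 3, dtype=np.uint8).reshape(32, 32, 3)
    enc = shuffle_blocks(img, key=1234)    # fine-tune and infer on such images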

Hierarchical models for text classification can leak sensitive or confidential training-data information to adversaries due to training-data memorization. Using differential privacy during model training can mitigate leakage attacks against trained models by perturbing the training optimizer. However, for hierarchical text classification, a multiplicity of model architectures is available, and it is unclear whether some architectures yield a better trade-off between remaining model accuracy and model leakage under differentially private training perturbation than others. We use a white-box membership inference attack to assess the information leakage of three widely used neural network architectures for hierarchical text classification under differential privacy. We show that relatively weak differential privacy guarantees already suffice to completely mitigate the membership inference attack, resulting in only a moderate decrease in utility. More specifically, for large datasets with long texts we observed transformer-based models to achieve an overall favorable privacy-utility trade-off, while for smaller datasets with shorter texts CNNs are preferable.
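
For reference, the training perturbation evaluated here is the standard DP-SGD recipe: clip each per-example gradient to an L2 bound, then add Gaussian noise calibrated to that bound. A minimal NumPy sketch with illustrative names (real implementations additionally handle privacy accounting and batching):

    import numpy as np

    def dp_sgd_step(params, per_example_grads, clip=1.0, sigma=1.0, lr=0.1, rng=None):
        rng = rng or np.random.default_rng(0)
        # Clip each per-example gradient to the L2 bound `clip` ...
        clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
                   for g in per_example_grads]
        # ... then noise the summed gradient with scale proportional to it.
        noisy = np.sum(clipped, axis=0) + rng.normal(0.0, sigma * clip,
                                                     size=params.shape)
        return params - lr * noisy / len(per_example_grads)

    params = np.zeros(10)
    grads = [np.ones(10), 2.0 * np.ones(10)]   # stand-in per-example gradients
    params = dp_sgd_step(params, grads)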

Nowadays, deep neural networks are widely applied in various domains. However, the massive data collection required to train them raises potential privacy issues and also consumes large amounts of communication bandwidth. To address these problems, we propose a privacy-preserving method for federated learning distributed systems, operated on Intel Software Guard Extensions (SGX), a set of instructions that increases the security of application code and data. Meanwhile, the encrypted models enlarge the transmission overhead. Hence, we reduce the communication cost by sparsification, which achieves reasonable accuracy with different model architectures.
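
One common way to realize such sparsification is top-k selection: transmit only the k largest-magnitude entries of each update. A minimal sketch assuming this variant (the paper's exact scheme may differ):

    import numpy as np

    def sparsify_topk(update, k):
        idx = np.argsort(np.abs(update))[-k:]  # k largest-magnitude entries
        return idx, update[idx]                # send only (indices, values)

    def densify(idx, vals, dim):
        out = np.zeros(dim)
        out[idx] = vals                        # receiver rebuilds a sparse update
        return out

    u = np.random.default_rng(0).normal(size=1000)
    idx, vals = sparsify_topk(u, k=50)         # ~95% fewer values transmitted
    restored = densify(idx, vals, dim=u.size)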

We propose AriaNN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data. Our semi-honest 2-party computation protocol leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural networks such as ReLU, MaxPool, and BatchNorm. For instance, we perform private comparison for ReLU operations with a single message of the size of the input during the online phase, and with preprocessing keys close to 4X smaller than in previous work. Lastly, we propose an extension to support n-party private federated learning. We implement our framework as an extensible system on top of PyTorch that leverages CPU and GPU hardware acceleration for cryptographic and machine learning operations. We evaluate our end-to-end system for private inference and training on standard neural networks such as AlexNet, VGG16, or ResNet18 between distant servers. We show that computation rather than communication is the main bottleneck and that using GPUs together with reduced key size is a promising solution to overcome this barrier.
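
For background, function secret sharing operates on secret-shared values. The toy sketch below shows plain 2-party additive secret sharing, the substrate on which such protocols run; actual FSS additionally lets the two servers evaluate the comparison needed for ReLU from compact preprocessed keys, which is omitted here:

    import numpy as np

    P = 2**31 - 1                              # public modulus for the toy scheme

    def share(x, rng):
        s0 = int(rng.integers(0, P))
        return s0, (x - s0) % P                # x = s0 + s1 (mod P)

    def reconstruct(s0, s1):
        return (s0 + s1) % P

    rng = np.random.default_rng(0)
    a, b = share(42, rng)                      # neither share alone reveals 42
    assert reconstruct(a, b) == 42
    # Addition is local: shares of x + y are just the share-wise sums mod P;
    # the expensive part, which FSS accelerates, is private comparison.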

A recent take on Federated Analytics (FA), which allows analytical insights into distributed datasets, reuses the Federated Learning (FL) infrastructure to evaluate a summary of model performance across the training devices. However, the current realization of FL adopts a single-server, multiple-client architecture with limited scope for FA, which often results in learning models with poor generalization, i.e., the ability to handle new/unseen data, in real-world applications. Moreover, a hierarchical FL structure with distributed computing platforms demonstrates incoherent model performance at different aggregation levels. Therefore, we need to design a more robust learning mechanism than FL that (i) unleashes a viable infrastructure for FA and (ii) trains learning models with better generalization capability. In this work, we adopt the novel democratized learning (Dem-AI) principles and designs to meet these objectives. First, we show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn, as a practical framework to empower generalization capability in support of FA. Second, we validate Edge-DemLearn as a flexible model-training mechanism to build a distributed control and aggregation methodology across regions by leveraging the distributed computing infrastructure. The distributed edge computing servers construct regional models, minimize the communication loads, and ensure the scalability of distributed data analytics applications. To that end, we use a near-optimal two-sided many-to-one matching approach to handle the combinatorial constraints in Edge-DemLearn and solve it for fast knowledge acquisition, optimizing resource allocation and the associations between multiple servers and devices. Extensive simulation results on real datasets demonstrate the effectiveness of the proposed methods.
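
The two-level aggregation can be pictured as follows: each edge server averages its devices' models into a regional model, and a global server averages the regional models. A minimal sketch with illustrative names, assuming sample-count weighting (a common convention; the paper's exact rule may differ):

    import numpy as np

    def fedavg(models, weights):
        # Weighted average of model parameter vectors.
        return np.average(np.stack(models), axis=0,
                          weights=np.asarray(weights, dtype=float))

    def hierarchical_aggregate(regions):
        # regions: list of (device_models, device_sample_counts) per edge server.
        regional = [fedavg(models, counts) for models, counts in regions]
        sizes = [sum(counts) for _, counts in regions]
        return fedavg(regional, sizes)         # global model from regional models

    r1 = ([np.ones(4), 3 * np.ones(4)], [100, 300])   # one region, two devices
    r2 = ([np.zeros(4)], [200])
    global_model = hierarchical_aggregate([r1, r2])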

As data are increasingly stored in different silos and societies become more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models faces efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continues to thrive in this new reality. Existing FL protocol designs have been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this paper, we conduct the first comprehensive survey on this topic. Through a concise introduction to the concept of FL and a unique taxonomy covering 1) threat models, 2) poisoning attacks and defenses for robustness, and 3) inference attacks and defenses for privacy, we provide an accessible review of this important topic. We highlight the intuitions, key techniques, and fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions towards robust and privacy-preserving federated learning.

Training machine learning models on sensitive user data has raised increasing privacy concerns in many areas. Federated learning is a popular approach for privacy protection that collects local gradient information instead of raw data. One way to achieve a strict privacy guarantee is to apply local differential privacy to federated learning. However, previous works do not give a practical solution, due to three issues. First, the noisy data are close to their original values with high probability, increasing the risk of information exposure. Second, a large variance is introduced into the estimated average, causing poor accuracy. Last, the privacy budget explodes due to the high dimensionality of weights in deep learning models. In this paper, we propose a novel local differential privacy mechanism for federated learning that addresses the aforementioned issues. It is capable of making the reported data more distinct from their original values while introducing lower variance. Moreover, the proposed mechanism bypasses the curse of dimensionality by splitting and shuffling model updates. A series of empirical evaluations on three commonly used datasets, MNIST, Fashion-MNIST, and CIFAR-10, demonstrates that our solution can not only achieve superior deep learning performance but also provide a strong privacy guarantee at the same time.
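
A minimal sketch of the two ideas, assuming an unbiased two-output perturbation and chunk-level shuffling; the constants and names below are illustrative rather than the paper's exact mechanism:

    import numpy as np

    def perturb(x, c, r, eps, rng):
        # Report one of two distant values for each scalar in [c-r, c+r],
        # with probabilities chosen so the expectation equals the true value.
        alpha = (np.exp(eps) + 1) / (np.exp(eps) - 1)
        p = (((x - c) * (np.exp(eps) - 1) + r * (np.exp(eps) + 1))
             / (2 * r * (np.exp(eps) + 1)))
        return np.where(rng.random(x.shape) < p, c + r * alpha, c - r * alpha)

    rng = np.random.default_rng(0)
    update = rng.uniform(-0.1, 0.1, size=1000)     # a client's weight update
    noisy = perturb(update, c=0.0, r=0.1, eps=1.0, rng=rng)
    # Split the report and shuffle the chunks (e.g., through an anonymizer)
    # so the server cannot link all dimensions back to a single client.
    chunks = np.array_split(noisy, 10)
    chunks = [chunks[i] for i in rng.permutation(len(chunks))]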

We present one-shot federated learning, where a central server learns a global model over a network of federated devices in a single round of communication. Our approach - drawing on ensemble learning and knowledge aggregation - achieves an average relative gain of 51.5% in AUC over local baselines and comes within 90.1% of the (unattainable) global ideal. We discuss these methods and identify several promising directions of future work.
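
A one-round ensemble can be as simple as averaging the class probabilities of the uploaded local models; the sketch below illustrates that baseline with stand-in models (the paper also considers model selection and distillation, omitted here):

    import numpy as np

    def ensemble_predict(local_models, x):
        # Average the class probabilities of all uploaded local models.
        probs = np.mean([m(x) for m in local_models], axis=0)
        return probs.argmax(axis=-1)

    # Stand-in local models: any callable returning class probabilities.
    f1 = lambda x: np.array([[0.7, 0.3]])
    f2 = lambda x: np.array([[0.4, 0.6]])
    f3 = lambda x: np.array([[0.8, 0.2]])
    label = ensemble_predict([f1, f2, f3], x=None)   # -> array([0])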

Machine learning is a widely used method for generating predictions. These predictions are more accurate when the model is trained on a larger dataset. On the other hand, the data is usually divided amongst different entities. For privacy reasons, the training can be done locally and the models can then be safely aggregated amongst the participants. However, if there are only two participants in Collaborative Learning, safe aggregation loses its power, since the output of the training already contains much information about the participants. To resolve this issue, the participants must employ privacy-preserving mechanisms, which inevitably affect the accuracy of the model. In this paper, we model the training process as a two-player game in which each player aims to achieve higher accuracy while preserving its privacy. We introduce the notion of Price of Privacy, a novel approach to measuring the effect of privacy protection on the accuracy of the model. We develop a theoretical model for different player types, and, under some assumptions, we either find a Nash Equilibrium or prove its existence. Moreover, we confirm these assumptions via a recommendation-systems use case: for a specific learning algorithm, we apply three privacy-preserving mechanisms on two real-world datasets. Finally, as complementary work for the designed game, we interpolate the relationship between privacy and accuracy for this use case and present three other methods to approximate it in a real-world scenario.
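
One natural formalization of such a measure, assuming the relative-accuracy-loss reading (the paper's exact definition may differ):

    def price_of_privacy(acc_plain, acc_private):
        # Fraction of accuracy sacrificed for privacy protection.
        return 1.0 - acc_private / acc_plain

    price_of_privacy(0.90, 0.81)   # -> ~0.10: privacy costs 10% of the accuracy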

Related papers
Runhua Xu, Nathalie Baracaldo, Yi Zhou, Ali Anwar, James Joshi, Heiko Ludwig · Mar 5
MaungMaung AprilPyone, Hitoshi Kiya · Mar 5
Dominik Wunderlich, Daniel Bernau, Francesco Aldà, Javier Parra-Arnau, Thorsten Strufe · Mar 4
Sheng Lin, Chenghong Wang, Hongjia Li, Jieren Deng, Yanzhi Wang, Caiwen Ding · Mar 3
Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach · Mar 3
Shashi Raj Pandey, Minh N. H. Nguyen, Tri Nguyen Dang, Nguyen H. Tran, Kyi Thar, Zhu Han, Choong Seon Hong · Mar 3
Lingjuan Lyu, Han Yu, Xingjun Ma, Lichao Sun, Jun Zhao, Qiang Yang, Philip S. Yu · Dec 7, 2020
Lichao Sun, Jianwei Qian, Xun Chen, Philip S. Yu · Jul 31, 2020
One-Shot Federated Learning
Neel Guha, Ameet Talwalkar, Virginia Smith · Mar 5, 2019
Balazs Pejo, Qiang Tang · Feb 28, 2018