Balancing privacy and accuracy is a major challenge in designing differentially private machine learning algorithms. One way to improve this tradeoff at no extra cost is to exploit the randomness already present in common data operations, such as noisy SGD and data subsampling. The noise inherent in these operations may strengthen the privacy guarantee of the overall algorithm, a phenomenon known as privacy amplification. In this paper, we analyze the privacy amplification obtained by sampling from a multidimensional Bernoulli distribution family whose parameter is the output of a private algorithm. This setup has applications to Bayesian inference and to data compression. We provide an algorithm to compute the amplification factor, and we establish upper and lower bounds on this factor.
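The setup described above can be illustrated with a minimal sketch. This is not the paper's algorithm; it only shows the two-stage pipeline being analyzed, under assumed choices: a Laplace mechanism (sensitivity taken as 1) as the hypothetical private base algorithm releasing the parameter, followed by the Bernoulli sampling step whose randomness is the source of amplification.

```python
import numpy as np

def private_parameter(theta, epsilon, rng):
    # Hypothetical base mechanism (assumption, not from the paper):
    # add Laplace noise with scale 1/epsilon to each coordinate,
    # then clip to [0, 1] so the output is a valid Bernoulli parameter.
    noisy = theta + rng.laplace(scale=1.0 / epsilon, size=theta.shape)
    return np.clip(noisy, 0.0, 1.0)

def sample_bernoulli(p, rng):
    # Release a multidimensional Bernoulli sample instead of the
    # parameter itself; this extra sampling step is what can amplify
    # the privacy guarantee of the base mechanism.
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)
theta = np.array([0.2, 0.5, 0.8])   # non-private parameter
p = private_parameter(theta, epsilon=1.0, rng=rng)
x = sample_bernoulli(p, rng)        # the released output
```

The observer sees only `x`, not `p`, so the analysis quantifies how much less the binary sample reveals than the noisy parameter itself.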