Data privacy is an important issue for "machine learning as a service" providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model's API, determine whether the sample was included in the model's training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information under several kinds of membership inference attacks.
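To make the attack setting concrete, here is a minimal sketch of one common black-box membership inference strategy: a threshold attack on the model's score for a candidate sample. This is an illustrative assumption, not the paper's method; all names and values below are hypothetical, and we assume the API exposes only a per-sample score (e.g., the length-normalized log-likelihood of the reference output).

```python
# Sketch of a threshold-based membership inference attack on a
# sequence-to-sequence model. Assumption: the black-box API returns a
# per-sample score such as the length-normalized log-likelihood of the
# reference translation; members tend to be scored higher than non-members.

def predict_member(score: float, threshold: float) -> bool:
    """Predict 'member' when the model scores the sample unusually well."""
    return score > threshold

# Toy scores for illustration (log-likelihoods, higher = better fit).
member_scores = [-0.8, -0.6, -0.9]      # samples seen during training
nonmember_scores = [-2.1, -1.7, -2.4]   # fresh, unseen samples

# In a real attack the threshold is calibrated on held-out data.
threshold = -1.3
predictions = [predict_member(s, threshold)
               for s in member_scores + nonmember_scores]
```

On these toy scores the attack separates members from non-members perfectly; the paper's question is whether such separation is achievable against real machine translation models.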