In this paper, we consider batch supervised learning where an adversary is allowed to corrupt instances with arbitrarily large noise: the adversary may corrupt any $l$ features in each instance and may change their values in any way. This noise is introduced on test instances, and the algorithm receives no label feedback for these instances. We provide several subspace voting techniques that can be used to transform existing algorithms, and we prove data-dependent performance bounds in this setting. The key insight behind our results is that we set our parameters so that a significant fraction of the voting hypotheses do not contain corrupted features and, for many real-world problems, these uncorrupted hypotheses are sufficient to achieve high accuracy. We empirically validate our approach on several datasets, including three new datasets that deal with side-channel electromagnetic information.
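To make the subspace voting idea concrete, the following is a minimal sketch under illustrative assumptions: the base learner, parameter names, and majority-vote rule here are placeholders, not the authors' exact construction. Each hypothesis is trained on a random subset of features; at test time, voters whose subspaces happen to avoid the $l$ corrupted features remain accurate regardless of the noise magnitude.

```python
# A minimal sketch of subspace voting, assuming a scikit-learn-style
# base learner. All names and parameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_subspace_ensemble(X, y, n_voters=101, subspace_size=5, seed=0):
    """Train n_voters hypotheses, each on a random feature subset."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    ensemble = []
    for _ in range(n_voters):
        feats = rng.choice(d, size=subspace_size, replace=False)
        clf = DecisionTreeClassifier().fit(X[:, feats], y)
        ensemble.append((feats, clf))
    return ensemble

def predict_by_vote(ensemble, X):
    """Majority vote over the ensemble (assumes non-negative integer
    class labels). Voters whose subspaces avoid all corrupted features
    are unaffected by arbitrarily large test-time noise."""
    votes = np.stack([clf.predict(X[:, feats]) for feats, clf in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

The parameter choice the abstract alludes to can be read off combinatorially: with $d$ features of which $l$ are corrupted, a hypothesis trained on a random size-$k$ subspace avoids all corrupted features with probability $\binom{d-l}{k}/\binom{d}{k}$, so $k$ can be chosen small enough that a significant fraction of voters remain uncorrupted.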