Replication-Robust Payoff-Allocation for Machine Learning Data Markets

The increasing take-up of machine learning techniques requires ever more application-specific training data. Manually collecting such training data is a time-consuming and error-prone process. Data marketplaces represent a compelling alternative, providing an easy way to acquire data from potential data providers. A key component of such marketplaces is the compensation mechanism for data providers. Classic payoff-allocation methods, such as the Shapley value, can be vulnerable to data-replication attacks and are infeasible to compute in the absence of efficient approximation algorithms. To address these challenges, we present an extensive theoretical study of the vulnerabilities of game-theoretic payoff-allocation schemes to replication attacks. Our insights apply to a wide range of payoff-allocation schemes and enable the design of customised replication-robust payoff-allocations. Furthermore, we present a novel, efficient sampling algorithm for approximating payoff-allocation schemes based on marginal contributions. In our experiments, we validate the replication-robustness of classic payoff-allocation schemes and of new payoff-allocation schemes derived from our theoretical insights. We also demonstrate the efficiency of our proposed sampling algorithm on a wide range of machine learning tasks.
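The replication vulnerability mentioned above can be illustrated with a toy game. The sketch below rests on assumptions of our own, not on the paper: the value of a coalition of providers is the number of distinct data points it covers (a hypothetical coverage function standing in for a real model-based valuation), and the names `coverage_value`, `exact_shapley` and `sampled_shapley` are illustrative. It computes exact Shapley values by enumerating all orderings, shows that provider A gains by submitting a copy of its dataset under a second identity, and adds a plain Monte Carlo permutation-sampling estimator. That estimator is the standard permutation-sampling approximation, not the novel sampling algorithm the paper proposes.

```python
from fractions import Fraction
from itertools import permutations
import random


def coverage_value(coalition, datasets):
    """Hypothetical valuation: number of distinct data-point IDs the coalition covers."""
    covered = set()
    for player in coalition:
        covered |= datasets[player]
    return len(covered)


def exact_shapley(datasets):
    """Exact Shapley value: average marginal contribution over all n! orderings."""
    players = list(datasets)
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        prev = coverage_value(coalition, datasets)
        for p in order:
            coalition.append(p)
            cur = coverage_value(coalition, datasets)
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: totals[p] / len(orderings) for p in players}


def sampled_shapley(datasets, num_samples, seed=0):
    """Standard Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    players = list(datasets)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = []
        prev = coverage_value(coalition, datasets)
        for p in order:
            coalition.append(p)
            cur = coverage_value(coalition, datasets)
            totals[p] += cur - prev
            prev = cur
    return {p: totals[p] / num_samples for p in players}


honest = {"A": {1, 2}, "B": {2, 3}}
# Replication attack: provider A submits the same data again under a fake identity "A_copy".
attacked = {"A": {1, 2}, "A_copy": {1, 2}, "B": {2, 3}}

phi_honest = exact_shapley(honest)
phi_attacked = exact_shapley(attacked)
print(phi_honest["A"])                             # 3/2
print(phi_attacked["A"] + phi_attacked["A_copy"])  # 5/3
print(phi_attacked["B"])                           # 4/3
print(sampled_shapley(attacked, num_samples=2000))
```

In this toy game the duplicate adds no new coverage (the grand coalition is still worth 3), yet A's combined payoff rises from 3/2 to 5/3 while B's falls from 3/2 to 4/3. Exact enumeration needs n! orderings, which is why sampling-based approximation becomes necessary once a market has more than a handful of providers.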
