How can we effectively remove, or ``unlearn,'' undesirable information, such as specific features or the influence of individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a unified mathematical framework based on information-theoretic regularization that addresses both data point unlearning and feature unlearning. For data point unlearning, we introduce the $\textit{Marginal Unlearning Principle}$, an auditable and provable framework inspired by memory suppression studies in neuroscience. Building on this principle, we give a formal information-theoretic definition of unlearning, termed marginal unlearning, together with provable guarantees that marginal unlearning is both sufficient and necessary for existing approximate unlearning definitions. We then show that the proposed framework provides a natural solution to the marginal unlearning problem. For feature unlearning, the framework applies to deep learning with arbitrary training objectives. By combining flexibility in the choice of learning objective with simplicity in regularization design, our approach is adaptable and practical for a wide range of machine learning and AI applications. From a mathematical perspective, we derive a unified analytic solution to the optimal feature unlearning problem under a variety of information-theoretic training objectives. Our theoretical analysis reveals intriguing connections between machine unlearning, information theory, optimal transport, and extremal sigma algebras. Numerical simulations support our theoretical findings.
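To make the regularization idea concrete, the following is a hedged sketch of the kind of objective such a framework suggests for feature unlearning; the symbols $Z$, $X$, $Y$, $S$, the loss $\mathcal{L}_{\mathrm{task}}$, and the weight $\lambda$ are illustrative notation introduced here, not the formulation from the body of the paper. Let $Z$ be the learned representation of the input $X$, $Y$ the prediction target, and $S$ the feature to be unlearned; one then trades utility against residual information:
\[
  \min_{p_{\theta}(z \mid x)} \; \underbrace{\mathcal{L}_{\mathrm{task}}(Z, Y)}_{\text{utility loss}} \;+\; \lambda\, \underbrace{I(Z; S)}_{\text{information to unlearn}},
\]
where $I(Z;S)$ denotes the mutual information between $Z$ and $S$ and $\lambda > 0$ weights the regularizer. Driving $I(Z;S)$ toward zero removes the feature's influence on the representation, while the first term preserves predictive utility.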