Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. Protecting that data calls for privacy-preserving RL algorithms. In this paper, we study RL with linear function approximation under local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and show that it achieves an $\tilde{\mathcal{O}}\left(d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon}\right)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the length of the planning horizon, and $T$ is the number of interactions with the environment. We also prove a lower bound of $\Omega\left(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right)\right)$ for learning linear mixture MDPs under the $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
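To make the scaling of the two bounds concrete, the following is a minimal sketch that evaluates only their leading terms as stated in the abstract. The function names and the parameter values are illustrative assumptions, and the hidden constants and logarithmic factors are dropped, so the printed numbers show growth rates rather than actual regret.

```python
import math


def regret_upper_bound(d, H, T, eps, delta):
    """Leading term of the stated upper bound.

    d^{5/4} H^{7/4} T^{3/4} (log(1/delta))^{1/4} sqrt(1/eps),
    with constants and logarithmic factors omitted (assumption).
    """
    return (
        d ** 1.25
        * H ** 1.75
        * T ** 0.75
        * math.log(1.0 / delta) ** 0.25
        * math.sqrt(1.0 / eps)
    )


def regret_lower_bound(d, H, T, eps):
    """Leading term of the stated lower bound.

    d H sqrt(T) / (e^eps (e^eps - 1)), with constants omitted (assumption).
    """
    return d * H * math.sqrt(T) / (math.exp(eps) * (math.exp(eps) - 1.0))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    d, H, eps, delta = 10, 5, 1.0, 1e-3
    for T in (10**4, 10**6, 10**8):
        upper = regret_upper_bound(d, H, T, eps, delta)
        lower = regret_lower_bound(d, H, T, eps)
        print(f"T={T:>9d}  upper~{upper:.3e}  lower~{lower:.3e}")
```

In this sketch, multiplying $T$ by 16 multiplies the upper-bound term by $16^{3/4} = 8$ and the lower-bound term by $16^{1/2} = 4$, which reflects the gap in the $T$-dependence between the two stated bounds.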
