Learning Stochastic Optimal Policies via Gradient Descent

We systematically develop a learning-based treatment of stochastic optimal control (SOC) that relies on direct optimization of parametric control policies. We propose a derivation of adjoint sensitivity results for stochastic differential equations through direct application of variational calculus. Then, given an objective function that specifies the desiderata of the controller for a predetermined task, we optimize the policy parameters via iterative gradient descent methods. In doing so, we extend the range of applicability of classical SOC techniques, which often require strict assumptions on the functional form of the system and control. We verify the performance of the proposed approach on a continuous-time, finite-horizon portfolio optimization problem with proportional transaction costs.
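
The idea of tuning a parametric policy by gradient descent through a simulated SDE can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a scalar system dx = u dt + sigma dW, a linear feedback policy u = -theta x with a single parameter theta, a quadratic running and terminal cost, and Euler-Maruyama time stepping. The gradient comes from a hand-written discrete adjoint (backward) pass over the simulated path, which stands in for the continuous-time adjoint sensitivity results the abstract refers to. The names cost_and_grad and train are hypothetical.

```python
import math
import random


def cost_and_grad(theta, noise, x0=1.0, dt=0.02, sigma=0.5):
    """Cost of one Euler-Maruyama path and its gradient w.r.t. theta.

    Assumed toy problem (not from the paper):
      dynamics  x_{k+1} = x_k + u_k*dt + sigma*dW_k,  policy u_k = -theta*x_k
      cost      sum_k (x_k^2 + u_k^2)*dt + x_N^2
    """
    # Forward pass: simulate the controlled path and store the states.
    xs = [x0]
    for dw in noise:
        x = xs[-1]
        xs.append(x + (-theta * x) * dt + sigma * dw)

    # With u = -theta*x the running cost is (1 + theta^2) * x^2 * dt.
    cost = sum((1.0 + theta**2) * x * x * dt for x in xs[:-1]) + xs[-1] ** 2

    # Backward pass: lam holds d(cost)/d(x_{k+1}) while step k is processed.
    lam = 2.0 * xs[-1]  # gradient of the terminal cost x_N^2
    grad = 0.0
    for x in reversed(xs[:-1]):
        # Direct dependence of the running cost on theta, plus the
        # dependence through the next state (d x_{k+1} / d theta = -x*dt).
        grad += 2.0 * theta * x * x * dt - lam * x * dt
        # Propagate the adjoint one step back (d x_{k+1} / d x_k = 1 - theta*dt).
        lam = 2.0 * (1.0 + theta**2) * x * dt + lam * (1.0 - theta * dt)
    return cost, grad


def train(steps=200, batch=32, lr=0.1, n_steps=50, dt=0.02, seed=0):
    """Minibatch gradient descent on theta over freshly sampled noise paths."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        g = 0.0
        for _ in range(batch):
            # Brownian increments dW_k ~ N(0, dt).
            noise = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
            g += cost_and_grad(theta, noise, dt=dt)[1]
        theta -= lr * g / batch
    return theta


if __name__ == "__main__":
    print("learned feedback gain:", train())
```

For this particular linear-quadratic toy problem the continuous-time optimal feedback is u = -x, so a learned gain close to 1 is a useful sanity check. The storage of the whole path in the forward pass is a simplification that a continuous-time adjoint formulation is meant to avoid; a richer policy class (for example a neural network) and an automatic differentiation library would replace the hand-written backward pass in practice.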
