We explore the promises and challenges of employing sequential decision-making algorithms (such as bandits, reinforcement learning, and active learning) in law and public policy. While such algorithms have well-characterized performance in the private sector (e.g., online advertising), their potential in law and the public sector remains largely unexplored, due in part to the distinct methodological challenges of the policy setting. Public law, for instance, can pose multiple objectives, necessitate batched and delayed feedback, and require systems to learn rational, causal decision-making policies, each of which presents novel questions at the research frontier. We highlight several applications of sequential decision-making algorithms in regulation and governance, and discuss areas where research is needed to render such methods policy-compliant, more widely applicable, and effective in the public sector. We also note the potential risks of such deployments and describe how sequential decision systems can facilitate the discovery of harms. We hope our work inspires more investigation of sequential decision-making in law and public policy, which pose unique challenges for machine learning researchers and offer tremendous potential for social benefit.