VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning

Raghav Awasthi, Keerat Kaur Guliani, Saif Ahmad Khan, Aniket Vashishtha, Mehrab Singh Gill, Arshita Bhatt, Aditya Nagori, Aniket Gupta, Ponnurangam Kumaraguru, Tavpritesh Sethi

From arXiv, 11 pages, 6 figures

A COVID-19 vaccine is our best bet for mitigating the ongoing onslaught of the pandemic. However, the vaccine is also expected to be a limited resource. An optimal allocation strategy, especially in countries with access inequities and temporal separation of hot-spots, might be an effective way of halting the spread of the disease. We approach this problem by proposing a novel pipeline, VacSIM, that dovetails sequential-decision-based reinforcement learning (RL) models into a contextual bandits approach for optimizing the distribution of the COVID-19 vaccine. Whereas the RL models suggest better actions and rewards, contextual bandits allow the online modifications that may need to be implemented on a day-to-day basis in a real-world scenario. We evaluate this framework against a naive allocation approach that distributes the vaccine in proportion to the incidence of COVID-19 cases in five different States across India, and demonstrate up to 9039 additional lives potentially saved and a significant increase in the efficacy of limiting the spread over a period of 45 days through the VacSIM approach. We also propose novel evaluation strategies, including standard compartmental model-based projections and a causality-preserving evaluation of our model. Finally, we contribute a new OpenAI environment meant for the vaccine distribution scenario and open-source VacSIM for wide testing and applications across the globe (http://vacsim.tavlab.iiitd.edu.in:8000/).
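The naive baseline mentioned in the abstract, distributing vaccine in proportion to case incidence, can be illustrated with a minimal sketch. This is not the authors' code: the function name `proportional_allocation`, the region labels, and the case counts are hypothetical, and the largest-remainder rounding used to keep doses as whole numbers is an assumption, since the abstract does not say how fractional doses are handled.

```python
def proportional_allocation(cases_by_region, total_doses):
    """Split total_doses across regions in proportion to their case counts.

    Assumption (not from the paper): doses are whole numbers, and any doses
    left over after flooring go to the regions with the largest remainders.
    """
    total_cases = sum(cases_by_region.values())
    if total_cases <= 0:
        raise ValueError("total case count must be positive")

    allocation = {}
    remainders = {}
    for region, cases in cases_by_region.items():
        # Integer arithmetic keeps the proportional share exact.
        allocation[region], remainders[region] = divmod(
            total_doses * cases, total_cases
        )

    # Hand out the doses lost to flooring, largest remainder first.
    leftover = total_doses - sum(allocation.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for region in ranked[:leftover]:
        allocation[region] += 1
    return allocation


# Illustrative numbers only; these are not data from the paper.
example_cases = {"region_a": 600, "region_b": 300, "region_c": 100}
example_allocation = proportional_allocation(example_cases, 1000)
```

A learned policy such as the one the abstract describes would replace this fixed rule with allocations chosen from the observed context. The sketch only shows the comparison baseline.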
