The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem can be framed as a global optimization problem where the objective is an expensive black-box function, and where we can query large batches of sequences but only over a small number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iteration over all possible sequences infeasible. In this paper, we propose MetaRLBO, where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.
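The round-based loop described above might be sketched as follows. This is a deliberately toy illustration, not the paper's method: a similarity-weighted scorer stands in for the learned proxy reward models, greedy mutation stands in for the meta-trained autoregressive policy, and the black-box objective is a made-up string score. What it does preserve is the structure: each round fits an ensemble of proxies on sampled subsets of the acquired data (the source of the induced MDP distribution), generates candidates, and selects a large batch with a UCB-style acquisition over the ensemble.

```python
import random

random.seed(0)
ALPHABET = "ACGT"
SEQ_LEN = 8

def black_box(seq):
    # Hypothetical stand-in for the expensive wet-lab objective.
    return seq.count("A") - 0.5 * seq.count("T")

def hamming_sim(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def make_proxy(subset):
    # Toy proxy reward: similarity-weighted average of observed rewards.
    # MetaRLBO trains learned reward models instead; this is only a sketch.
    def proxy(seq):
        den = sum(hamming_sim(seq, s) for s, _ in subset) or 1.0
        num = sum(hamming_sim(seq, s) * r for s, r in subset)
        return num / den
    return proxy

def random_seq():
    return "".join(random.choice(ALPHABET) for _ in range(SEQ_LEN))

def mutate(seq):
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

# Round 0: an initial random batch is queried against the black box.
data = [(s, black_box(s)) for s in (random_seq() for _ in range(16))]

for round_ in range(3):  # few rounds, large batches
    # Ensemble of proxies, each fit on a sampled subset of the data —
    # this subsampling induces the distribution of MDPs.
    proxies = [make_proxy(random.sample(data, k=len(data) // 2))
               for _ in range(4)]

    # "Policy": greedy mutation of the current best sequences; the paper
    # meta-trains an autoregressive generative policy with RL instead.
    top = sorted(data, key=lambda x: -x[1])[:8]
    candidates = {mutate(s) for s, _ in top for _ in range(8)}

    def acquire(seq):
        # UCB-style acquisition: ensemble mean plus one standard deviation.
        scores = [p(seq) for p in proxies]
        mean = sum(scores) / len(scores)
        var = sum((x - mean) ** 2 for x in scores) / len(scores)
        return mean + var ** 0.5

    batch = sorted(candidates, key=acquire, reverse=True)[:16]
    data += [(s, black_box(s)) for s in batch]

best = max(data, key=lambda x: x[1])
```

The acquisition bonus (the ensemble standard deviation) is where disagreement between proxies trained on different data subsets turns into exploration, which is also the mechanism the abstract credits with robustness to reward misspecification.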