Modern radio telescopes produce unprecedented amounts of data, which pass through many processing pipelines before scientific results are delivered. The hyperparameters of these pipelines need to be tuned by hand to produce optimal results. Because many thousands of observations are taken during the lifetime of a telescope, and because each observation has its own unique settings, fine-tuning the pipelines is a tedious task. To automate hyperparameter selection in data calibration pipelines, we introduce the use of reinforcement learning. We test two reinforcement learning techniques, twin delayed deep deterministic policy gradient (TD3) and soft actor-critic (SAC), to train an autonomous agent to perform this fine-tuning. For the sake of generalization, we treat the pipeline as a black-box system, where the autonomous agent uses a summarized state of the pipeline's performance. An agent trained in this manner is able to determine optimal settings for diverse observations and can therefore perform 'smart' calibration, minimizing the need for human intervention.
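The black-box framing can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: `ToyPipeline`, `BlackBoxEnv`, the single scalar hyperparameter, the quadratic quality score, and the reward defined as the change in that score are all hypothetical. A random policy stands in where a trained TD3 or SAC actor would be used.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a calibration pipeline (an assumption)."""

    best_setting: float  # hidden optimum, different for each observation

    def run(self, hyperparameter: float) -> float:
        # Scalar quality summary: higher is better, the peak value is 0.0.
        return -((hyperparameter - self.best_setting) ** 2)


class BlackBoxEnv:
    """The agent sees only a performance summary, never pipeline internals."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.pipeline = ToyPipeline(0.0)
        self.setting = 0.0

    def reset(self) -> list[float]:
        # A new observation brings a new hidden optimum in [-1, 1].
        self.pipeline = ToyPipeline(self.rng.uniform(-1.0, 1.0))
        self.setting = 0.0
        return [self.pipeline.run(self.setting)]

    def step(self, action: float) -> tuple[list[float], float]:
        # The action nudges the hyperparameter; the result is clipped to [-1, 1].
        before = self.pipeline.run(self.setting)
        self.setting = max(-1.0, min(1.0, self.setting + action))
        after = self.pipeline.run(self.setting)
        # Assumed reward: improvement in the quality summary.
        return [after], after - before


def make_random_policy(seed: int):
    """Placeholder policy; a trained TD3 or SAC actor would replace it."""
    rng = random.Random(seed)
    return lambda state: rng.uniform(-0.2, 0.2)


def run_episode(env: BlackBoxEnv, policy, steps: int = 20) -> float:
    """Run one episode and return the accumulated reward."""
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        state, reward = env.step(policy(state))
        total += reward
    return total
```

Calling `run_episode(BlackBoxEnv(seed=3), make_random_policy(1))` runs one episode against a freshly drawn observation. Because the policy only receives the summarized state, the same interface would apply to any pipeline that can report a performance summary.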