The quality of data-driven learning algorithms depends strongly on the quality of the available data. One of the most straightforward ways to obtain good data is to sample, or explore, the data source intelligently. Smart sampling can reduce the cost of acquiring samples, lower the computational cost of learning, and enable the learning algorithm to adapt to unforeseen events. In this paper, we train three Deep Q-Networks (DQNs) with different exploration strategies to solve the problem of puncturing ongoing transmissions for URLLC messages. We demonstrate the efficiency of two adaptive exploration candidates, variance-based and Maximum Entropy-based exploration, compared to the standard, simple epsilon-greedy exploration approach.
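For intuition, the baseline epsilon-greedy rule can be contrasted with a higher-entropy alternative. The sketch below is illustrative only and is not the paper's implementation: it shows standard epsilon-greedy action selection next to a generic Boltzmann (softmax) rule, which is one common maximum-entropy-style way to make exploration depend on the Q-values themselves; the paper's variance-based and Maximum Entropy-based methods are not specified here.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_exploration(q_values, temperature, rng):
    """Sample an action from a Boltzmann (softmax) distribution over
    Q-values. Higher temperature -> higher-entropy, more exploratory
    policy. (Illustrative stand-in, not the paper's exact method.)"""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=p))

rng = np.random.default_rng(0)
q = [0.1, 0.9, 0.3]  # hypothetical Q-values for three actions
greedy_picks = [epsilon_greedy(q, 0.1, rng) for _ in range(1000)]
soft_picks = [softmax_exploration(q, 0.5, rng) for _ in range(1000)]
```

With epsilon = 0.1, epsilon-greedy selects the best action roughly 93% of the time regardless of how close the Q-values are, whereas the softmax rule spreads probability mass according to the Q-value gaps, which is the kind of adaptivity the abstract contrasts against the fixed epsilon-greedy baseline.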