We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off, however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-arm bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
翻译:本文研究排队网络中联合路由与调度问题,其中边传输成本未知。在每个时隙,网络控制器仅能获取所选传输边的噪声观测成本。网络控制器的目标是通过路由与调度决策最小化总期望成本。该问题存在探索-利用权衡,但由于排队动态特性,先前基于多臂老虎机的方法无法直接应用。为确保网络稳定性,控制器需同时优化吞吐量与成本。我们证明最优成本下界可由静态优化问题解给出,并利用李雅普诺夫漂移加惩罚优化与多臂老虎机技术设计了一种网络控制策略。该策略相比完全掌握到达与成本信息的最优策略,实现了$O(\sqrt{T}\log T)$阶的次线性后悔界。最后通过仿真验证了所提策略的后悔界确为次线性。