Multi-armed bandit (MAB) algorithms identify the best of several arms by balancing exploration and exploitation, without prior knowledge of the arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desirable. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable because of the Beta function. We address this problem by approximating it with a pseudo-random number generator-based approach and efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli or Gaussian) is unknown, and hence a single algorithm may not be optimal. We therefore propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of the appropriate MAB algorithm for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need to implement the algorithms in parallel, resulting in significant savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software co-design approaches. We demonstrate the superiority of RI-MAB over TS-only and UCB-only architectures.
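To make the comparison between the two algorithms concrete, the following is a minimal software sketch of UCB and TS on Bernoulli arms. It is an illustration under stated assumptions, not the architecture described here: the function names (`ucb1_select`, `ts_select`, `run`) are hypothetical, the UCB variant is the textbook UCB1 index, and TS draws its samples with Python's software Beta sampler `random.betavariate`, which is exactly the step that does not map directly to hardware and that the proposed design replaces with a pseudo-random number generator-based approximation.

```python
import math
import random


def ucb1_select(counts, successes, t):
    """Pick an arm with UCB1: empirical mean plus an exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before using the index
            return arm
    scores = [
        successes[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def ts_select(counts, successes, rng):
    """Pick an arm with Thompson Sampling under a Beta(1, 1) prior.

    Assumption: a software Beta sampler stands in for whatever
    approximation a hardware design would use.
    """
    samples = [
        rng.betavariate(successes[a] + 1, counts[a] - successes[a] + 1)
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=samples.__getitem__)


def run(policy, means, horizon, seed=0):
    """Simulate one bandit run on Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    successes = [0] * len(means)
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb1_select(counts, successes, t)
        else:
            arm = ts_select(counts, successes, rng)
        reward = 1 if rng.random() < means[arm] else 0  # Bernoulli reward
        counts[arm] += 1
        successes[arm] += reward
    return counts


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # hypothetical arm statistics
    print("UCB pulls:", run("ucb", arm_means, 2000))
    print("TS pulls: ", run("ts", arm_means, 2000))
```

Both policies share the same per-arm state (pull counts and success counts) and differ only in the selection rule, which is one reason switching between them at run time, rather than instantiating both in parallel, is plausible.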