Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian? - Zhuanzhi Papers


Frequentist school · Bandits · Upper confidence bound · ARM · Identifiable

June 5, 2021

Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?


S. V. Sai Santosh, Sumit J. Darak

Multi-armed Bandit (MAB) algorithms identify the best arm among multiple arms via an exploration-exploitation trade-off, without prior knowledge of arm statistics. Their usefulness in wireless radio, IoT, and robotics demands deployment on edge devices, and hence a mapping onto a system-on-chip (SoC) is desired. Theoretically, the Bayesian Thompson Sampling (TS) algorithm offers better performance than the frequentist Upper Confidence Bound (UCB) algorithm. However, TS is not synthesizable because of the Beta function. We address this problem by approximating the Beta function with a pseudo-random number generator-based approach, and we efficiently realize the TS algorithm on a Zynq SoC. In practice, the type of arm distribution (e.g., Bernoulli, Gaussian) is unknown, and hence a single algorithm may not be optimal. We propose a reconfigurable and intelligent MAB (RI-MAB) framework. Here, intelligence enables the identification of the appropriate MAB algorithm for a given environment, and reconfigurability allows on-the-fly switching between algorithms on the SoC. This eliminates the need for a parallel implementation of the algorithms, resulting in large savings in resources and power consumption. We analyze the functional correctness, area, power, and execution time of the proposed and existing architectures for various arm distributions, word lengths, and hardware-software co-design approaches. We demonstrate the superiority of RI-MAB over TS-only and UCB-only architectures.
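To make the frequentist/Bayesian contrast concrete, here is a minimal software sketch of both selection rules for Bernoulli arms. It is an illustration under stated assumptions, not the authors' hardware design: the function names (`ucb1_select`, `beta_from_uniforms`, `ts_select`, `run`) are hypothetical, the UCB variant is the textbook UCB1 index, and the Beta draw uses a standard order-statistic identity (valid only for positive integer parameters) to show one way of sampling from uniform pseudo-random numbers alone. The abstract does not spell out the approximation actually used on the Zynq SoC, so this should not be read as that method.

```python
import math
import random


def ucb1_select(counts, successes, t):
    """Frequentist rule: pick the arm with the largest UCB1 index."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before the index is defined
            return arm
    return max(
        range(len(counts)),
        key=lambda a: successes[a] / counts[a]
        + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def beta_from_uniforms(a, b, rng):
    """Draw from Beta(a, b) for positive integers a, b using only uniforms.

    Assumption for illustration: the a-th smallest of (a + b - 1)
    independent U(0, 1) draws follows a Beta(a, b) distribution.
    """
    draws = sorted(rng.random() for _ in range(a + b - 1))
    return draws[a - 1]


def ts_select(successes, failures, rng):
    """Bayesian rule: sample each arm's Beta posterior (Beta(1, 1) prior)."""
    samples = [
        beta_from_uniforms(s + 1, f + 1, rng)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run(policy, probs, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    successes = [0] * k
    for t in range(1, horizon + 1):
        if policy == "ucb":
            arm = ucb1_select(counts, successes, t)
        else:
            failures = [counts[a] - successes[a] for a in range(k)]
            arm = ts_select(successes, failures, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        successes[arm] += reward
    return counts
```

Calling `run("ts", [0.1, 0.5, 0.9], 1000)` or `run("ucb", [0.1, 0.5, 0.9], 1000)` returns the per-arm pull counts, which makes it easy to compare how quickly each rule concentrates on the best arm. The cost of the order-statistic draw grows with the number of observations, so it is a teaching device rather than a resource-efficient design.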


