In this paper, we consider a general observation model for restless multi-armed bandit problems. The player operates on a past observation history that is limited (partial) and error-prone due to resource constraints or to environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of the observation process, we formulate the problem as a restless bandit with an infinite, high-dimensional belief state space. We apply the achievable region method with partial conservation laws (PCL) to this infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation procedure that transforms the problem into a form to which the AG algorithm of Niño-Mora (2001) for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
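To make the belief-state formulation concrete, the following is a minimal sketch (not the paper's model) of how a belief over a single two-state restless arm evolves when observations are partial and error-prone: the belief is propagated through the hidden-state transition matrix when the arm is not observed, and folded with a noisy observation by Bayes' rule when it is. The two-state arm, the binary observation with flip probability `eps`, and all parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative transition matrix of the hidden state of one arm (assumed values).
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
eps = 0.1  # assumed probability that an observation is flipped (error-prone sensing)

def predict(belief):
    """Propagate the belief one step when the arm yields no observation."""
    return belief @ P

def update(belief, obs):
    """Bayes update after receiving a noisy binary observation of the arm."""
    # likelihood of obs under each hidden state, given symmetric error rate eps
    like = np.array([1 - eps if obs == s else eps for s in (0, 1)])
    post = belief * like
    return post / post.sum()

belief = np.array([0.5, 0.5])
belief = predict(belief)        # arm not observed: belief drifts with the dynamics
belief = update(belief, obs=1)  # arm observed: noisy observation sharpens the belief
print(belief)
```

Because the belief is a continuous probability vector (and, with many arms and richer observation models, a high-dimensional one), the resulting state space is infinite, which is what motivates the finite-state approximation described above.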