This paper studies the allocation of shared resources between vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) links in vehicle-to-everything (V2X) communications. In existing algorithms, dynamic vehicular environments and the quantization of continuous power become the bottlenecks for providing an effective and timely resource allocation policy. In this paper, we develop two algorithms to deal with these difficulties. First, we propose a deep reinforcement learning (DRL)-based resource allocation algorithm to improve the performance of both V2I and V2V links. Specifically, the algorithm uses a deep Q-network (DQN) to solve the discrete sub-band assignment problem and deep deterministic policy gradient (DDPG) to solve the continuous power allocation problem. Second, we propose a meta-based DRL algorithm to enhance the fast adaptability of the resource allocation policy in dynamic environments. Numerical results demonstrate that the proposed DRL-based algorithm significantly outperforms a DQN-based algorithm that quantizes continuous power. In addition, the proposed meta-based DRL algorithm achieves the required fast adaptation in a new environment with limited experience.
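The key structural idea is a hybrid action head: a DQN picks the discrete sub-band while a DDPG actor emits the continuous transmit power. The sketch below illustrates only this action-selection split, not the paper's actual networks or training loop; the tiny linear "networks", the state dimension, and the power cap are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, NUM_SUBBANDS, P_MAX = 8, 4, 23.0  # illustrative sizes and power cap

# Tiny linear layers standing in for the DQN and the DDPG actor.
W_q = rng.normal(size=(NUM_SUBBANDS, STATE_DIM)) * 0.1  # one Q-value per sub-band
W_a = rng.normal(size=(1, STATE_DIM)) * 0.1             # actor head for power

def select_action(state):
    """Discrete sub-band from the DQN head, continuous power from the actor."""
    q_values = W_q @ state                   # Q-value per candidate sub-band
    subband = int(np.argmax(q_values))       # greedy discrete choice
    raw = float(W_a @ state)                 # unbounded actor output
    power = P_MAX * (np.tanh(raw) + 1) / 2   # squash into [0, P_MAX]
    return subband, power

state = rng.normal(size=STATE_DIM)
subband, power = select_action(state)
```

Because the power head is continuous, no quantization grid is needed, which is exactly the bottleneck the hybrid design avoids.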

### Related content


A reinforcement learning (RL) policy trained in a nominal environment could fail in a new/perturbed environment due to the existence of dynamic variations. Existing robust methods try to obtain a fixed policy for all envisioned dynamic variation scenarios through robust or adversarial training. These methods could lead to conservative performance due to emphasis on the worst case, and often involve tedious modifications to the training environment. We propose an approach to robustifying a pre-trained non-robust RL policy with $\mathcal{L}_1$ adaptive control. Leveraging the capability of an $\mathcal{L}_1$ control law in the fast estimation of and active compensation for dynamic variations, our approach can significantly improve the robustness of an RL policy trained in a standard (i.e., non-robust) way, either in a simulator or in the real world. Numerical experiments are provided to validate the efficacy of the proposed approach.
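The core mechanism is an add-on loop that estimates the dynamic variation from observed state increments and cancels it through a low-pass filter, leaving the pre-trained policy untouched. The scalar simulation below is a minimal sketch of that idea, not the paper's $\mathcal{L}_1$ architecture: the plant, the proportional "RL policy", and all gains are illustrative assumptions.

```python
# Scalar plant x' = u + d with unknown constant disturbance d. The "RL policy"
# is a simple proportional regulator toward 0; the add-on estimates d from
# finite differences of the state and cancels it through a low-pass filter.
DT, K, ALPHA, D_TRUE, STEPS = 0.01, 2.0, 0.1, 1.0, 2000

def rollout(use_l1):
    x, d_hat = 1.0, 0.0
    for _ in range(STEPS):
        u = -K * x - (d_hat if use_l1 else 0.0)  # policy + optional compensation
        x_new = x + DT * (u + D_TRUE)            # plant step
        d_raw = (x_new - x) / DT - u             # disturbance estimate from increments
        d_hat = (1 - ALPHA) * d_hat + ALPHA * d_raw  # low-pass filtering
        x = x_new
    return x

x_nominal = rollout(use_l1=False)  # settles at the offset d / K
x_l1 = rollout(use_l1=True)        # compensation drives the offset toward 0
```

The nominal policy leaves a steady-state error of `D_TRUE / K`, while the compensated loop removes it, which is the qualitative effect the paper's numerical experiments demonstrate.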

The Time Slotted Channel Hopping (TSCH) behavioural mode has been introduced in the IEEE 802.15.4e standard to address the ultra-high reliability and ultra-low power communication requirements of Industrial Internet of Things (IIoT) networks. Scheduling packet transmissions in IIoT networks is a difficult task owing to the limited resources and dynamic topology. In this paper, we propose a phasic policy gradient (PPG)-based TSCH schedule learning algorithm. The proposed PPG-based scheduling algorithm overcomes the drawbacks of fully distributed and fully centralized deep reinforcement learning-based scheduling algorithms by employing the actor-critic policy gradient method, which learns the schedule in two phases, namely the policy phase and the auxiliary phase.
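The two-phase structure can be seen on a deliberately tiny problem. The sketch below alternates a policy phase (REINFORCE updates on a softmax policy) with an auxiliary phase (fitting the value baseline while the policy is held fixed); this is a toy stand-in, since full PPG trains value-related auxiliary objectives under a KL constraint on the policy, and the two-armed bandit "environment" here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

R = np.array([0.2, 0.8])  # deterministic per-arm rewards (toy environment)
theta = np.zeros(2)       # softmax policy parameters
v = 0.0                   # value head, used as the baseline

def pi():
    e = np.exp(theta - theta.max())
    return e / e.sum()

rewards = []
for phase in range(10):
    # Policy phase: policy-gradient updates only.
    for _ in range(50):
        p = pi()
        a = rng.choice(2, p=p)
        r = R[a]
        rewards.append(r)
        theta += 0.2 * (r - v) * (np.eye(2)[a] - p)  # grad log pi times advantage
    # Auxiliary phase: fit the value head on collected returns while the
    # policy stays fixed (full PPG instead penalizes KL drift of the policy).
    v += 0.5 * (np.mean(rewards) - v)

p_best = pi()[1]  # probability assigned to the better arm
```

Separating the phases keeps the value-fitting updates from interfering with the policy updates, which is the drawback of naive shared-network training that PPG targets.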

We consider distributed online min-max resource allocation with a set of parallel agents and a parameter server. Our goal is to minimize the pointwise maximum over a set of time-varying convex and decreasing cost functions, without a priori information about these functions. We propose a novel online algorithm, termed Distributed Online resource Re-Allocation (DORA), where non-stragglers learn to relinquish resources and share them with stragglers. A notable feature of DORA is that it does not require gradient calculation or projection operation, unlike most existing online optimization strategies. This allows it to substantially reduce the computation overhead in large-scale and distributed networks. We show that the dynamic regret of the proposed algorithm is upper bounded by $O\left(T^{\frac{3}{4}}(1+P_T)^{\frac{1}{4}}\right)$, where $T$ is the total number of rounds and $P_T$ is the path-length of the instantaneous minimizers. We further consider an application to the bandwidth allocation problem in distributed online machine learning. Our numerical study demonstrates the efficacy of the proposed solution and its performance advantage over gradient- and/or projection-based resource allocation algorithms in reducing wall-clock time.
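The relinquish-and-share idea can be illustrated with a toy gradient-free reallocation loop: agents whose cost sits above the average (stragglers) receive resource from the others, and the update sums to zero so the budget is preserved without any projection step. This is a simplified illustration, not DORA's actual update rule; the cost model $f_i(x) = w_i / x$ and the step size are assumptions.

```python
import numpy as np

# Decreasing convex per-agent costs f_i(x) = w_i / x: more resource, lower cost.
w = np.array([1.0, 2.0, 4.0])
x = np.array([2.0, 2.0, 2.0])  # initial allocation; the total budget is fixed
total = x.sum()

def costs(x):
    return w / x

spread0 = costs(x).max() - costs(x).min()
for _ in range(500):
    c = costs(x)
    # Above-average-cost agents (stragglers) gain resource, the rest relinquish
    # it; the deviations sum to zero, so no gradients or projections are needed.
    x = x + 0.05 * (c - c.mean())
x_final = x
spread = costs(x_final).max() - costs(x_final).min()
```

At the min-max optimum of decreasing costs the per-agent costs equalize, so the max-min cost spread shrinking toward zero is the right convergence signal for this toy.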

We consider a class of resource allocation problems given a set of unconditional constraints whose objective function satisfies Bellman's optimality principle. Such problems are ubiquitous in wireless communication, signal processing, and networking. These constrained combinatorial optimization problems are, in general, NP-Hard. This paper proposes two algorithms to solve this class of problems using a dynamic programming framework assisted by an information-theoretic measure. We demonstrate that the proposed algorithms ensure optimal solutions under carefully chosen conditions and use significantly reduced computational resources. We substantiate our claims by solving the power-constrained bit allocation problem in 5G massive Multiple-Input Multiple-Output receivers using the proposed approach.
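Since the objective satisfies Bellman's optimality principle, the bit allocation can be solved channel by channel with a dynamic-programming table. The sketch below shows the plain DP recursion for allocating a fixed bit budget across channels to maximize a summed utility; it omits the paper's information-theoretic pruning, and the utility table is an illustrative placeholder rather than a real quantizer model.

```python
# Bellman recursion: best[i][b] = max_k util[i-1][k] + best[i-1][b-k],
# i.e., the best utility using the first i channels and exactly b bits.

def allocate_bits(util, budget):
    """util[i][k]: utility of giving k bits to channel i. Returns (value, alloc)."""
    n = len(util)
    NEG = float("-inf")
    best = [[NEG] * (budget + 1) for _ in range(n + 1)]
    choice = [[0] * (budget + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        for b in range(budget + 1):
            for k in range(min(b, len(util[i - 1]) - 1) + 1):
                if best[i - 1][b - k] == NEG:
                    continue  # that bit count is unreachable
                cand = best[i - 1][b - k] + util[i - 1][k]
                if cand > best[i][b]:
                    best[i][b] = cand
                    choice[i][b] = k
    # Backtrack the optimal per-channel allocation.
    alloc, b = [], budget
    for i in range(n, 0, -1):
        alloc.append(choice[i][b])
        b -= choice[i][b]
    return best[n][budget], alloc[::-1]

util = [[0.0, 1.0, 1.6], [0.0, 2.0, 2.9], [0.0, 0.5, 0.8]]
value, alloc = allocate_bits(util, 4)
```

The table has O(n · budget) entries, each filled in O(max bits per channel), which is the "significantly reduced computational resources" claim relative to exhaustive search over all allocations.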

Intelligent decision-making for the unmanned combat aerial vehicle (UCAV) has long been a challenging problem. Conventional search methods can hardly satisfy the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) can significantly shorten decision time by using neural networks. However, the sparse-reward problem limits its convergence speed, and an artificial prior-experience reward can easily divert convergence away from the optimum of the original task, which raises great difficulties for applying RL to air combat. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse reward and an auxiliary task with an artificial prior-experience reward. The convergence and feasibility of this method are also proved in this paper. To validate the feasibility of our method, we first construct a detailed 3D air combat simulation environment for training RL-based methods, and we implement our method in both the attack horizontal flight UCAV task and the self-play confrontation task. Experimental results show that our method outperforms methods that use only the sparse reward or only the artificial prior-experience reward. The agent trained by our method reaches a win rate above 98.3% in the attack horizontal flight UCAV task and an average 67.4% win rate when confronted with agents trained by the other two methods.
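One simple way to realize a homotopy path between the two tasks is to blend the shaped (prior-experience) reward into the sparse task reward with a coefficient annealed from 0 to 1 over training. The sketch below shows only that blending schedule; the linear schedule and the sample reward values are illustrative assumptions, not the paper's construction.

```python
# Homotopy between the auxiliary task (shaped reward, lam = 0) and the original
# task (sparse reward, lam = 1): r_lam = (1 - lam) * r_shaped + lam * r_sparse.

def homotopy_reward(r_shaped, r_sparse, lam):
    assert 0.0 <= lam <= 1.0
    return (1.0 - lam) * r_shaped + lam * r_sparse

def anneal(step, total_steps):
    """Linear schedule from the auxiliary task to the original task."""
    return min(1.0, step / total_steps)

start = homotopy_reward(0.4, 1.0, anneal(0, 100))    # training start: pure shaped reward
end = homotopy_reward(0.4, 1.0, anneal(100, 100))    # training end: pure sparse reward
```

Training along the path lets the agent bootstrap from the dense auxiliary signal early on while ensuring the final objective is exactly the original sparse-reward task.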

At the heart of distributed computing systems is the question of how to schedule incoming requests and how to allocate the computing nodes so as to minimize both time and computation costs. In this paper, we propose a cost-aware optimal scheduling and allocation strategy for distributed computing systems that minimizes a cost function combining response time and service cost. First, based on the proposed cost function, we derive the optimal request scheduling policy and the optimal resource allocation policy simultaneously. Second, to capture the effect of incoming requests on the scheduling policy, the additive increase multiplicative decrease (AIMD) mechanism is used to model the relation between request arrivals and scheduling. In particular, the AIMD parameters can be designed so that the derived optimal strategy remains valid. Finally, a numerical example is presented to illustrate the derived results.
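The AIMD mechanism itself is easy to state concretely: the admitted request rate probes upward by a fixed additive step until the service capacity is exceeded, then backs off by a constant multiplicative factor. The simulation below is a generic AIMD sketch, not the paper's model; the capacity and the parameter values are illustrative.

```python
# Additive-increase / multiplicative-decrease on the admitted request rate.

def simulate_aimd(capacity, increase, decrease, steps, rate=1.0):
    trace = []
    for _ in range(steps):
        if rate > capacity:      # congestion signal: multiplicative decrease
            rate *= decrease
        else:                    # headroom available: additive increase
            rate += increase
        trace.append(rate)
    return trace

trace = simulate_aimd(capacity=10.0, increase=1.0, decrease=0.5, steps=100)
```

The trace is the familiar sawtooth: the rate stays within one additive step of capacity at its peaks, and designing `increase` and `decrease` shapes how tightly the arrivals track the capacity, which is what the paper's parameter-design condition controls.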

This paper proposes a deep reinforcement learning-based video streaming scheme for mobility-aware vehicular networks, e.g., vehicles on the highway. We consider infrastructure-assisted and mmWave-based scenarios in which the macro base station (MBS) cannot directly provide the streaming service to vehicles due to the short range of mmWave beams, so small mmWave base stations (mBSs) along the road deliver the desired videos to users. For a smoother streaming service, the MBS proactively pushes video chunks to mBSs, supporting both vehicles currently covered by each mBS and those that soon will be. We formulate a dynamic video delivery scheme that adaptively determines 1) which content, 2) what quality and 3) how many chunks to proactively deliver from the MBS to mBSs as a Markov decision process (MDP). Since it is difficult for the MBS to track all the channel conditions, and the network states have extensive dimensions, we adopt the deep deterministic policy gradient (DDPG) algorithm for the DRL-based video delivery scheme. Finally, this paper shows that the DRL agent learns a streaming policy that pursues high average quality while limiting packet drops, avoiding playback stalls, reducing quality fluctuations and saving backhaul usage.
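Since DDPG emits continuous actions while the delivery decision (which content, what quality, how many chunks) is discrete, some mapping between the two is needed. The sketch below shows one common way to do it, rescaling bounded actor outputs onto the discrete choices; this is an assumed construction for illustration, and the catalogue sizes are placeholders.

```python
import numpy as np

# Map a DDPG actor's continuous outputs in [-1, 1] onto the discrete delivery
# decision: (content index, quality level, number of chunks to push).
NUM_CONTENTS, NUM_QUALITIES, MAX_CHUNKS = 20, 4, 8

def to_delivery_action(raw):
    """raw: array of 3 actor outputs in [-1, 1] -> (content, quality, chunks)."""
    u = (np.clip(raw, -1.0, 1.0) + 1.0) / 2.0            # rescale to [0, 1]
    content = min(int(u[0] * NUM_CONTENTS), NUM_CONTENTS - 1)
    quality = min(int(u[1] * NUM_QUALITIES), NUM_QUALITIES - 1)
    chunks = min(int(u[2] * MAX_CHUNKS) + 1, MAX_CHUNKS)  # push at least one chunk
    return content, quality, chunks

action = to_delivery_action(np.array([0.3, -0.9, 1.0]))
```

Keeping the actor output continuous lets DDPG's deterministic policy gradient flow through the network even though the environment ultimately sees discrete decisions.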

We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google's OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
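The framework's decoding loop emits a solution one action at a time, masking out infeasible choices (customers that would exceed the vehicle capacity) at each step. The sketch below reproduces that loop with a greedy nearest-feasible-customer rule standing in for the learned policy; the instance data is illustrative, and a trained model would replace the `min(...)` selection with a sample from its output distribution.

```python
import math

# Sequential CVRP decoding with a feasibility mask: at each step, only
# customers whose demand fits in the remaining capacity are selectable.

def decode_routes(depot, customers, demands, capacity):
    """Return a list of routes (customer-index lists), each within capacity."""
    unserved = set(range(len(customers)))
    routes = []
    while unserved:
        load, pos, route = 0.0, depot, []
        while True:
            feasible = [i for i in unserved if load + demands[i] <= capacity]
            if not feasible:        # mask is empty: return to the depot
                break
            # Greedy stand-in for the learned policy: nearest feasible customer.
            nxt = min(feasible, key=lambda i: math.dist(pos, customers[i]))
            route.append(nxt)
            load += demands[nxt]
            pos = customers[nxt]
            unserved.remove(nxt)
        routes.append(route)
    return routes

customers = [(0, 1), (1, 0), (2, 2), (5, 5)]
demands = [4, 3, 3, 5]
routes = decode_routes((0, 0), customers, demands, capacity=7)
```

Because feasibility is enforced by the mask rather than by the model, any policy plugged into this loop, learned or heuristic, yields capacity-respecting routes by construction.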

Caching and rate allocation are two promising approaches to supporting video streaming over wireless networks. However, existing rate allocation designs do not fully exploit the advantages of the two approaches. This paper investigates the problem of cache-enabled QoE-driven video rate allocation. We establish a mathematical model for this problem and point out that it is difficult to solve with traditional dynamic programming. We then propose a deep reinforcement learning approach to solve it. First, we model the problem as a Markov decision process. Then we present a deep Q-learning algorithm with a special knowledge-transfer process to find an effective allocation policy. Finally, numerical results are given to demonstrate that the proposed solution can effectively maintain a high-quality user experience for mobile users moving among small cells. We also investigate the impact of critical parameter configurations on the performance of our algorithm.
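A minimal way to picture the knowledge-transfer step is warm-starting the learner in a new cell from values learned in a previous one, then continuing with ordinary temporal-difference updates. The sketch below uses a tabular Q-function for clarity; the tiny MDP, the transfer-by-copy scheme, and all constants are illustrative assumptions, not the paper's transfer process.

```python
import numpy as np

# Warm-start Q-learning from a previously learned Q-table (the transfer step),
# then fine-tune with standard TD updates in the new environment.
N_STATES, N_ACTIONS, GAMMA, LR = 3, 2, 0.9, 0.5

q_source = np.ones((N_STATES, N_ACTIONS))  # pretrained table (stand-in)
q = q_source.copy()                        # transfer: copy, then fine-tune

def td_update(q, s, a, r, s_next):
    """One Q-learning step; returns the TD target for inspection."""
    target = r + GAMMA * q[s_next].max()
    q[s, a] += LR * (target - q[s, a])
    return target

target = td_update(q, s=0, a=1, r=2.0, s_next=1)
```

Starting from transferred values rather than zeros is what lets the policy stay effective while the user hands over between small cells, instead of relearning from scratch in every cell.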

Yikun Cheng, Pan Zhao, Manan Gandhi, Bo Li, Evangelos Theodorou, Naira Hovakimyan · December 9, 2021
Lokesh Bommisetty, TG Venkatesh · December 8, 2021
Jingrong Wang, Ben Liang · December 7, 2021
December 7, 2021
Yiwen Zhu, Zhou Fang, Yuan Zheng, Wenya Wei · December 1, 2021
Wei Ren, Eleftherios Vlahakis, Nikolaos Athanasopoulos, Raphael Jungers · October 15, 2021
Won Joon Yun, Dohyun Kwon, Minseok Choi, Joongheon Kim, Giuseppe Caire, Andreas F. Molisch · October 12, 2021
Jin Zhang, Jianhao Wang, Hao Hu, Tong Chen, Yingfeng Chen, Changjie Fan, Chongjie Zhang · February 7, 2021
Mohammadreza Nazari, Afshin Oroojlooy, Lawrence V. Snyder, Martin Takáč · May 21, 2018
Zhengming Zhang, Yaru Zheng, Meng Hua, Yongming Huang, Luxi Yang · March 30, 2018
