Age of Information (AoI) reflects the time that is elapsed from the generation of a packet by a 5G user equipment(UE) to the reception of the packet by a controller. A design of an AoI-aware radio resource scheduler for UEs via reinforcement learning is proposed in this paper. In this paper, we consider a remote control environment in which a number of UEs are transmitting time-sensitive measurements to a remote controller. We consider the AoI minimization problem and formulate the problem as a trade-off between minimizing the sum of the expected AoI of all UEs and maximizing the throughput of the network. Inspired by the success of machine learning in solving large networking problems at low complexity, we develop a reinforcement learning-based method to solve the formulated problem. We used the state-of-the-art proximal policy optimization algorithm to solve this problem. Our simulation results showthat the proposed algorithm outperforms the considered baselines in terms of minimizing the expected AoI while maintaining the network throughput.
One of the most challenging services fifth-generation (5G) mobile network is designed to support, is the critical services in-need of very low latency, and/or high reliability. It is now clear that such critical services will also be at the core of beyond 5G (B5G) networks. While 5G radio design accommodates such supports by introducing more flexibility in timing, how efficiently those services could be scheduled over a shared network with other broadband services remains as a challenge. In this paper, we use network slicing as an enabler for network sharing and propose an optimization framework to schedule resources to critical services via puncturing technique with minimal impact on the regular broadband services. We then thoroughly examine the performance of the framework in terms of throughput and reliability through simulation.
Simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) has been considered as a promising auxiliary device to enhance the performance of the wireless network, where users located at the different sides of the surfaces can be simultaneously served by the transmitting and reflecting signals. In this paper, the energy efficiency (EE) maximization problem for a non-orthogonal multiple access (NOMA) assisted STAR-RIS downlink network is investigated. Due to the fractional form of the EE, it is challenging to solve the EE maximization problem by the traditional convex optimization solutions. In this work, a deep deterministic policy gradient (DDPG)-based algorithm is proposed to maximize the EE by jointly optimizing the transmission beamforming vectors at the base station and the coefficients matrices at the STAR-RIS. Simulation results demonstrate that the proposed algorithm can effectively maximize the system EE considering the time-varying channels.
The optimal design of the energy-efficient multiple-input multiple-output (MIMO) aided uplink ultra-reliable low-latency communications (URLLC) system is an important but unsolved problem. For such a system, we propose a novel absorbing-Markov-chain-based analysis framework to shed light on the puzzling relationship between the delay and reliability, as well as to quantify the system energy efficiency. We derive the transition probabilities of the absorbing Markov chain considering the Rayleigh fading, the channel estimation error, the zero-forcing multi-user-detection (ZF-MUD), the grant-free access, the ACK-enabled retransmissions within the delay bound and the interactions among these technical ingredients. Then, the delay-constrained reliability and the system energy efficiency are derived based on the absorbing Markov chain formulated. Finally, we study the optimal number of user equipments (UEs) and the optimal number of receiving antennas that maximize the system energy efficiency, while satisfying the reliability and latency requirements of URLLC simultaneously. Simulation results demonstrate the accuracy of our theoretical analysis and the effectiveness of massive MIMO in supporting large-scale URLLC systems.
The optimal design of federated learning (FL) algorithms for solving general machine learning (ML) problems in practical edge computing systems with quantized message passing remains an open problem. This paper considers an edge computing system where the server and workers have possibly different computing and communication capabilities and employ quantization before transmitting messages. To explore the full potential of FL in such an edge computing system, we first present a general FL algorithm, namely GenQSGD, parameterized by the numbers of global and local iterations, mini-batch size, and step size sequence. Then, we analyze its convergence for an arbitrary step size sequence and specify the convergence results under three commonly adopted step size rules, namely the constant, exponential, and diminishing step size rules. Next, we optimize the algorithm parameters to minimize the energy cost under the time constraint and convergence error constraint, with the focus on the overall implementing process of FL. Specifically, for any given step size sequence under each considered step size rule, we optimize the numbers of global and local iterations and mini-batch size to optimally implement FL for applications with preset step size sequences. We also optimize the step size sequence along with these algorithm parameters to explore the full potential of FL. The resulting optimization problems are challenging non-convex problems with non-differentiable constraint functions. We propose iterative algorithms to obtain KKT points using general inner approximation (GIA) and tricks for solving complementary geometric programming (CGP). Finally, we numerically demonstrate the remarkable gains of GenQSGD with optimized algorithm parameters over existing FL algorithms and reveal the significance of optimally designing general FL algorithms.
The paper describes an online deep learning algorithm (ODL) for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained by the service feedback of its output. We show the advantage of our solution over the state-of-the-art Q-learning approach. We provide system-level simulation results to support this conclusion in various scenarios with different channel characteristics and different user speeds. Compared with traditional OLLA, the algorithm shows a 10\% to 20\% improvement in user throughput in the full-buffer case.
In this paper, we consider a smart factory scenario where a set of actuators receive critical control signals from an access point (AP) with reliability and low latency requirements. We investigate jointly active beamforming at the AP and passive phase shifting at the reconfigurable intelligent surface (RIS) for successfully delivering the control signals from the AP to the actuators within a required time duration. The transmission follows a two-stage design. In the first stage, each actuator can both receive the direct signal from AP and the reflected signal from the RIS. In the second stage, the actuators with successful reception in the first stage, relay the message through the D2D network to the actuators with failed receptions. We formulate a non-convex optimization problem where we first obtain an equivalent but more tractable form by addressing the problem with discrete indicator functions. Then, Frobenius inner product based equality is applied for decoupling the optimization variables. Further, we adopt a penalty-based approach to resolve the rank-one constraints. Finally, we deal with the $\ell_0$-norm by $\ell_1$-norm approximation and add an extra term $\ell_1-\ell_2$ for sparsity. Numerical results reveal that the proposed two-stage RIS-aided D2D communication protocol is effective for enabling reliable communication with latency requirements.
Accelerating learning processes for complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low. This work proposes REPresentation And INstance Transfer (REPAINT) algorithm for knowledge transfer in deep reinforcement learning. REPAINT not only transfers the representation of a pre-trained teacher policy in the on-policy learning, but also uses an advantage-based experience selection approach to transfer useful samples collected following the teacher policy in the off-policy learning. Our experimental results on several benchmark tasks show that REPAINT significantly reduces the total training time in generic cases of task similarity. In particular, when the source tasks are dissimilar to, or sub-tasks of, the target tasks, REPAINT outperforms other baselines in both training-time reduction and asymptotic performance of return scores.
This paper presents a new multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We propose the use of linear and non-linear methods to develop the MODRL framework that includes both single-policy and multi-policy strategies. The experimental results on two benchmark problems including the two-objective deep sea treasure environment and the three-objective mountain car problem indicate that the proposed framework is able to converge to the optimal Pareto solutions effectively. The proposed framework is generic, which allows implementation of different deep reinforcement learning algorithms in different complex environments. This therefore overcomes many difficulties involved with standard multi-objective reinforcement learning (MORL) methods existing in the current literature. The framework creates a platform as a testbed environment to develop methods for solving various problems associated with the current MORL. Details of the framework implementation can be referred to http://www.deakin.edu.au/~thanhthi/drl.htm.
Policy gradient methods are widely used in reinforcement learning algorithms to search for better policies in the parameterized policy space. They do gradient search in the policy space and are known to converge very slowly. Nesterov developed an accelerated gradient search algorithm for convex optimization problems. This has been recently extended for non-convex and also stochastic optimization. We use Nesterov's acceleration for policy gradient search in the well-known actor-critic algorithm and show the convergence using ODE method. We tested this algorithm on a scheduling problem. Here an incoming job is scheduled into one of the four queues based on the queue lengths. We see from experimental results that algorithm using Nesterov's acceleration has significantly better performance compared to algorithm which do not use acceleration. To the best of our knowledge this is the first time Nesterov's acceleration has been used with actor-critic algorithm.
In this paper, an interference-aware path planning scheme for a network of cellular-connected unmanned aerial vehicles (UAVs) is proposed. In particular, each UAV aims at achieving a tradeoff between maximizing energy efficiency and minimizing both wireless latency and the interference level caused on the ground network along its path. The problem is cast as a dynamic game among UAVs. To solve this game, a deep reinforcement learning algorithm, based on echo state network (ESN) cells, is proposed. The introduced deep ESN architecture is trained to allow each UAV to map each observation of the network state to an action, with the goal of minimizing a sequence of time-dependent utility functions. Each UAV uses ESN to learn its optimal path, transmission power level, and cell association vector at different locations along its path. The proposed algorithm is shown to reach a subgame perfect Nash equilibrium (SPNE) upon convergence. Moreover, an upper and lower bound for the altitude of the UAVs is derived thus reducing the computational complexity of the proposed algorithm. Simulation results show that the proposed scheme achieves better wireless latency per UAV and rate per ground user (UE) while requiring a number of steps that is comparable to a heuristic baseline that considers moving via the shortest distance towards the corresponding destinations. The results also show that the optimal altitude of the UAVs varies based on the ground network density and the UE data rate requirements and plays a vital role in minimizing the interference level on the ground UEs as well as the wireless transmission delay of the UAV.