Open Radio Access Network (ORAN) is being developed with an aim to democratise access and lower the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is dis-aggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, with AlphaGo Zero inspired neural Monte-Carlo Tree Search (MCTS). Experiments on representative 2D bin packing environment and real sites data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.
The digital divide restricting the access of people living in developing areas to the benefits of modern information and communications technologies has become a major challenge and research focus. To well understand and finally bridge the digital divide, we first need to discover a proper measure to characterize and quantify the telecommunication service imbalance. In this regard, we propose a fine-grained and easy-to-compute imbalance index, aiming to quantitatively link the relation among telecommunication service imbalance, telecommunication infrastructure, and demographic distribution. The mathematically elegant and generic form of the imbalance index allows consistent analyses for heterogeneous scenarios and can be easily tailored to incorporate different telecommunication policies and application scenarios. Based on this index, we also propose an infrastructure resource deployment strategy by minimizing the average imbalance index of any geographical segment. Experimental results verify the effectiveness of the proposed imbalance index by showing a high degree of correlation to existing congeneric but coarse-grained measures and the superiority of the infrastructure resource deployment strategy.
Scheduling is a fundamental task occurring in various automated systems applications, e.g., optimal schedules for machines on a job shop allow for a reduction of production costs and waste. Nevertheless, finding such schedules is often intractable and cannot be achieved by Combinatorial Optimization Problem (COP) methods within a given time limit. Recent advances of Deep Reinforcement Learning (DRL) in learning complex behavior enable new COP application possibilities. This paper presents an efficient DRL environment for Job-Shop Scheduling -- an important problem in the field. Furthermore, we design a meaningful and compact state representation as well as a novel, simple dense reward function, closely related to the sparse make-span minimization criteria used by COP methods. We demonstrate that our approach significantly outperforms existing DRL methods on classic benchmark instances, coming close to state-of-the-art COP approaches.
Device-to-device (D2D) and non-orthogonal multiple access (NOMA) are promising technologies to meet the challenges of the next generations of mobile communications in terms of network density and diversity for internet of things (IoT) services. This paper tackles the problem of maximizing the D2D sum-throughput in an IoT system underlaying a cellular network, through optimal channel and power allocation. NOMA is used to manage the interference between cellular users and full-duplex (FD) IoT devices. To this aim, mutual successive interference cancellation (SIC) conditions are identified to allow simultaneously the removal of the D2D devices interference at the level of the base station and the removal of the cellular users (CU) interference at the level of D2D devices. To optimally solve the joint channel and power allocation (PA) problem, a time-efficient solution of the PA problem in the FD context is elaborated. By means of graphical representation, the complex non-convex PA problem is efficiently solved in constant time complexity. This enables the global optimal resolution by successively solving the separate PA and channel assignment problems. The performance of the proposed strategy is compared against the classical state-of-the-art FD and HD scenarios, where SIC is not applied between CUs and IoT devices. The results show that important gains can be achieved by applying mutual SIC NOMA in the IoT-cellular context, in either HD or FD scenarios.
The distributed Volt/Var control (VVC) methods have been widely studied for active distribution networks(ADNs), which is based on perfect model and real-time P2P communication. However, the model is always incomplete with significant parameter errors and such P2P communication system is hard to maintain. In this paper, we propose an online multi-agent reinforcement learning and decentralized control framework (OLDC) for VVC. In this framework, the VVC problem is formulated as a constrained Markov game and we propose a novel multi-agent constrained soft actor-critic (MACSAC) reinforcement learning algorithm. MACSAC is used to train the control agents online, so the accurate ADN model is no longer needed. Then, the trained agents can realize decentralized optimal control using local measurements without real-time P2P communication. The OLDC with MACSAC has shown extraordinary flexibility, efficiency and robustness to various computing and communication conditions. Numerical simulations on IEEE test cases not only demonstrate that the proposed MACSAC outperforms the state-of-art learning algorithms, but also support the superiority of our OLDC framework in the online application.
Network slicing is the key to enable virtualized resource sharing among vertical industries in the era of 5G communication. Efficient resource allocation is of vital importance to realize network slicing in real-world business scenarios. To deal with the high algorithm complexity, privacy leakage, and unrealistic offline setting of current network slicing algorithms, in this paper we propose a fully decentralized and low-complexity online algorithm, DPoS, for multi-resource slicing. We first formulate the problem as a global social welfare maximization problem. Next, we design the online algorithm DPoS based on the primal-dual approach and posted price mechanism. In DPoS, each tenant is incentivized to make its own decision based on its true preferences without disclosing any private information to the mobile virtual network operator and other tenants. We provide a rigorous theoretical analysis to show that DPoS has the optimal competitive ratio when the cost function of each resource is linear. Extensive simulation experiments are conducted to evaluate the performance of DPoS. The results show that DPoS can not only achieve close-to-offline-optimal performance, but also have low algorithmic overheads.
In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during the learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers a homogeneous and faster learning than a classical policy gradient approach.
Significant efforts are being invested to bring state-of-the-art classification and recognition to edge devices with extreme resource constraints (memory, speed and lack of GPU support). Here, we demonstrate the first deep network for acoustic recognition that is small enough for an off-the-shelf microcrocontroller, yet achieves state-of-the-art performance on standard benchmarks. Rather than handcrafting a once-off solution, we present a universal pipeline that converts a large deep convolutional network automatically via compression and quantization into a network for resource-impoverished edge devices. After introducing ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%) and ESC-50 (87.1%), we describe the compression pipeline and show that it allows us to achieve 97.22% size reduction and 97.28% FLOP reduction while maintaining close to state-of-the-art accuracy (83.65% on ESC-50). We describe a successful implementation on a standard off-the-shelf microcontroller and, beyond laboratory benchmarks, report successful tests on real-world data sets.
The next-generation of wireless networks will enable many machine learning (ML) tools and applications to efficiently analyze various types of data collected by edge devices for inference, autonomy, and decision making purposes. However, due to resource constraints, delay limitations, and privacy challenges, edge devices cannot offload their entire collected datasets to a cloud server for centrally training their ML models or inference purposes. To overcome these challenges, distributed learning and inference techniques have been proposed as a means to enable edge devices to collaboratively train ML models without raw data exchanges, thus reducing the communication overhead and latency as well as improving data privacy. However, deploying distributed learning over wireless networks faces several challenges including the uncertain wireless environment, limited wireless resources (e.g., transmit power and radio spectrum), and hardware resources. This paper provides a comprehensive study of how distributed learning can be efficiently and effectively deployed over wireless edge networks. We present a detailed overview of several emerging distributed learning paradigms, including federated learning, federated distillation, distributed inference, and multi-agent reinforcement learning. For each learning framework, we first introduce the motivation for deploying it over wireless networks. Then, we present a detailed literature review on the use of communication techniques for its efficient deployment. We then introduce an illustrative example to show how to optimize wireless networks to improve its performance. Finally, we introduce future research opportunities. In a nutshell, this paper provides a holistic set of guidelines on how to deploy a broad range of distributed learning frameworks over real-world wireless communication networks.
This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been the key solution to sequential decision-making problems. Along with the fast advance of RL in various domains. including robotics and game-playing, transfer learning arises as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which these transfer learning techniques would be approachable. We discuss the relationship between transfer learning and other relevant topics from an RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL.
Network Virtualization is one of the most promising technologies for future networking and considered as a critical IT resource that connects distributed, virtualized Cloud Computing services and different components such as storage, servers and application. Network Virtualization allows multiple virtual networks to coexist on same shared physical infrastructure simultaneously. One of the crucial keys in Network Virtualization is Virtual Network Embedding, which provides a method to allocate physical substrate resources to virtual network requests. In this paper, we investigate Virtual Network Embedding strategies and related issues for resource allocation of an Internet Provider(InP) to efficiently embed virtual networks that are requested by Virtual Network Operators(VNOs) who share the same infrastructure provided by the InP. In order to achieve that goal, we design a heuristic Virtual Network Embedding algorithm that simultaneously embeds virtual nodes and virtual links of each virtual network request onto physic infrastructure. Through extensive simulations, we demonstrate that our proposed scheme improves significantly the performance of Virtual Network Embedding by enhancing the long-term average revenue as well as acceptance ratio and resource utilization of virtual network requests compared to prior algorithms.