Zhang, Junshan

NSF project 2203239: CNS Core: Small: Reinforcement Learning for Real-time Wireless Scheduling and Edge Caching: Theory and Algorithm Design.

Overview:

5G and Beyond (B5G) technology promises to offer enhanced mobile broadband and ultra-reliable low-latency communication services. Indeed, recent years have witnessed tremendous growth in real-time applications in wirelessly networked systems, such as connected cars and multi-user augmented reality (AR). For instance, in connected cars, coordinated sensing and mobility control rely heavily on real-time information exchange among vehicles. Wireless edge caching is another emerging B5G application requiring high bandwidth, where optimal caching decisions depend on the cache contents and dynamic user demand profiles. A growing consensus is that conventional approaches, which use model-based optimization to tackle the challenges of guaranteeing ultra-low-latency and high-bandwidth services, may not work well in complicated B5G settings, calling for machine-learning-based solutions.

Technical approaches and main tasks:

In this project, recent advances in offline reinforcement learning (RL) will be leveraged to study two important problems in B5G, namely 1) deadline-aware wireless scheduling to guarantee low latency, and 2) edge caching to achieve high-bandwidth content delivery, as outlined below:

Task 1: Deadline-aware scheduling of real-time traffic has been a long-standing open problem, despite significant effort using model-based optimization. In Task 1, deadline-aware scheduling policies will be trained using physics-aided offline RL, ready to be used for online scheduling. Specifically, the Actor-Critic (A-C) method will be used for offline training of scheduling policies in two phases: 1) initialization of the actor structure via behavioral cloning, and 2) policy improvement via the physics-aided A-C method. The underlying rationale is as follows: starting from a good model-based scheduling algorithm as the initial actor, the A-C method can be leveraged to yield a better scheduling policy, thanks to its policy-improvement property. Further, innovative algorithms will be devised to address two outstanding problems in the A-C method, namely overestimation bias and high variance, and Meta-RL will be used to adapt to distribution shift under nonstationary network dynamics. The PIs have taken initial steps and obtained promising results, and will pursue an in-depth investigation.
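To make the two-phase recipe concrete, the following is a minimal, self-contained sketch on a toy two-flow scheduling model. The earliest-deadline-first (EDF) heuristic standing in for the "good model-based scheduling algorithm," the toy dynamics, and all hyperparameters are hypothetical illustrations, not the project's actual algorithms; for brevity, phase 2 draws transitions from a simulator, whereas in the offline setting they would come from a logged dataset.

```python
import math
import random

# Toy deadline-aware scheduler: two flows, each with a head-of-line packet.
# State (d1, d2): slots remaining before each packet expires (0 = empty queue).
# Action: 0 -> serve flow 1, 1 -> serve flow 2. Reward +1 per on-time delivery.
ACTIONS = (0, 1)
MAX_DEADLINE = 5

def edf_action(state):
    """Earliest-deadline-first heuristic: the model-based policy to clone."""
    d1, d2 = state
    if d1 == 0:
        return 1
    if d2 == 0:
        return 0
    return 0 if d1 <= d2 else 1

def env_step(state, action, rng):
    """Toy dynamics: serving a nonempty flow delivers its packet; the other
    flow's deadline counts down (expired packets are dropped); new packets
    arrive with random deadlines."""
    d1, d2 = state
    reward = 0
    if action == 0 and d1 > 0:
        reward, d1 = 1, 0
    elif action == 1 and d2 > 0:
        reward, d2 = 1, 0
    if action == 0 and d2 > 0:
        d2 -= 1
    if action == 1 and d1 > 0:
        d1 -= 1
    if d1 == 0 and rng.random() < 0.7:
        d1 = rng.randint(1, MAX_DEADLINE)
    if d2 == 0 and rng.random() < 0.7:
        d2 = rng.randint(1, MAX_DEADLINE)
    return (d1, d2), reward

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(seed=0, bc_steps=200, ac_steps=5000):
    rng = random.Random(seed)
    logits = {}   # actor: state -> per-action logits
    values = {}   # critic: state -> V(s)

    def get_logits(s):
        return logits.setdefault(s, [0.0, 0.0])

    # Phase 1: behavioral cloning -- push the actor toward the EDF choice.
    state = (1, 1)
    for _ in range(bc_steps):
        a = edf_action(state)
        get_logits(state)[a] += 0.5
        state, _ = env_step(state, a, rng)

    # Phase 2: actor-critic improvement with a TD(0) critic.
    alpha, beta, gamma = 0.05, 0.1, 0.9
    state = (1, 1)
    for _ in range(ac_steps):
        probs = softmax(get_logits(state))
        a = 0 if rng.random() < probs[0] else 1
        next_state, r = env_step(state, a, rng)
        td = r + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
        values[state] = values.get(state, 0.0) + beta * td
        for b in ACTIONS:  # policy-gradient step on the logits
            grad = (1.0 if b == a else 0.0) - probs[b]
            get_logits(state)[b] += alpha * td * grad
        state = next_state
    return logits

policy = train()
# Greedy action in a state where flow 1's deadline is tighter:
greedy = max(ACTIONS, key=lambda a: policy.get((1, 5), [0.0, 0.0])[a])
```

The cloned initialization is what lets phase 2 start from a sensible policy rather than a random one; the subsequent A-C updates can then only be expected to improve on the heuristic where the logged experience supports it.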

Task 2: The focus of Task 2 is wireless edge caching, an application where the storage capacity at both the network edge and user devices is harnessed to alleviate the need for high-bandwidth communication over long distances. The combinatorial nature of joint communication and caching optimization, together with the uncertainty of system dynamics, calls for non-trivial design of machine learning algorithms. We will leverage deep RL to investigate wireless edge caching. The PIs' preliminary study, which inherently respects system constraints, has yielded interesting initial results. The PIs will continue along the lines of the preliminary work and generalize the framework to study coded caching.
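As a small illustration of how edge caching can be cast as an RL problem, the sketch below trains a tabular Q-learner (a stand-in for the deep RL agent; the demand model, cache size, and hyperparameters are all hypothetical) that decides which cached item to evict on a miss, with reward for each hit. The action set is the current cache contents, so the "respect system constraints" idea shows up as the agent only ever choosing feasible evictions; nonstationary demand is modeled by a popular item that shifts over time.

```python
import random

# Toy edge cache: N_ITEMS contents, a cache holding CACHE_SIZE of them.
# On a hit the only action is 'stay' (reward 1); on a miss the agent picks
# which cached item to evict for the requested one (reward 0). State is
# (cache contents, current request); Q-learning runs over request arrivals.
N_ITEMS, CACHE_SIZE = 6, 2

def popularity(rng, t):
    """Nonstationary demand: the 'hot' item drifts every 500 requests."""
    hot = (t // 500) % N_ITEMS
    return hot if rng.random() < 0.6 else rng.randrange(N_ITEMS)

def actions_for(cache, req):
    # Feasible actions only: no action can violate the cache-size constraint.
    return ['stay'] if req in cache else sorted(cache)

def apply_action(cache, req, a):
    return cache if a == 'stay' else frozenset((cache - {a}) | {req})

def train(steps=5000, seed=1):
    rng = random.Random(seed)
    Q = {}
    cache = frozenset(range(CACHE_SIZE))
    eps, alpha, gamma = 0.1, 0.2, 0.9
    hits = 0
    req = popularity(rng, 0)
    for t in range(steps):
        s = (cache, req)
        acts = actions_for(cache, req)
        qs = Q.setdefault(s, {a: 0.0 for a in acts})
        a = rng.choice(acts) if rng.random() < eps else max(qs, key=qs.get)
        r = 1.0 if a == 'stay' else 0.0
        hits += int(a == 'stay')
        cache = apply_action(cache, req, a)
        req = popularity(rng, t + 1)
        ns = (cache, req)
        nqs = Q.setdefault(ns, {b: 0.0 for b in actions_for(cache, req)})
        qs[a] += alpha * (r + gamma * max(nqs.values()) - qs[a])
    return Q, hits / steps

Q, hit_rate = train()
```

A deep-RL version would replace the Q table with a neural network over a feature encoding of (cache contents, request), which is what makes the approach scale when the combinatorial state space of cache configurations becomes large.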