Zhang, Junshan

NSF project 2203239: CNS Core: Small: Reinforcement Learning for Real-time Wireless Scheduling and Edge Caching: Theory and Algorithm Design.

Overview:

5G and Beyond (B5G) technology promises to offer enhanced mobile broadband and ultra-reliable low-latency communication services. Indeed, recent years have witnessed tremendous growth in real-time applications in wirelessly networked systems, such as connected cars and multi-user augmented reality (AR). For instance, in connected cars, coordinated sensing and mobility control rely heavily on real-time information exchange among vehicles. Wireless edge caching is another emerging B5G application requiring high bandwidth, where optimal caching decisions depend on the cache contents and dynamic user demand profiles. A growing consensus is that conventional approaches, which use model-based optimization to tackle the challenges of guaranteeing ultra-low-latency and high-bandwidth services, may not work well in complicated B5G settings, calling for machine-learning-based solutions.

Technical approaches and main tasks:

In this project, recent advances in offline reinforcement learning (RL) will be leveraged to study two important problems in B5G, namely 1) deadline-aware wireless scheduling to guarantee low latency, and 2) edge caching to achieve high-bandwidth content delivery, as outlined below:

Task 1: Deadline-aware scheduling of real-time traffic has been a long-standing open problem, despite significant effort using model-based optimization. In Task 1, deadline-aware scheduling policies will be trained using physics-aided offline RL, ready to be used for online scheduling. Specifically, the Actor-Critic (A-C) method will be used for offline training of scheduling policies in two phases: 1) initialization of the actor structure via behavioral cloning, and 2) policy improvement via the physics-aided A-C method. The underlying rationale is as follows: starting from a good model-based scheduling algorithm as the initial actor, the A-C method can be leveraged to yield a better scheduling policy, thanks to its policy-improvement property. Further, innovative algorithms will be devised to address two outstanding problems in the A-C method, namely overestimation bias and high variance, and Meta-RL will be used to adapt to distribution shift under nonstationary network dynamics. The PIs have taken initial steps and obtained promising results, and will pursue an in-depth investigation.
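To make the two-phase recipe concrete, the following is a minimal, self-contained sketch on a toy two-flow scheduling model. The earliest-deadline-first (EDF) heuristic standing in for the "good model-based scheduling algorithm," the toy dynamics, and all hyperparameters are hypothetical illustrations, not the project's actual algorithms; for brevity, phase 2 draws transitions from a simulator, whereas in the offline setting they would come from a logged dataset.

```python
import math
import random

# Toy deadline-aware scheduler: two flows, each with a head-of-line packet.
# State (d1, d2): slots remaining before each packet expires (0 = empty queue).
# Action: 0 -> serve flow 1, 1 -> serve flow 2. Reward +1 per on-time delivery.
ACTIONS = (0, 1)
MAX_DEADLINE = 5

def edf_action(state):
    """Earliest-deadline-first heuristic: the model-based policy to clone."""
    d1, d2 = state
    if d1 == 0:
        return 1
    if d2 == 0:
        return 0
    return 0 if d1 <= d2 else 1

def env_step(state, action, rng):
    """Toy dynamics: serving a nonempty flow delivers its packet; the other
    flow's deadline counts down (expired packets are dropped); new packets
    arrive with random deadlines."""
    d1, d2 = state
    reward = 0
    if action == 0 and d1 > 0:
        reward, d1 = 1, 0
    elif action == 1 and d2 > 0:
        reward, d2 = 1, 0
    if action == 0 and d2 > 0:
        d2 -= 1
    if action == 1 and d1 > 0:
        d1 -= 1
    if d1 == 0 and rng.random() < 0.7:
        d1 = rng.randint(1, MAX_DEADLINE)
    if d2 == 0 and rng.random() < 0.7:
        d2 = rng.randint(1, MAX_DEADLINE)
    return (d1, d2), reward

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(seed=0, bc_steps=200, ac_steps=5000):
    rng = random.Random(seed)
    logits = {}   # actor: state -> per-action logits
    values = {}   # critic: state -> V(s)

    def get_logits(s):
        return logits.setdefault(s, [0.0, 0.0])

    # Phase 1: behavioral cloning -- push the actor toward the EDF choice.
    state = (1, 1)
    for _ in range(bc_steps):
        a = edf_action(state)
        get_logits(state)[a] += 0.5
        state, _ = env_step(state, a, rng)

    # Phase 2: actor-critic improvement with a TD(0) critic.
    alpha, beta, gamma = 0.05, 0.1, 0.9
    state = (1, 1)
    for _ in range(ac_steps):
        probs = softmax(get_logits(state))
        a = 0 if rng.random() < probs[0] else 1
        next_state, r = env_step(state, a, rng)
        td = r + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
        values[state] = values.get(state, 0.0) + beta * td
        for b in ACTIONS:  # policy-gradient step on the logits
            grad = (1.0 if b == a else 0.0) - probs[b]
            get_logits(state)[b] += alpha * td * grad
        state = next_state
    return logits

policy = train()
# Greedy action in a state where flow 1's deadline is tighter:
greedy = max(ACTIONS, key=lambda a: policy.get((1, 5), [0.0, 0.0])[a])
```

The cloned initialization is what lets phase 2 start from a sensible policy rather than a random one; the subsequent A-C updates can then only be expected to improve on the heuristic where the logged experience supports it.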

Task 2: The focus of Task 2 is wireless edge caching, an application where the storage capacity at both the network edge and user devices is harnessed to alleviate the need for high-bandwidth communication over long distances. The combinatorial nature of joint communication and caching optimization, together with the uncertainty of system dynamics, calls for non-trivial design of machine learning algorithms. We will leverage deep RL to investigate wireless edge caching. The PIs' preliminary study, which inherently respects system constraints, has yielded interesting initial results. The PIs will continue along the lines of the preliminary work and generalize the framework to study coded caching.
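As a small illustration of how edge caching can be cast as an RL problem, the sketch below trains a tabular Q-learner (a stand-in for the deep RL agent; the demand model, cache size, and hyperparameters are all hypothetical) that decides which cached item to evict on a miss, with reward for each hit. The action set is the current cache contents, so the "respect system constraints" idea shows up as the agent only ever choosing feasible evictions; nonstationary demand is modeled by a popular item that shifts over time.

```python
import random

# Toy edge cache: N_ITEMS contents, a cache holding CACHE_SIZE of them.
# On a hit the only action is 'stay' (reward 1); on a miss the agent picks
# which cached item to evict for the requested one (reward 0). State is
# (cache contents, current request); Q-learning runs over request arrivals.
N_ITEMS, CACHE_SIZE = 6, 2

def popularity(rng, t):
    """Nonstationary demand: the 'hot' item drifts every 500 requests."""
    hot = (t // 500) % N_ITEMS
    return hot if rng.random() < 0.6 else rng.randrange(N_ITEMS)

def actions_for(cache, req):
    # Feasible actions only: no action can violate the cache-size constraint.
    return ['stay'] if req in cache else sorted(cache)

def apply_action(cache, req, a):
    return cache if a == 'stay' else frozenset((cache - {a}) | {req})

def train(steps=5000, seed=1):
    rng = random.Random(seed)
    Q = {}
    cache = frozenset(range(CACHE_SIZE))
    eps, alpha, gamma = 0.1, 0.2, 0.9
    hits = 0
    req = popularity(rng, 0)
    for t in range(steps):
        s = (cache, req)
        acts = actions_for(cache, req)
        qs = Q.setdefault(s, {a: 0.0 for a in acts})
        a = rng.choice(acts) if rng.random() < eps else max(qs, key=qs.get)
        r = 1.0 if a == 'stay' else 0.0
        hits += int(a == 'stay')
        cache = apply_action(cache, req, a)
        req = popularity(rng, t + 1)
        ns = (cache, req)
        nqs = Q.setdefault(ns, {b: 0.0 for b in actions_for(cache, req)})
        qs[a] += alpha * (r + gamma * max(nqs.values()) - qs[a])
    return Q, hits / steps

Q, hit_rate = train()
```

A deep-RL version would replace the Q table with a neural network over a feature encoding of (cache contents, request), which is what makes the approach scale when the combinatorial state space of cache configurations becomes large.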