Offline reinforcement learning will be used to study two important problems for B5G:
1) deadline-aware wireless scheduling to guarantee low-latency communication and
2) wireless edge caching to achieve high-bandwidth content delivery.
Task 1 will explore deadline-aware wireless scheduling, a long-standing open problem. We propose to use physics-aided offline RL to devise scheduling algorithms, employing the actor-critic (A-C) method for offline training of scheduling policies as follows:
Starting from a good existing model-based scheduling algorithm as the initial actor, the A-C method will be leveraged to improve the scheduling policy gradually, owing to its policy-improvement property. Further, innovative algorithms will be devised to address two outstanding problems of the A-C method, namely overestimation bias and high variance, and Meta-RL will be used to adapt to the distribution shift caused by nonstationary network dynamics; a minimal illustrative sketch of such a warm-started training loop is given at the end of this section.

Next, Task 2 will be devoted to wireless edge caching, where the storage capacities at the network edge (including both base stations and user equipment) are harnessed to reduce the need for high-bandwidth real-time communication. We will exploit offline RL methods to solve the joint communication and caching optimization in wireless edge caching; a toy formulation is likewise sketched below. Our preliminary collaborative study, which inherently respects system constraints, shows promising initial results, and we will continue along this line to generalize the framework to coded caching.

We will evaluate the proposed algorithms through 1) validation of basic algorithm functionality and 2) extensive RL experiments.
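To make the Task 1 training loop concrete, the following is a minimal sketch under stated assumptions, not the proposal's actual algorithm: the actor is warm-started by distilling a hypothetical model-based prior (an earliest-deadline-first-style score, `edf_logits`), and offline A-C updates then use clipped double critics (the minimum of two Q-networks) as one standard remedy for overestimation bias. All dimensions and data are stand-ins for a logged scheduling dataset, and a full offline method would add behavior regularization or pessimism on top of this loop.

```python
# Minimal sketch only: warm-started offline actor-critic for per-slot user
# scheduling. Dataset, sizes, and the EDF-style prior are illustrative
# placeholders, not the proposal's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_USERS = 8, 4        # toy sizes: per-slot state, schedulable users

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_USERS))
    def forward(self, s):
        return self.net(s)       # logits over which user to schedule

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_USERS))
    def forward(self, s):
        return self.net(s)       # Q(s, a) for every discrete action a

def edf_logits(states):
    # Hypothetical model-based prior: pretend the last N_USERS state entries
    # are times-to-deadline, and favor the most urgent user (EDF-style).
    return -states[..., -N_USERS:]

actor, q1, q2 = Actor(), Critic(), Critic()
opt_a = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_q = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

# Step 1: warm start -- distill the model-based scheduler into the actor.
states = torch.randn(1024, STATE_DIM)                 # stand-in logged states
for _ in range(200):
    loss = F.kl_div(F.log_softmax(actor(states), -1),
                    F.softmax(edf_logits(states), -1), reduction="batchmean")
    opt_a.zero_grad(); loss.backward(); opt_a.step()

# Step 2: offline policy improvement on a logged batch (s, a, r, s').
s, a = states, torch.randint(N_USERS, (1024,))        # stand-in transitions
r, s2 = torch.randn(1024), torch.randn(1024, STATE_DIM)
gamma = 0.99
for _ in range(200):
    with torch.no_grad():                             # clipped double-Q target
        pi2 = F.softmax(actor(s2), -1)
        target = r + gamma * (pi2 * torch.min(q1(s2), q2(s2))).sum(-1)
    q_loss = F.mse_loss(q1(s).gather(1, a[:, None]).squeeze(1), target) \
           + F.mse_loss(q2(s).gather(1, a[:, None]).squeeze(1), target)
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()
    # Policy improvement: raise the probability of high-min-Q actions.
    pi = F.softmax(actor(s), -1)
    a_loss = -(pi * torch.min(q1(s), q2(s)).detach()).sum(-1).mean()
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
```

Taking the minimum of two critics deliberately underestimates uncertain action values, which is why it is a common antidote to overestimation bias; the warm start, in turn, keeps early policy updates close to the trusted model-based scheduler.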
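For Task 2, the sketch below is likewise illustrative only: it casts a single edge cache's eviction decision as a small MDP (Zipf-like requests, unit-size items, reward 1 per cache hit as a proxy for saved backhaul bandwidth) and runs tabular Q-learning purely on a logged trace. The proposal's joint communication and caching optimization, and coded caching in particular, are substantially richer, and a practical offline method would also guard against out-of-distribution actions.

```python
# Minimal sketch only: offline (batch) Q-learning for a toy edge cache.
# Zipf popularity, unit-size items, and the reward model are assumptions.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
N_ITEMS, CACHE_SIZE, GAMMA = 10, 3, 0.9
pop = 1.0 / np.arange(1, N_ITEMS + 1)
pop /= pop.sum()                       # Zipf-like request popularity

def transition(cache, req, act):
    # On a miss, action `act` evicts slot `act` (or skips admission if
    # act == CACHE_SIZE); a hit earns reward 1 (backhaul transmission saved).
    if req in cache:
        return cache, 1.0
    if act == CACHE_SIZE:
        return cache, 0.0
    slots = list(cache)
    slots[act] = req
    return tuple(sorted(slots)), 0.0

# Log a trace under a random behavior policy -- the offline dataset.
dataset, cache = [], tuple(range(CACHE_SIZE))
req = int(rng.choice(N_ITEMS, p=pop))
for _ in range(20000):
    act = int(rng.integers(CACHE_SIZE + 1))
    nxt, r = transition(cache, req, act)
    nreq = int(rng.choice(N_ITEMS, p=pop))
    dataset.append((cache, req, act, r, nxt, nreq))
    cache, req = nxt, nreq

# Learn Q purely from the logged transitions (no further interaction).
Q = defaultdict(float)
for _ in range(20):
    for c, rq, a, r, c2, rq2 in dataset:
        best_next = max(Q[(c2, rq2, a2)] for a2 in range(CACHE_SIZE + 1))
        Q[(c, rq, a)] += 0.1 * (r + GAMMA * best_next - Q[(c, rq, a)])

def policy(cache, req):
    # Greedy eviction choice when `req` is requested given cache contents.
    return max(range(CACHE_SIZE + 1), key=lambda a: Q[(cache, req, a)])
```

Because popular items generate most hits, the learned policy tends to protect high-popularity contents, matching the intuition behind popularity-based caching; the same template could, in principle, extend to joint decisions once the action additionally selects a communication resource.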