Journal of System Simulation (系统仿真学报), 2023, Vol. 35, Issue (2): 372-385. doi: 10.16182/j.issn1004731x.joss.21-1118

Computation Offloading Strategy Based on Stackelberg Game and Deep Reinforcement Learning

Xianwei Zhou, Qixu Gong, Songsen Yu

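  1. School of Software, South China Normal University, Foshan, Guangdong 528225, China
  • Received: 2021-11-02 Revised: 2022-01-07 Online: 2023-02-28 Published: 2023-02-16
  • Corresponding author: Songsen Yu E-mail: 20871147@qq.com; yss8109@163.com
  • About the author: Xianwei Zhou (born 1982), male, lecturer, Ph.D.; research interests include robotics and multi-sensor information fusion. E-mail: 20871147@qq.com
  • Supported by:
    Guangdong Province Major Special Project for Applied Science and Technology R&D (2016B020244003); Guangdong Basic and Applied Basic Research Foundation (2020B1515120089); Guangdong Enterprise Science and Technology Commissioner Project (GDKTP2020014000)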

Computation Offloading Strategy Based on Stackelberg Game and DRL

Xianwei Zhou, Qixu Gong, Songsen Yu

  1. School of Software, South China Normal University, Foshan 528225, China
  • Received: 2021-11-02 Revised: 2022-01-07 Online: 2023-02-28 Published: 2023-02-16
  • Contact: Songsen Yu E-mail: 20871147@qq.com; yss8109@163.com

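Abstract:

To enable the two types of users in a 5G hybrid private network to obtain optimal computation offloading strategies, the competition between the two types of users for mobile edge computing (MEC) server resources is modeled as a Stackelberg game, and the strategies under the complete-information game and the incomplete-information game are discussed separately. Under the complete-information game, a unique Nash equilibrium exists. Under the incomplete-information game, the environment is modeled as a partially observable Markov decision process (POMDP), and an optimal offloading strategy based on two-stage deep reinforcement learning (TSDRL) is proposed. Simulation experiments show that, compared with the D-DRL algorithm, the proposed algorithm reduces delay by 20.81% and energy consumption by 3.38%, effectively improving user quality of experience (QoE).

Keywords: 5G hybrid private network, computation offloading, Stackelberg game, Nash equilibrium, Markov decision process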

Abstract:

To obtain the optimal computation offloading strategy for the two kinds of MEC users in a 5G hybrid private network, the competition between the two kinds of users for MEC server resources is modeled as a Stackelberg game, and the strategies under the complete information game and the incomplete information game are studied separately. It is proved that a unique Nash equilibrium solution exists in the complete information scenario. In the incomplete information scenario, the environment is modeled as a POMDP, and a two-stage deep reinforcement learning (TSDRL) algorithm is proposed to obtain the optimal computation offloading strategy. Simulation results show that, compared with the D-DRL algorithm, the proposed algorithm reduces time delay by 20.81% and energy consumption by 3.38%, and can effectively improve user QoE (quality of experience).

Key words: 5G hybrid private network, computation offloading, Stackelberg game theory, Nash equilibrium, partially observable Markov decision process (POMDP)
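
As a rough illustration of the leader-follower structure behind a complete-information Stackelberg game, the sketch below solves a toy two-user offloading game by backward induction. The cost model (a linear local-computation cost plus a quadratic congestion cost on a shared server), the parameter names `a_l`, `a_f`, `c`, and the function names are all assumptions made for illustration; they are not the utility functions or the algorithm used in the paper.

```python
# Toy Stackelberg game solved by backward induction.
# ASSUMPTION: the cost model below is hypothetical and NOT taken from the paper.
# Each user offloads a fraction x in [0, 1] of its task to a shared MEC server.
#   cost_i = a_i * (1 - x_i) + c * x_i * (x_i + x_j)
# a_i: cost of computing everything locally; c: congestion cost coefficient.


def clip01(x):
    """Restrict an offloading fraction to the feasible interval [0, 1]."""
    return max(0.0, min(1.0, x))


def follower_best_response(x_l, a_f, c):
    """Follower's cost-minimizing fraction given the leader's choice x_l.

    Setting d(cost_f)/d(x_f) = -a_f + c * (2 * x_f + x_l) = 0 gives the
    interior solution; the cost is convex in x_f, so clipping is optimal.
    """
    return clip01((a_f / c - x_l) / 2.0)


def leader_cost(x_l, a_l, a_f, c):
    """Leader's cost when it anticipates the follower's best response."""
    x_f = follower_best_response(x_l, a_f, c)
    return a_l * (1.0 - x_l) + c * x_l * (x_l + x_f)


def solve_stackelberg(a_l, a_f, c, steps=1000):
    """Grid-search the leader's strategy, then read off the follower's reply."""
    grid = [i / steps for i in range(steps + 1)]
    x_l = min(grid, key=lambda x: leader_cost(x, a_l, a_f, c))
    x_f = follower_best_response(x_l, a_f, c)
    return x_l, x_f


if __name__ == "__main__":
    x_l, x_f = solve_stackelberg(a_l=1.0, a_f=1.0, c=2.0)
    print(x_l, x_f)  # 0.25 0.125
```

In this toy setting the leader commits first and the follower reacts, so the leader's problem reduces to a one-dimensional minimization over its own offloading fraction. With the parameters shown, the grid search agrees with the closed-form interior solution x_l = (a_l - a_f / 2) / c.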

CLC number: