Modeling and Simulation of Traffic Signal Control Based on MLP with Improved GCN-TD3

doi:10.16182/j.issn1004731x.joss.24-0523

Abstract

To address uneven traffic flow at urban intersections, limited road capacity, and the poor coordination of existing traffic signal control algorithms, a traffic signal control algorithm based on graph convolutional reinforcement learning is proposed. A multilayer perceptron (MLP) extracts dynamic features from the vehicle and phase information of the controlled intersection and its neighboring intersections. A graph convolutional neural network (GCN) then aggregates these dynamic features into latent features of the regional traffic. An improved twin delayed deep deterministic policy gradient (TD3) algorithm is iterated to obtain the control policy, which is applied to phase timing in the urban road network to maximize the traffic efficiency of vehicles in the network. Simulation experiments show that the algorithm adapts to dynamically changing, complex road network environments and that its control effect is most pronounced under highly saturated flow, improving the traffic efficiency of the road network and alleviating peak-hour congestion at intersections.

Keywords: traffic signal control, graph convolutional neural network, reinforcement learning, twin delayed deep deterministic policy gradient (TD3), coordinated control
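The feature pipeline described in the abstract (per-intersection MLP features aggregated over neighboring intersections by a graph convolution) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the three-intersection line network, the layer sizes, the random weights, the omission of biases, and the symmetric normalization D^-1/2 (A + I) D^-1/2 are all assumptions made for the example.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(v, 0.0) for v in row] for row in m]

def rand_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative initialization only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def mlp(x, w1, w2):
    """Two-layer perceptron (biases omitted), applied to each intersection's row."""
    return relu(matmul(relu(matmul(x, w1)), w2))

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(h, adj, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return relu(matmul(matmul(normalized_adjacency(adj), h), w))

rng = random.Random(0)

# Hypothetical road network: three intersections in a line (0 - 1 - 2).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

# Hypothetical observation per intersection, e.g. queue lengths plus a phase encoding.
obs_dim, hid_dim, feat_dim = 6, 16, 8
obs = [[rng.random() for _ in range(obs_dim)] for _ in range(3)]

w1 = rand_matrix(obs_dim, hid_dim, rng)
w2 = rand_matrix(hid_dim, feat_dim, rng)
wg = rand_matrix(feat_dim, feat_dim, rng)

local_feat = mlp(obs, w1, w2)                   # per-intersection dynamic features
regional_feat = gcn_layer(local_feat, adj, wg)  # features mixed with neighbors
print(len(regional_feat), len(regional_feat[0]))  # 3 8
```

In a full controller, each row of `regional_feat` would serve as the state input to that intersection's TD3 actor and critics; that part is not shown here.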

CLC number: TP391.9

Huang Deqi, Tu Yating, Zhang Zhenhua, et al. Modeling and Simulation of Traffic Signal Control Based on MLP with Improved GCN-TD3[J]. Journal of System Simulation, 2025, 37(10): 2568-2577.

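Table 1

TD3 training hyperparameter settings

Parameter	Value
Episodes (episode)	100
Replay buffer capacity k	20480
Mini-batch size (minibatch_size)	512
Learning rate $\alpha$	0.001
Discount factor $\gamma$	0.85
Exploration noise variance (exploration_noise_std)	0.5
Policy noise variance (policy_noise_std)	0.2
Delayed update frequency (delay_update_frequency)	5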
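To show where these hyperparameters enter a TD3 update, the sketch below collects them in a configuration object and applies them to the three standard TD3 mechanisms: the clipped double-Q target, target policy smoothing, and delayed actor updates. It illustrates textbook TD3 only and does not reproduce the paper's improved variant. The noise values are treated as standard deviations, as the parameter names suggest, and `noise_clip` and the action range [-1, 1] are assumptions that do not appear in the table.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TD3Config:
    # Values taken from the hyperparameter table.
    episodes: int = 100
    replay_capacity: int = 20480
    minibatch_size: int = 512
    learning_rate: float = 0.001
    gamma: float = 0.85
    exploration_noise_std: float = 0.5
    policy_noise_std: float = 0.2
    delay_update_frequency: int = 5
    # Assumption: not listed in the table; 0.5 is a commonly used default.
    noise_clip: float = 0.5

def clip(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))

def smoothed_target_action(target_action, cfg, rng, low=-1.0, high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = clip(rng.gauss(0.0, cfg.policy_noise_std), -cfg.noise_clip, cfg.noise_clip)
    return clip(target_action + noise, low, high)

def td3_target(reward, done, q1_next, q2_next, cfg):
    """Clipped double-Q target: r + gamma * min(Q1', Q2') for non-terminal steps."""
    return reward + (0.0 if done else cfg.gamma * min(q1_next, q2_next))

def actor_update_steps(total_critic_updates, cfg):
    """1-based critic update indices at which the actor and target networks also update."""
    return [t for t in range(1, total_critic_updates + 1)
            if t % cfg.delay_update_frequency == 0]

cfg = TD3Config()
rng = random.Random(0)

print(td3_target(reward=-1.0, done=False, q1_next=10.0, q2_next=8.0, cfg=cfg))
print(smoothed_target_action(0.9, cfg, rng))
print(actor_update_steps(12, cfg))  # [5, 10]
```

With a delayed update frequency of 5, the critics are updated on every training step while the actor and target networks change only on every fifth step, which is the mechanism TD3 uses to limit the effect of value overestimation on the policy.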



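Traffic volume	Fixed-time control	Experience-based control	TD3	GCN-TD3
Low	82.3	81.2	81.6	77.1
Medium	167.9	145.2	128.2	121.6
Peak	296.9	249.69	183.4	155.7

Traffic volume	Fixed-time control	Experience-based control	TD3	GCN-TD3
Low	10.73	7.98	5.69	3.14
Medium	77.19	27.42	6.75	4.27
Peak	108.47	105.65	18.13	11.93

Control strategy	Qitai Road / Huanghe Road (奇台路与黄河路)	Qitai Road / Hetian 1st Street (奇台路与和田一街)
Fixed-time control	31.18	6.22
Experience-based control	15.52	4.62
GCN-TD3	14.85	3.22

Control strategy	Qitai Road / Huanghe Road (奇台路与黄河路)	Qitai Road / Hetian 1st Street (奇台路与和田一街)
Fixed-time control	121.71	5.81
Experience-based control	11.63	5.62
GCN-TD3	8.25	1.71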
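The comparison tables are easier to read as relative changes. The sketch below computes, for the first of the four tables above, how far the GCN-TD3 value lies below each baseline at every traffic volume level. The metric and unit are not stated in this excerpt, so the calculation assumes a lower-is-better quantity (such as a delay or queue measure); the row and column names are English renderings of the table headers.

```python
# Values copied from the first comparison table; metric and unit are not stated here.
results = {
    "Low":    {"Fixed-time": 82.3,  "Experience-based": 81.2,   "TD3": 81.6,  "GCN-TD3": 77.1},
    "Medium": {"Fixed-time": 167.9, "Experience-based": 145.2,  "TD3": 128.2, "GCN-TD3": 121.6},
    "Peak":   {"Fixed-time": 296.9, "Experience-based": 249.69, "TD3": 183.4, "GCN-TD3": 155.7},
}

def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Report the reduction achieved by GCN-TD3 against each baseline, per traffic level.
for level, row in results.items():
    for baseline in ("Fixed-time", "Experience-based", "TD3"):
        pct = relative_reduction(row[baseline], row["GCN-TD3"])
        print(f"{level:6s} GCN-TD3 vs {baseline:16s}: {pct:5.1f}% lower")
```

Under this reading, the peak-volume GCN-TD3 value is roughly 47.6% below fixed-time control and roughly 15.1% below plain TD3, while the low-volume gap to fixed-time control is only about 6.3%, in line with the abstract's statement that the benefit is largest under highly saturated flow.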