Journal of System Simulation ›› 2025, Vol. 37 ›› Issue (4): 1051-1062. DOI: 10.16182/j.issn1004731x.joss.23-1416
Signal Timing Optimization via Reinforcement Learning with Traffic Flow Prediction

Xu Ming, Li Jinye, Zuo Dongyu, Zhang Jing
Received: 2023-11-21
Revised: 2023-12-27
Online: 2025-04-17
Published: 2025-04-16
Xu Ming, Li Jinye, Zuo Dongyu, Zhang Jing. Signal Timing Optimization via Reinforcement Learning with Traffic Flow Prediction[J]. Journal of System Simulation, 2025, 37(4): 1051-1062.
Table 3 Performance on each indicator for Task I

| Method | ANS | AS/(m/s) | AVD/s | AWTV/s | ATTV/s | PF/(times/min) |
|---|---|---|---|---|---|---|
| Fixed-time | 1.36 | 4.05 | 144.51 | 121.98 | 187.76 | 1.67 |
| SOTL | 1.20 | 5.60 | 77.72 | 60.00 | 126.29 | 2.27 |
| MaxPressure | 2.57 | 6.45 | 77.42 | 48.61 | 126.22 | 11.40 |
| RainbowDQN | 3.16 | 5.48 | 78.78 | 36.44 | 128.11 | 9.53 |
| PPO | 2.64 | 5.87 | 76.59 | 40.99 | 125.16 | 9.30 |
| Ours | 0.76 | 6.28 | 71.61 | 58.98 | 121.48 | 1.47 |
| Improvement over best/second-best baseline/% | 36.67 | -2.71 | 6.50 | -38.22 | 2.94 | 11.98 |
Table 4 Performance on each indicator for Task II

| Method | ANS | AS/(m/s) | AVD/s | AWTV/s | ATTV/s | PF/(times/min) |
|---|---|---|---|---|---|---|
| Fixed-time | 1.55 | 3.71 | 169.65 | 142.89 | 210.09 | 1.67 |
| SOTL | 1.51 | 5.50 | 97.95 | 75.83 | 145.31 | 3.27 |
| MaxPressure | 3.49 | 5.84 | 102.91 | 53.53 | 149.47 | 11.63 |
| RainbowDQN | 3.49 | 4.70 | 113.72 | 54.86 | 153.17 | 9.17 |
| PPO | 3.89 | 4.58 | 162.22 | 97.83 | 204.25 | 9.33 |
| Ours | 0.87 | 6.00 | 78.49 | 62.90 | 125.04 | 1.40 |
| Improvement over best/second-best baseline/% | 42.38 | 2.74 | 19.87 | -14.90 | 13.95 | 16.17 |
Table 5 Performance on each indicator for Task III

| Method | ANS | AS/(m/s) | AVD/s | AWTV/s | ATTV/s | PF/(times/min) |
|---|---|---|---|---|---|---|
| Fixed-time | 1.41 | 4.13 | 149.61 | 125.67 | 190.52 | 1.67 |
| SOTL | 1.42 | 5.63 | 90.88 | 70.68 | 136.40 | 3.23 |
| MaxPressure | 4.88 | 5.07 | 135.40 | 68.18 | 174.43 | 12.07 |
| RainbowDQN | 3.66 | 4.33 | 144.71 | 86.13 | 177.27 | 9.07 |
| PPO | 4.22 | 4.11 | 201.73 | 136.95 | 230.06 | 9.50 |
| Ours | 0.97 | 5.43 | 91.73 | 75.30 | 137.59 | 1.53 |
| Improvement over best/second-best baseline/% | 31.21 | -3.68 | -0.93 | -9.46 | -0.86 | 8.38 |
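The tables do not state the formula behind the improvement row. A minimal sketch of how the positive entries can be reproduced, assuming they are the relative change of the proposed method against the strongest baseline (lower is better for ANS, AVD, AWTV, ATTV, and PF; higher is better for AS); the helper name `improvement_over_best` is illustrative, not from the paper:

```python
def improvement_over_best(ours: float, baselines: list[float],
                          lower_is_better: bool = True) -> float:
    """Percent improvement of the proposed method over the strongest baseline.

    For lower-is-better metrics (stops, delay, waits, travel time, phase
    switches) the strongest baseline is the minimum; for higher-is-better
    metrics (speed) it is the maximum.
    """
    if lower_is_better:
        best = min(baselines)
        return (best - ours) / best * 100.0
    best = max(baselines)
    return (ours - best) / best * 100.0

# Task I, AVD/s (Table 3): baselines Fixed-time..PPO, ours = 71.61
print(round(improvement_over_best(71.61, [144.51, 77.72, 77.42, 78.78, 76.59]), 2))  # 6.5
# Task I, ANS (Table 3): best baseline is SOTL at 1.20
print(round(improvement_over_best(0.76, [1.36, 1.20, 2.57, 3.16, 2.64]), 2))  # 36.67
```

The negative entries (where a baseline beats the proposed method, e.g. AWTV in Table 3: (36.44 − 58.98)/58.98 ≈ −38.22%) appear to be normalized by the proposed method's own value rather than the baseline's, so this sketch reproduces only the positive rows.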
References

1. Rahman M M, Najaf P, Fields M G, et al. Traffic Congestion and Its Urban Scale Factors: Empirical Evidence from American Urban Areas[J]. International Journal of Sustainable Transportation, 2021, 16(5): 406-421.
2. Hunt P B, Robertson D I, Bretherton R D, et al. The SCOOT On-line Traffic Signal Optimisation Technique[J]. Traffic Engineering & Control, 1982, 23(4): 190-192.
3. Lowrie P R. SCATS, Sydney Co-ordinated Adaptive Traffic System: A Traffic Responsive Method of Controlling Urban Traffic[M]. Australia: Roads and Traffic Authority NSW, 1990: 28.
4. Lu Liping, Cheng Ken, Chu Duanfeng, et al. Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network[J]. China Journal of Highway and Transport, 2022, 35(8): 267-277. (in Chinese)
5. Wei Hua, Chen Chacha, Zheng Guanjie, et al. PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2019: 1290-1298.
6. Varaiya P. Max Pressure Control of a Network of Signalized Intersections[J]. Transportation Research Part C: Emerging Technologies, 2013, 36: 177-195.
7. Song Tailong, He Yulong, Liu Qin. Signal Control of Key Intersections in Large-scale Events Based on Deep Reinforcement Learning[J]. Science Technology and Engineering, 2023, 23(22): 9694-9701. (in Chinese)
8. Gao Han, Luo Juan, Cai Qianya, et al. An Intelligent Traffic Signal Coordination Method Based on Asynchronous Decision-making[J]. Journal of Computer Research and Development, 2023, 60(12): 2797-2805. (in Chinese)
9. Tang Muyao, Zhou Dake, Li Tao. State Prediction Based Deep Reinforcement Learning for Traffic Signal Control[J]. Application Research of Computers, 2022, 39(8): 2311-2315. (in Chinese)
10. Chu K F, Lam A Y S, Li V O K. Traffic Signal Control Using End-to-end Off-policy Deep Reinforcement Learning[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(7): 7184-7195.
11. Shu Lingzhou, Wu Jia, Wang Chen. Urban Traffic Signal Control Based on Deep Reinforcement Learning[J]. Journal of Computer Applications, 2019, 39(5): 1495-1499. (in Chinese)
12. Fei Rong, Liu Fang, Xie Guo, et al. GRU-based Car-following Behavior Simulation Model[J]. Journal of System Simulation, 2020, 32(10): 1862-1873. (in Chinese)
13. Siam M, Valipour S, Jagersand M, et al. Convolutional Gated Recurrent Networks for Video Segmentation[C]//2017 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2017: 3090-3094.
14. Fan Zhou, Su Rui, Zhang Weinan, et al. Hybrid Actor-critic Reinforcement Learning in Parameterized Action Space[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 2279-2285.
15. Schulman J, Moritz P, Levine S, et al. High-dimensional Continuous Control Using Generalized Advantage Estimation[EB/OL]. (2018-10-20) [2023-11-10]. https://arxiv.org/abs/1506.02438.
16. Lopez P A, Behrisch M, Bieker-Walz L, et al. Microscopic Traffic Simulation Using SUMO[C]//2018 21st International Conference on Intelligent Transportation Systems (ITSC). Piscataway: IEEE, 2018: 2575-2582.
17. Zang Xinshi, Yao Huaxiu, Zheng Guanjie, et al. MetaLight: Value-based Meta-reinforcement Learning for Traffic Signal Control[C]//Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and Thirty-Second Conference on Innovative Applications of Artificial Intelligence and Tenth Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2020: 1153-1160.
18. Cools S B, Gershenson C, D'Hooghe B. Self-organizing Traffic Lights: A Realistic Simulation[M]//Prokopenko M. Advances in Applied Self-Organizing Systems. London: Springer London, 2013: 45-55.
19. Hessel M, Modayil J, van Hasselt H, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning[C]//Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI Press, 2018: 3215-3222.
20. Schulman J, Wolski F, Dhariwal P, et al. Proximal Policy Optimization Algorithms[EB/OL]. (2017-08-28) [2023-11-10]. https://arxiv.org/abs/1707.06347.