Journal of System Simulation ›› 2024, Vol. 36 ›› Issue (11): 2503-2516. doi: 10.16182/j.issn1004731x.joss.23-0906

• Special Column: Vision-Based Environment Perception for Intelligent Driving •

Traffic Sign Recognition Model with Long-Tail Distribution Based on YOLOX-Tiny

Wu Yunpeng¹, Fu Yingxiong¹, Shen Lijun², Cui Feng³

  1. Hubei University, Wuhan 430000, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  3. Beijing Smarter Eye Technology Company, Beijing 100190, China
  • Received: 2023-07-18 Revised: 2023-11-20 Online: 2024-11-13 Published: 2024-11-19
  • Corresponding author: Cui Feng
  • First author: Wu Yunpeng (born 1999), male, master's student; research interests: machine learning and image processing.
  • Funding: National Key R&D Program of China (2018AAA0103103); National Natural Science Foundation of China (32171461)

Abstract:

In intelligent driving, accurate recognition of traffic signs is important for driving safety. Traffic sign training sets often follow a long-tail distribution, which makes recognition considerably harder. To address the poor tail-class performance of models trained on long-tail datasets, a traffic sign recognition model for long-tail distributions based on YOLOX-Tiny is proposed. A long-tail traffic sign dataset was built from the TT100K_2021 (Tsinghua-Tencent 100K 2021) dataset. YOLOX-Tiny was chosen as the base model in view of the number of images in the dataset, the sample distribution, and the model size. Equalization loss v2 (EQL v2) was used as the classification loss and focal loss (FL) as the target confidence loss, to balance the gap between the head and tail classes of the classifier and to strengthen the model's prediction of target confidence. The up-sampling operator CARAFE, coordinate attention (CA), and the CAR-ASFF module (CARAFE + adaptively spatial feature fusion) were introduced into the bidirectional pyramid of the neck to resolve the backpropagation conflicts between feature maps at different levels of the traditional feature pyramid, improve feature reorganization, and highlight target features. The results show that the improved YOLOX-Tiny model reaches an mAP50 of 43.67% and an mAP50:95 of 29.98% on the constructed long-tail traffic sign dataset, and achieves higher detection accuracy than the other object detection models compared.

Key words: long-tail distribution, YOLOX, traffic sign recognition, attention mechanism, feature reorganization, multiscale feature fusion
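
The abstract names focal loss (FL) as the target confidence loss but gives neither its form nor its hyperparameters. As a rough illustration only, the sketch below shows the commonly used binary focal loss for a single confidence prediction. The function names and the defaults `alpha=0.25` and `gamma=2.0` are assumptions taken from common practice, not values confirmed by this paper.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Plain cross-entropy for one probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true label
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    return -math.log(p_t)


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss (hypothetical helper; alpha and gamma are assumed defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so hard examples contribute relatively more to training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
    for p in (0.9, 0.1):
        print(p, binary_cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger `gamma` values down-weight easy examples more strongly, which is why this family of losses is often used when easy background samples dominate.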

CLC Number: