Image Semantic Segmentation Algorithm Based on Improved DeepLabv3+

doi:10.16182/j.issn1004731x.joss.22-0690

Abstract:

Mainstream image semantic segmentation networks often suffer from mis-segmentation, discontinuous segmentation, and high model complexity, which prevents them from being deployed flexibly and efficiently in practical scenarios. To address this, an improved DeepLabv3+ image semantic segmentation network is designed that jointly considers parameter count, prediction time, and accuracy. The backbone is replaced with the lightweight EfficientNetv2 to extract features and improve parameter utilization. In the atrous spatial pyramid pooling (ASPP) module, a mixed strip pooling module replaces global average pooling, and depthwise separable dilated convolutions are introduced to reduce the parameter count and strengthen the learning of multi-scale information. An attention mechanism enhances the model's representational power, and multiple shallow features are extracted from the backbone to enrich the geometric detail of the image. Experiments show that the proposed algorithm achieves an mIoU of 81.19% with 55.51×10⁶ parameters, improving the balance between segmentation accuracy and model complexity while also improving model generalization.

Key words: DeepLabv3+, image semantic segmentation, atrous spatial pyramid pooling, attention mechanism, depthwise separable dilated convolution
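As a rough illustration of why a depthwise separable dilated convolution reduces the parameter count, the sketch below counts the weights of a standard 3×3 convolution and of a depthwise 3×3 plus pointwise 1×1 pair, and shows that dilation enlarges the receptive field without adding weights. The channel widths (272 in, 256 out) and the dilation rates 6, 12, 18 (the DeepLabv3+ defaults at output stride 16) are assumptions chosen for illustration, not values taken from this paper.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a dense k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def ds_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent of a k x k kernel at the given dilation; the weight count is unchanged."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical widths: 272 input channels (widest MBConv6 stage listed for EfficientNetv2), 256 output.
    k, c_in, c_out = 3, 272, 256
    std = standard_conv_params(k, c_in, c_out)
    ds = ds_conv_params(k, c_in, c_out)
    print(f"standard 3x3 conv:            {std:,} weights")
    print(f"depthwise separable 3x3 conv: {ds:,} weights")
    print(f"ratio: {ds / std:.3f}")
    # Assumed ASPP dilation rates (DeepLabv3+ defaults at output stride 16).
    for d in (6, 12, 18):
        e = effective_kernel(k, d)
        print(f"dilation {d:2d}: effective kernel {e}x{e}, same weight count")
```

For these widths the separable form needs about 11.5% of the standard weights (72,080 vs 626,688), close to the usual estimate 1/C_out + 1/k².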

CLC number: TP391

Zhao Weiping, Chen Yu, Xiang Song, et al. Image Semantic Segmentation Algorithm Based on Improved DeepLabv3+[J]. Journal of System Simulation, 2023, 35(11): 2333-2344.

Figures and tables: 19 (Figs. 1-11, Tables 1-8)

EfficientNetv2 backbone structure
Operator	Input size	Channels	Layers
3×3 Conv	224×224	24	1
3×3 Fused-MBConv1	112×112	24	2
3×3 Fused-MBConv4	112×112	48	4
3×3 Fused-MBConv4	56×56	64	4
3×3 MBConv4	28×28	128	6
3×3 MBConv6	14×14	160	9
3×3 MBConv6	14×14	272	15
Conv2D & Pooling & FC	7×7	1792	1
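The input-size column implies how far each stage downsamples a 224×224 image. DeepLabv3+ conventionally runs ASPP on stride-16 features and fuses low-level features at stride 4, so it helps to know which stages sit at which stride. The sketch below derives each stage's output stride from the table, assuming a stage's output size equals the next row's input size; which stages the authors actually tap for ASPP and for their shallow features is not stated in this section.

```python
# (operator, input size, channels, layers), transcribed from the EfficientNetv2 structure table
STAGES = [
    ("3x3 Conv", 224, 24, 1),
    ("3x3 Fused-MBConv1", 112, 24, 2),
    ("3x3 Fused-MBConv4", 112, 48, 4),
    ("3x3 Fused-MBConv4", 56, 64, 4),
    ("3x3 MBConv4", 28, 128, 6),
    ("3x3 MBConv6", 14, 160, 9),
    ("3x3 MBConv6", 14, 272, 15),
]
HEAD_INPUT = 7     # input size of the final Conv2D & Pooling & FC row
IMAGE_SIZE = 224


def output_strides(stages, head_input, image_size):
    """Downsampling factor of each stage's output relative to the input image."""
    # Assumption: a stage's output size equals the next row's input size.
    next_inputs = [stage[1] for stage in stages[1:]] + [head_input]
    return [image_size // size for size in next_inputs]


if __name__ == "__main__":
    strides = output_strides(STAGES, HEAD_INPUT, IMAGE_SIZE)
    for (name, _, channels, _), stride in zip(STAGES, strides):
        print(f"{name:<20} out channels {channels:>3}  output stride {stride}")
```

Under this reading, the 48-channel Fused-MBConv4 stage is the last one at stride 4 and the 160-channel MBConv6 stage is the last one at stride 16.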

Comparison of backbone networks
Experiment	Backbone	mIoU/%	Params/M
1	MobileNetv2	76.90	5.13
2	ResNet101	80.23	56.85
3	Xception	79.71	71.30
4	Swin Transformer	83.68	92.93
5	EfficientNetv2	81.19	55.51
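All tables report accuracy as mIoU: per-class intersection over union, TP / (TP + FP + FN), averaged over classes. The paper's exact evaluation protocol (class set, ignored labels) is not given in this section; the NumPy sketch below shows the standard confusion-matrix computation, assuming label 255 marks unlabelled pixels.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    valid = target != ignore_index  # assumed convention: 255 marks unlabelled pixels
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes seen in labels or predictions."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as the class but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled as the class but predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return float(np.nanmean(iou)), iou


if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    miou, per_class = mean_iou(confusion_matrix(pred, target, num_classes=3))
    print(per_class)  # per-class IoU: 1/3, 2/3, 1/2
    print(f"mIoU = {100 * miou:.2f}%")
```

Accumulating one confusion matrix over the whole validation set and computing mIoU once at the end is the usual practice, rather than averaging per-image scores.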

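Ablation of the ASPP module (GAP: global average pooling; MSPM: mixed strip pooling module; DSDConv: depthwise separable dilated convolution)
Group	GAP	MSPM	DSDConv	mIoU/%	SPPT/ms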
1	√			78.64	59.84
2	√	√		79.85	65.10
3		√		79.99	51.63
4		√	√	79.74	43.46
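The MSPM replaces the global-average-pooling branch of ASPP. The paper's exact module is not described in this section, so the sketch below is a simplified NumPy version of strip pooling as proposed by Hou et al. (CVPR 2020): each position is gated by the averages of its row and its column instead of by a single per-channel mean. Replacing that design's learnable 1D convolutions and 1×1 fusion convolution with identity is an assumption made to keep the example short.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def global_avg_pool(x):
    """x: (C, H, W) -> (C, 1, 1); one context value per channel, as in the original ASPP branch."""
    return x.mean(axis=(1, 2), keepdims=True)


def strip_pool(x):
    """Simplified strip pooling gate for x of shape (C, H, W).

    The learnable 1D convs and 1x1 fusion conv of Hou et al.'s design are replaced by identity.
    """
    rows = x.mean(axis=2, keepdims=True)  # (C, H, 1): average each row across the width
    cols = x.mean(axis=1, keepdims=True)  # (C, 1, W): average each column across the height
    context = rows + cols                 # broadcasts back to (C, H, W)
    return x * sigmoid(context)           # gate every position by its row and column context


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((4, 8, 16))
    print(global_avg_pool(x).shape)  # (4, 1, 1)
    print(strip_pool(x).shape)       # (4, 8, 16)
```

The shapes show the difference: GAP reduces each channel to one value, while strip pooling keeps H + W context values per channel, which helps preserve long, thin structures along rows and columns.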

Comparison of attention mechanisms
Experiment	Backbone	Attention	mIoU/%	FPS/(frame/s)
1	EfficientNetv2	ECA	79.58	16.71
2	EfficientNetv2	SE	79.43	15.79
3	EfficientNetv2	CBAM	80.20	14.32
4	EfficientNetv2	NAM	80.25	16.63
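NAM gives the highest mIoU in the comparison above at a frame rate close to ECA (16.63 vs 16.71 FPS). It derives channel weights from batch-normalization scale factors instead of adding fully connected layers. The NumPy sketch below shows its channel sub-module under stated assumptions: training-mode BN statistics, absolute-valued scale factors, and hypothetical `gamma`/`beta` arrays standing in for learned BN parameters.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for x of shape (N, C, H, W), statistics over N, H, W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]


def nam_channel_attention(x, gamma, beta):
    """NAM-style channel gate: sigmoid(w_c * BN(x)) * x with w_c = |gamma_c| / sum(|gamma|)."""
    w = np.abs(gamma) / np.abs(gamma).sum()  # larger BN scale -> channel treated as more important
    scaled = w[None, :, None, None] * batch_norm(x, gamma, beta)
    return x * (1.0 / (1.0 + np.exp(-scaled)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 4, 8, 8))
    gamma = np.array([2.0, 1.0, 0.5, 0.5])  # hypothetical learned BN scales
    beta = np.zeros(4)
    print(nam_channel_attention(x, gamma, beta).shape)  # (2, 4, 8, 8)
```

Because the weights reuse parameters BN already learns, the gate adds essentially no parameters, which is consistent with NAM's frame rate staying close to that of the lighter ECA module.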

Ablation of the proposed modules
Group	EfficientNetv2	N-ASPP	NAM	SFF	mIoU/%	SPPT/ms
1	√				78.94	59.84
2	√	√			79.74	43.46
3	√		√		80.25	60.13
4	√	√	√		80.80	44.40
5	√	√	√	√	81.19	44.92
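Assuming SPPT is the per-image prediction time in milliseconds (the abbreviation is not expanded in this section), frame rate follows as FPS = 1000 / SPPT. This reading agrees with the attention comparison: its NAM row reports 16.63 FPS, and the EfficientNetv2 + NAM row of the ablation table reports 60.13 ms (1000 / 60.13 ≈ 16.63). The sketch below transcribes the ablation table and computes each configuration's change relative to group 1 together with the implied frame rate.

```python
# (group, configuration, mIoU %, SPPT ms), transcribed from the ablation table above
ABLATION = [
    (1, "EfficientNetv2", 78.94, 59.84),
    (2, "+ N-ASPP", 79.74, 43.46),
    (3, "+ NAM", 80.25, 60.13),
    (4, "+ N-ASPP + NAM", 80.80, 44.40),
    (5, "+ N-ASPP + NAM + SFF", 81.19, 44.92),
]


def fps_from_ms(ms_per_image: float) -> float:
    """Frames per second for a given per-image time in milliseconds."""
    return 1000.0 / ms_per_image


def deltas(rows):
    """Change in mIoU (points) and SPPT (ms) of every group relative to group 1."""
    _, _, base_miou, base_ms = rows[0]
    return [(group, round(miou - base_miou, 2), round(ms - base_ms, 2))
            for group, _, miou, ms in rows]


if __name__ == "__main__":
    for (group, config, _, ms), (_, d_miou, d_ms) in zip(ABLATION, deltas(ABLATION)):
        print(f"{group} {config:<22} dmIoU {d_miou:+.2f}  dSPPT {d_ms:+.2f} ms  "
              f"~{fps_from_ms(ms):.2f} FPS")
```

For the full model (group 5) this gives +2.25 mIoU points and 14.92 ms less per image than group 1, roughly 22.26 FPS.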