Semantic Segmentation Model Based on Adaptive Fusion and Attention Refinement

doi:10.16182/j.issn1004731x.joss.22-0169

Abstract

To address the insufficient use of context information and the loss of detail information in existing semantic segmentation methods, a semantic segmentation model based on adaptive fusion and attention refinement is proposed. In the encoder, the model introduces an adaptive fusion module that fuses the feature maps according to their corresponding weights, which addresses the insufficient use of context information. In the decoder, an attention refinement module is designed so that low-level and high-level features can guide and optimize each other, which addresses the loss of detail information. Experimental results show that the model reaches a mean intersection over union (mIoU) of 83.7% on the PASCAL VOC 2012 dataset, 1.1% higher than the encoder-decoder-based semantic segmentation model, and obtains an mIoU of 81.7% on the Cityscapes dataset, which further verifies the generalization ability of the model.

Key words: semantic segmentation, pyramid pooling, attention mechanism, adaptive fusion, encoder-decoder architecture
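
The abstract describes the adaptive fusion module only at a high level: each feature map is fused according to its own weight. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. It assumes one scalar score per feature map, normalized with a softmax so the weights are positive and sum to 1; the function names `softmax` and `adaptive_fuse`, the scalar scores, and the toy inputs are all hypothetical.

```python
import math

def softmax(logits):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fuse(feature_maps, logits):
    """Fuse same-sized 2-D feature maps as a weighted sum.

    feature_maps: list of H x W nested lists (hypothetical branches that
                  have already been resized to a common resolution).
    logits: one raw score per map; in a trained model these would be
            learned, here they are supplied by hand (assumption).
    """
    weights = softmax(logits)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, weights):
        for i in range(height):
            for j in range(width):
                fused[i][j] += weight * fmap[i][j]
    return fused, weights

# Three toy 2 x 2 maps standing in for branches with different receptive fields.
maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
fused, weights = adaptive_fuse(maps, logits=[0.0, 0.0, 0.0])
```

With equal scores the fusion reduces to a plain average; raising one score shifts the fused map toward that branch, which is the behavior a fixed concatenation or unweighted sum cannot express.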

CLC number: TP391

Yun Wei, Qi Luo, Yingzhi Zhao. Semantic Segmentation Model Based on Adaptive Fusion and Attention Refinement[J]. Journal of System Simulation, 2023, 35(6): 1226-1234.

Figures and tables (11): Figures 1-6, Tables 1-5

References (24)

1	He Miaoying, Cui Yuchao. Automatic Driving Oriented Traffic Scene Semantic Segmentation[J]. Journal of Computer Applications, 2021, 41(S1): 25-30.
2	Deng Hong, Yang Yingting, Liu Zhaopeng. Semantic Segmentation Method of UAV Paddy Field Image Based on Deep Learning[J]. Journal of Chinese Agricultural Mechanization, 2021, 42(10): 165-172.
3	Long J, Shelhamer E, Darrell T. Fully Convolutional Networks for Semantic Segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2015: 3431-3440.
4	Zhao H, Shi J, Qi X, et al. Pyramid Scene Parsing Network[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2017: 2881-2890.
5	Chen L, Zhu Y, Papandreou G, et al. Encoder-decoder with Atrous Separable Convolution for Semantic Image Segmentation[C]//Computer Vision-ECCV 2018. Berlin, Germany: Springer International Publishing, 2018: 801-818.
6	Yang M, Yu K, Zhang C, et al. DenseASPP for Semantic Segmentation in Street Scenes[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 3684-3692.
7	Tang Q, Liu F, Jiang J, et al. Attention-guided Chained Context Aggregation for Semantic Segmentation[J]. Image and Vision Computing, 2021, 115: 104309.
8	Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation[C]//International Conference on Medical Image Computing and Computer-assisted Intervention. Berlin, Germany: Springer International Publishing, 2015: 234-241.
9	Zhou Q, Wu X, Zhang S, et al. Contextual Ensemble Network for Semantic Segmentation[J]. Pattern Recognition, 2022, 122: 108290.
10	Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-decoder Architecture for Image Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
11	Yu C, Wang J, Peng C, et al. Learning a Discriminative Feature Network for Semantic Segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2018: 1857-1866.
12	Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 7132-7141.
13	Fu J, Liu J, Tian H, et al. Dual Attention Network for Scene Segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2019: 3146-3154.
14	Huang Z, Wang X, Huang L, et al. CCNet: Criss-cross Attention for Semantic Segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway, NJ: IEEE, 2019: 603-612.
15	Zhou Z, Zhou Y, Wang D, et al. Self-attention Feature Fusion Network for Semantic Segmentation[J]. Neurocomputing, 2021, 453: 50-59.
16	Liu M, Yin H. Efficient Pyramid Context Encoding and Feature Embedding for Semantic Segmentation[J]. Image and Vision Computing, 2021, 111: 104195.
17	Liu S, Qi L, Qin H, et al. Path Aggregation Network for Instance Segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 8759-8768.
18	Singha T, Pham D, Krishna A. FANet: Feature Aggregation Network for Semantic Segmentation[C]//2020 Digital Image Computing: Techniques and Applications (DICTA). Canberra, Australia: IEEE, 2020: 1-8.
19	Wang Z, Wang J, Yang K, et al. Semantic Segmentation of High-resolution Remote Sensing Images Based on a Class Feature Attention Mechanism Fused with Deeplabv3+[J]. Computers & Geosciences, 2022, 158: 104969.
20	He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2016: 770-778.
21	Zhao H, Zhang Y, Liu S, et al. PSANet: Point-wise Spatial Attention Network for Scene Parsing[C]//Computer Vision-ECCV 2018. Berlin, Germany: Springer International Publishing, 2018: 267-283.
22	Zhen M, Wang J, Zhou L, et al. Joint Semantic Segmentation and Boundary Detection Using Iterative Pyramid Contexts[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, NJ: IEEE, 2020: 13666-13675.
23	Ma N, Zhang X, Zheng H, et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design[C]//Computer Vision-ECCV 2018. Berlin, Germany: Springer International Publishing, 2018: 116-131.
24	Howard A, Sandler M, Chu G, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway, NJ: IEEE, 2019: 1314-1324.

Model	Backbone	mIoU/%	Parameters/M
FCN^[3]	ResNet-101	62.2	134.5
SA-FFNet^[15]	ResNet-101	76.4	53.7
DeepLabv3+^[5]	ResNet-101	82.1	68.6
PSPNet^[4]	ResNet-101	82.6	65.7
DANet^[13]	ResNet-101	82.6	66.5
FARNet	ResNet-101	83.7	70.1
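
The mIoU column is the mean intersection over union. As a reminder of how that figure is usually computed, here is a small sketch of the standard definition: for each class, IoU = TP / (TP + FP + FN), averaged over the classes that occur. The function names and the toy labels are illustrative, and benchmark evaluation scripts may differ in details such as ignored labels, so this is not the evaluation code behind the reported numbers.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix

def mean_iou(matrix):
    """Return (mIoU, per-class IoU list) from a square confusion matrix."""
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                                   # missed pixels of class c
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp    # pixels wrongly labeled c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)
    return sum(ious) / len(ious), ious

# Six toy "pixels" with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(round(miou, 3))  # → 0.5
```

The per-class values here are 1/3, 2/3, and 1/2, so a single poorly segmented class pulls the mean down noticeably.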

Model	Backbone	mIoU/%
FCN	ResNet-101	65.3
PSPNet	ResNet-101	78.4
PSANet^[21]	ResNet-101	80.1
DeepLabv3+	ResNet-101	80.5
DANet	ResNet-101	81.1
RPCNet^[22]	ResNet-101	81.8
FARNet	ResNet-101	81.7

Model	Airplane	Bicycle	Bird	Boat	Bottle	Bus	Chair	Cow	Dining table	Dog	Horse	Sheep	Sofa	Train	TV
FCN	76.8	32.4	68.9	49.4	60.3	75.3	21.4	62.5	46.8	71.8	63.9	72.4	37.4	70.9	55.1
SA-FFNet	90.4	67.2	88.7	74.5	72.3	85.5	29.6	88.1	61.0	84.0	80.1	81.3	45.2	82.3	75.6
DeepLabv3+	90.2	72.8	94.1	70.6	76.1	90.6	36.2	91.1	69.7	91.8	90.0	88.3	60.9	84.6	75.5
PSPNet	91.8	71.9	94.7	71.2	75.8	95.2	39.3	90.7	71.7	90.5	94.5	89.6	64.0	85.1	76.3
DANet	90.3	74.6	93.9	73.4	74.3	95.4	38.7	89.1	73.3	89.5	93.3	89.3	62.3	87.2	76.5
FARNet	92.6	74.8	95.6	80.8	77.4	95.6	43.7	86.5	70.3	94.3	92.2	90.0	70.3	88.1	79.2
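
The per-class table is easier to read with a quick check of where FARNet is not the top scorer. The snippet below transcribes the rows above (class names given in English) and lists those classes; the helper name `classes_not_led_by` is illustrative and the code adds no data beyond the table.

```python
CLASSES = ["airplane", "bicycle", "bird", "boat", "bottle", "bus", "chair", "cow",
           "dining table", "dog", "horse", "sheep", "sofa", "train", "TV"]

# Per-class IoU (%) transcribed from the table above.
SCORES = {
    "FCN":        [76.8, 32.4, 68.9, 49.4, 60.3, 75.3, 21.4, 62.5, 46.8, 71.8, 63.9, 72.4, 37.4, 70.9, 55.1],
    "SA-FFNet":   [90.4, 67.2, 88.7, 74.5, 72.3, 85.5, 29.6, 88.1, 61.0, 84.0, 80.1, 81.3, 45.2, 82.3, 75.6],
    "DeepLabv3+": [90.2, 72.8, 94.1, 70.6, 76.1, 90.6, 36.2, 91.1, 69.7, 91.8, 90.0, 88.3, 60.9, 84.6, 75.5],
    "PSPNet":     [91.8, 71.9, 94.7, 71.2, 75.8, 95.2, 39.3, 90.7, 71.7, 90.5, 94.5, 89.6, 64.0, 85.1, 76.3],
    "DANet":      [90.3, 74.6, 93.9, 73.4, 74.3, 95.4, 38.7, 89.1, 73.3, 89.5, 93.3, 89.3, 62.3, 87.2, 76.5],
    "FARNet":     [92.6, 74.8, 95.6, 80.8, 77.4, 95.6, 43.7, 86.5, 70.3, 94.3, 92.2, 90.0, 70.3, 88.1, 79.2],
}

def classes_not_led_by(model, scores, classes):
    """Return the classes where some other model scores strictly higher."""
    behind = []
    for idx, name in enumerate(classes):
        best_other = max(vals[idx] for other, vals in scores.items() if other != model)
        if scores[model][idx] < best_other:
            behind.append(name)
    return behind

behind = classes_not_led_by("FARNet", SCORES, CLASSES)
print(behind)  # → ['cow', 'dining table', 'horse']
```

FARNet therefore has the highest score on 12 of the 15 listed classes, and trails DeepLabv3+ on cow, DANet on dining table, and PSPNet on horse.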

Adaptive fusion module	mIoU/%
×	76.3
√	78.9

Decoder	Channel attention	Spatial attention	mIoU/%
Baseline	×	×	78.3
Fusion	×	×	78.1
Fusion	×	√	80.4
Fusion	√	×	79.3
Fusion	√	√	81.1
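
The ablation above separates channel attention from spatial attention in the decoder. To make the distinction concrete, here is a heavily simplified sketch in plain Python: a channel gate produces one weight per channel, a spatial gate produces one weight per pixel. The way the two are combined in `refine` (high-level features gate the channels of the low-level features, low-level features gate the positions of the high-level features) is an assumption made for illustration, as are all function names. Real attention modules also use learned convolution or fully connected layers, which are omitted here, and this section does not specify the paper's actual module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(feat):
    """One weight per channel: sigmoid of that channel's global average."""
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates

def spatial_gate(feat):
    """One weight per pixel: sigmoid of the mean across channels."""
    num_channels = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[c][i][j] for c in range(num_channels)) / num_channels)
             for j in range(width)]
            for i in range(height)]

def refine(low, high):
    """Hypothetical mutual guidance between two C x H x W feature tensors."""
    ch = channel_gate(high)   # high-level semantics decide which channels matter
    sp = spatial_gate(low)    # low-level detail decides which positions matter
    num_channels, height, width = len(low), len(low[0]), len(low[0][0])
    return [[[ch[c] * low[c][i][j] + sp[i][j] * high[c][i][j]
              for j in range(width)]
             for i in range(height)]
            for c in range(num_channels)]

# Toy tensors: 2 channels, 2 x 2 pixels; all-zero high-level features give gates of 0.5.
low = [[[2.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [0.0, 4.0]]]
high = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
refined = refine(low, high)
```

In this toy case the output is simply the low-level tensor scaled by 0.5, since the high-level input is zero; with nonzero inputs the two gates reweight channels and positions independently, which is why the ablation can switch them on separately.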