A Model Combining Self-attention and Weight Sharing for Human Activity Recognition

doi:10.16182/j.issn1004731x.joss.24-0362

Abstract:

With the spread of wearable devices, human activity recognition based on wearable sensors has attracted wide attention. The core problem in this field is how to extract effective activity information from raw sensor data and assemble it into feature vectors. Convolutional and recurrent neural networks are widely used for feature extraction from multi-sensor data, but they struggle to capture, from a global perspective, the important features of human activity along the time dimension. To address this, and considering the logical correlations among sensors placed on different parts of the body, a multi-branch human activity recognition model based on self-attention and weight sharing is proposed (Multi-CNN-BiLSTM-self attention, Multi-CBSA). The model uses sub-networks with identical architecture and shared weights to extract features from the activity data of different body parts, which simplifies the model structure and reduces the number of training parameters. Within each sub-network, a one-dimensional convolution first converts the raw activity data into a short sequence of high-level features; a bidirectional long short-term memory (BiLSTM) network then captures the forward and backward temporal features of that short sequence; finally, self-attention assigns dynamic weights to the extracted features to obtain representative key features. The outputs of all sub-networks are combined in a fusion layer. Ablation experiments show that introducing self-attention improves Multi-CBSA in convergence speed, validation loss, and single-class recognition accuracy. Comparative experiments show that Multi-CBSA, with fewer training parameters, raises recognition accuracy on the MHEALTH and PAMAP2 datasets to 99.3% and 96.4%, respectively. Compared with well-performing recent models, recognition accuracy improves by up to 4.2 and 4.4 percentage points.

Key words: human activity recognition, wearable sensor, feature extraction, self-attention, weight sharing
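
The dynamic weight assignment mentioned in the abstract can be illustrated with a small sketch. It assumes scaled dot-product self-attention in the style of reference [9], with identity query/key/value projections; the abstract does not specify the exact attention form used in Multi-CBSA, so the function names and toy values below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    sequence: list of T feature vectors (each a list of d floats), standing in
    for the BiLSTM outputs of one sub-network (an assumption of this sketch).
    Returns (context, weights): T re-weighted vectors and the T x T weights.
    """
    d = len(sequence[0])
    weights, context = [], []
    for query in sequence:
        # Similarity of this time step to every time step, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in sequence]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all time steps gives the attended feature vector.
        context.append([sum(w * value[j] for w, value in zip(row, sequence))
                        for j in range(d)])
    return context, weights


# Toy short sequence: 3 time steps, 2 features each (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = self_attention(features)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so every output vector is a convex combination of all time steps; this is what lets the layer look at the whole sequence at once rather than only at neighbouring steps.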

CLC number: TP391

Ma Lun, Yang Yue, Wang Daihe, et al. A Model Combining Self-attention and Weight Sharing for Human Activity Recognition[J]. Journal of System Simulation, 2025, 37(9): 2409-2419.

Figures and tables (16): Figures 1-10; Tables 1-6

References (22)

[1]	Minh Dang L, Min Kyungbok, Wang Hanxiang, et al. Sensor-based and Vision-based Human Activity Recognition: A Comprehensive Survey[J]. Pattern Recognition, 2020, 108: 107561.
[2]	Zhang Shibo, Li Yaxuan, Zhang Shen, et al. Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances[J]. Sensors, 2022, 22(4): 1476.
[3]	Mohamed Raihani, Mohammad Noorazlan Shah Zainudin, Md Nasir Sulaiman, et al. Multi-label Classification for Physical Activity Recognition from Various Accelerometer Sensor Positions[J]. Journal of Information and Communication Technology, 2018, 17(2): 209-231.
[4]	Francisco Javier Ordóñez, Roggen D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition[J]. Sensors, 2016, 16(1): 115.
[5]	Lai Guokun, Chang Weicheng, Yang Yiming, et al. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. New York: Association for Computing Machinery, 2018: 95-104.
[6]	Kastner Sabine, Ungerleider Leslie G. Mechanisms of Visual Attention in the Human Cortex[J]. Annual Review of Neuroscience, 2000, 23: 315-341.
[7]	Zhang Haoxi, Xiao Zhiwen, Wang Juan, et al. A Novel IoT-Perceptive Human Activity Recognition (HAR) Approach Using Multihead Convolutional Attention[J]. IEEE Internet of Things Journal, 2020, 7(2): 1072-1080.
[8]	Khan Zanobya N, Ahmad Jamil. Attention Induced Multi-head Convolutional Neural Network for Human Activity Recognition[J]. Applied Soft Computing, 2021, 110: 107671.
[9]	Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[10]	Mst Alema Khatun, Mohammad Abu Yousuf, Ahmed Sabbir, et al. Deep CNN-LSTM with Self-attention Model for Human Activity Recognition Using Wearable Sensor[J]. IEEE Journal of Translational Engineering in Health and Medicine, 2022, 10: 1-16.
[11]	Singh Satya P, Madan Kumar Sharma, Lay-Ekuakille Aimé, et al. Deep ConvLSTM with Self-attention for Human Activity Decoding Using Wearable Sensors[J]. IEEE Sensors Journal, 2021, 21(6): 8575-8582.
[12]	Bromley J, Guyon I, LeCun Y, et al. Signature Verification Using a "Siamese" Time Delay Neural Network[C]//Proceedings of the 7th International Conference on Neural Information Processing Systems. San Francisco: Morgan Kaufmann Publishers Inc., 1993: 737-744.
[13]	Banos Oresti, Garcia Rafael, Holgado-Terriza Juan A, et al. mHealthDroid: A Novel Framework for Agile Development of Mobile Health Applications[C]//Ambient Assisted Living and Daily Activities. Cham: Springer International Publishing, 2014: 91-98.
[14]	Reiss Attila, Stricker Didier. Introducing a New Benchmarked Dataset for Activity Monitoring[C]//2012 16th International Symposium on Wearable Computers. Piscataway: IEEE, 2012: 108-109.
[15]	Si Chenyang, Jing Ya, Wang Wei, et al. Skeleton-based Action Recognition with Spatial Reasoning and Temporal Stack Learning[C]//Computer Vision – ECCV 2018. Cham: Springer International Publishing, 2018: 106-121.
[16]	Ma Lun, Liu Xin, Zhao Bin, et al. Impaired Behavior Recognition by Using the Multi-head-siamese Neural Network[J]. Journal of Xidian University, 2022, 49(4): 100-108, 175.
[17]	Snoek J, Larochelle H, Adams R P. Practical Bayesian Optimization of Machine Learning Algorithms[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2012: 2951-2959.
[18]	Aljarrah Amir A, Ali Ali H. Human Activity Recognition using PCA and BiLSTM Recurrent Neural Networks[C]//2019 2nd International Conference on Engineering Technology and its Applications (IICETA). Piscataway: IEEE, 2019: 156-160.
[19]	Ihianle I K, Nwajana A O, Ebenuwa S H, et al. A Deep Learning Approach for Human Activities Recognition from Multimodal Sensing Devices[J]. IEEE Access, 2020, 8: 179028-179038.
[20]	Sravan Kumar Challa, Kumar Akhilesh, Vijay Bhaskar Semwal. A Multibranch CNN-BiLSTM Model for Human Activity Recognition Using Wearable Sensor Data[J]. The Visual Computer, 2022, 38(12): 4095-4109.
[21]	Dua Nidhi, Shiva Nand Singh, Vijay Bhaskar Semwal. Multi-input CNN-GRU Based Human Activity Recognition Using Wearable Sensors[J]. Computing, 2021, 103(7): 1461-1478.
[22]	Mahmud Saif, Tanjid Hasan Tonmoy M, Kishor Kumar Bhaumik, et al. Human Activity Recognition from Wearable Sensor Data Using Self-attention[M]//Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, et al. Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press, 2020: 1332-1339.

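Label	MHEALTH		PAMAP2
Label	Activity	Duration	Activity	Duration/min
1	Standing still	1 min	Lying	3
2	Sitting	1 min	Sitting	3
3	Lying down	1 min	Standing	3
4	Walking	1 min	Walking	3
5	Climbing stairs	1 min	Running	3
6	Bending forward at the waist	20 repetitions	Cycling	3
7	Frontal elevation of arms	20 repetitions	Nordic walking	3
8	Crouching	20 repetitions	Ascending stairs	2
9	Cycling	1 min	Descending stairs	2
10	Jogging	1 min	Ironing	3
11	Running	1 min	Vacuum cleaning	3
12	Jumping front and back	20 repetitions	Rope jumping	2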

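Hyperparameter	Value
Number of kernels in convolution layer 1	64
Number of kernels in convolution layer 2	64
Convolution kernel length	7
Max-pooling layer	2
Output dimension of the bidirectional LSTM layer	128
Dropout	0.35
Batch size (Batch_size)	256
Training epochs (Epochs)	50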
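
To make the effect of weight sharing on parameter count concrete, here is a rough sketch that counts the parameters of one Conv1D-Conv1D-BiLSTM sub-network using the kernel counts, kernel length, and BiLSTM output dimension from the table above. The number of input channels per branch, the number of branches, the split of the 128-dimensional output into 64 units per direction, and the single bias vector per LSTM gate are all assumptions; the attention, fusion, and classification layers are left out, so the result is not expected to match the reported 0.42 × 10⁶.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Kernel weights plus one bias per output channel."""
    return in_channels * kernel_size * out_channels + out_channels


def bilstm_params(input_size, units_per_direction):
    """Two directions, four gates each, one bias vector per gate (assumed)."""
    per_direction = 4 * (
        units_per_direction * (input_size + units_per_direction)
        + units_per_direction
    )
    return 2 * per_direction


def subnetwork_params(in_channels, filters=64, kernel_size=7, bilstm_output=128):
    """Conv1D -> Conv1D -> BiLSTM; pooling and dropout add no parameters."""
    units = bilstm_output // 2  # assumption: output concatenates both directions
    return (
        conv1d_params(in_channels, filters, kernel_size)
        + conv1d_params(filters, filters, kernel_size)
        + bilstm_params(filters, units)
    )


IN_CHANNELS = 9  # hypothetical: sensor channels per body-part branch
BRANCHES = 3     # hypothetical: number of body-part branches

shared = subnetwork_params(IN_CHANNELS)   # one set of weights reused by all branches
unshared = BRANCHES * shared              # separate weights for every branch
print(f"shared: {shared}, unshared: {unshared}")
```

With shared weights the feature-extractor parameter count stays constant as branches are added, whereas independent branches scale it linearly with the number of body parts.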

Label	MHEALTH		PAMAP2
Label	Training set	Test set	Training set	Test set
1	384	96	1 027	341
2	384	96	1 077	234
3	384	96	999	350
4	384	96	1 307	423
5	384	96	496	164
6	360	83	873	305
7	371	89	957	402
8	367	90	488	170
9	384	96	437	105
10	384	96	943	301
11	384	96	1 245	484
12	128	32	306	84
Transitional activity	375	93	—	—
Total	4 673	1 155	10 155	3 363
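
As a quick consistency check on the sample counts above, the following sketch sums the per-class rows and derives the train/test proportion and the largest-to-smallest class ratio for each dataset. The counts are copied from the table; the helper name `summarize` is hypothetical.

```python
# Per-class sample counts from the table (labels 1-12; MHEALTH also has a
# final "transitional activity" class).
MHEALTH_TRAIN = [384, 384, 384, 384, 384, 360, 371, 367, 384, 384, 384, 128, 375]
MHEALTH_TEST = [96, 96, 96, 96, 96, 83, 89, 90, 96, 96, 96, 32, 93]
PAMAP2_TRAIN = [1027, 1077, 999, 1307, 496, 873, 957, 488, 437, 943, 1245, 306]
PAMAP2_TEST = [341, 234, 350, 423, 164, 305, 402, 170, 105, 301, 484, 84]


def summarize(train, test):
    """Return totals, training fraction, and max/min class ratio in training."""
    n_train, n_test = sum(train), sum(test)
    return {
        "train": n_train,
        "test": n_test,
        "train_fraction": n_train / (n_train + n_test),
        "imbalance": max(train) / min(train),
    }


mhealth = summarize(MHEALTH_TRAIN, MHEALTH_TEST)
pamap2 = summarize(PAMAP2_TRAIN, PAMAP2_TEST)
for name, stats in (("MHEALTH", mhealth), ("PAMAP2", pamap2)):
    print(name, stats["train"], stats["test"],
          round(stats["train_fraction"], 3), round(stats["imbalance"], 2))
```

The per-class rows add up to the totals in the last row, and the splits come out at roughly 80/20 for MHEALTH and 75/25 for PAMAP2.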

Model	Accuracy/%	Parameters/10⁶	FLOPs/10⁶
PCA+Bi-LSTM [18]	97.7	0.27	--
CNN+LSTM [19]	95.1	0.76	25
CNN+Attention+LSTM [10]	98.1	0.63	--
Multi-CBSA	99.3	0.42	9

Model	Accuracy/%	Parameters/10⁶	FLOPs/10⁶
CNN+Bi-LSTM [20]	94.2	0.64	23
CNN+GRU [21]	95.2	0.65	46
Self-Attention [22]	92.0	--	--
Multi-CBSA	96.4	0.42	30
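
The maximum accuracy gains quoted in the abstract can be recomputed from the two comparison tables. Going by the reported accuracies, the table listing 99.3% corresponds to MHEALTH and the one listing 96.4% to PAMAP2. The sketch below is only this arithmetic; the dictionary and function names are hypothetical.

```python
# Baseline accuracies (%) copied from the two comparison tables.
MHEALTH_BASELINES = {
    "PCA+Bi-LSTM [18]": 97.7,
    "CNN+LSTM [19]": 95.1,
    "CNN+Attention+LSTM [10]": 98.1,
}
PAMAP2_BASELINES = {
    "CNN+Bi-LSTM [20]": 94.2,
    "CNN+GRU [21]": 95.2,
    "Self-Attention [22]": 92.0,
}
MULTI_CBSA = {"MHEALTH": 99.3, "PAMAP2": 96.4}


def max_gain(baselines, ours):
    """Largest accuracy gap, in percentage points, over the listed baselines."""
    return round(ours - min(baselines.values()), 1)


mhealth_gain = max_gain(MHEALTH_BASELINES, MULTI_CBSA["MHEALTH"])
pamap2_gain = max_gain(PAMAP2_BASELINES, MULTI_CBSA["PAMAP2"])
print(mhealth_gain, pamap2_gain)
```

The gaps are measured against the weakest listed baseline on each dataset (CNN+LSTM [19] and Self-Attention [22]) and are differences in percentage points, not relative improvements.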