Spatial Cues Gradient in Time Domain Based Audio Attention Computational Model

Journal of System Simulation, 2016, Vol. 28, Issue 10: 2369-2377.

Hang Bo, Wang Yi, Kang Changqing, Huang Jian

School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, Hubei, China

Received: 2016-05-31  Revised: 2016-07-14  Online: 2016-10-08  Published: 2020-08-13
About the first author: Hang Bo (1978-), male, Hubei, Ph.D., associate professor, research interests: multimedia information processing and compression; Wang Yi (1980-), male, Hubei, M.S., associate professor, research interests: digital media technology.
Funding:
National Natural Science Foundation of China (61201247), Natural Science Foundation of Hubei Province (2011CDB322)

Abstract

Abstract: In virtual reality audio, a sound source whose direction changes rapidly should receive a high attention level. However, existing bottom-up audio attention computational models extract low-level features of single-channel audio, such as energy, pitch, and zero-crossing rate. These features cannot effectively express the attention caused by such signals, which may lead to missed detections. To address this problem, and based on the psychological principle that spatial information affects attention, a short-term gradient of spatial cues is introduced to measure the attention caused by rapid changes in the spatial direction of a single sound source. The spatial cues of all subbands form a spatial cue vector, and the mean of its short-term changes is taken as the spatial cue gradient, on which the audio attention model is built. Compared with the current audio attention computational model, the recall (detection rate) of attention audio events increased by 4.5 percentage points in experiments.

Key words: audio, attention computational model, spatial cues, gradient
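
The gradient described in the abstract (the mean short-term change of a vector of per-subband spatial cues) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which spatial cue is used, how the change between two cue vectors is measured, or how long the short-term window is. The Euclidean distance, the `window` parameter, the function names, and the sample data below are all hypothetical.

```python
import math


def cue_vector_changes(cue_frames):
    """Change of the spatial cue vector between consecutive frames.

    cue_frames: list of frames, each a list with one cue value per subband.
    Assumption: the change is measured as the Euclidean distance.
    """
    changes = []
    for prev, curr in zip(cue_frames, cue_frames[1:]):
        if len(prev) != len(curr):
            raise ValueError("all frames must have the same number of subbands")
        changes.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr))))
    return changes


def spatial_cue_gradient(cue_frames, window=4):
    """Mean of the most recent cue-vector changes (hypothetical window length).

    Returns one value per frame transition. Early entries average over
    however many transitions are available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    changes = cue_vector_changes(cue_frames)
    gradient = []
    for i in range(len(changes)):
        recent = changes[max(0, i - window + 1):i + 1]
        gradient.append(sum(recent) / len(recent))
    return gradient


# Hypothetical two-subband cue tracks (for example, level differences in dB).
static_source = [[3.0, 1.0]] * 5  # direction does not change
moving_source = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [9.0, 12.0], [12.0, 16.0]]

print(spatial_cue_gradient(static_source))
print(spatial_cue_gradient(moving_source))
```

In this sketch a stationary source yields a gradient of zero, while a source whose cues shift steadily yields a constant positive gradient. That matches the intuition in the abstract that rapid direction changes should raise the attention level; how the gradient is mapped to an attention score is not specified there and is left out here.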

CLC Number: TP391.9

Hang Bo, Wang Yi, Kang Changqing, et al. Spatial Cues Gradient in Time Domain Based Audio Attention Computational Model[J]. Journal of System Simulation, 2016, 28(10): 2369-2377.
