Journal of System Simulation ›› 2026, Vol. 38 ›› Issue (6): 1749-1760. doi: 10.16182/j.issn1004731x.joss.25-1237


面向视唱认知仿真的眼动跟踪优化建模方法研究

张堃1, 钱佳杰1, 马树红2, 赵增旭2, 潘钰晨1, 唐曜祺3   

    1.南通大学 电气与自动化学院,江苏 南通 226019
    2.中国电子学会,北京 100036
    3.南通大学 艺术学院,江苏 南通 226019
  • 收稿日期:2025-12-17 修回日期:2026-03-09 出版日期:2026-06-25 发布日期:2026-06-25
  • 通讯作者: 唐曜祺
  • About the first author: Zhang Kun (1983-), male, professor, Ph.D.; research interest: artificial intelligence image processing.

Research on Optimization Modeling Method for Eye Tracking in Solfeggio Cognitive Simulation

Zhang Kun1, Qian Jiajie1, Ma Shuhong2, Zhao Zengxu2, Pan Yuchen1, Tang Yaoqi3   

    1.School of Electrical and Automation, Nantong University, Nantong 226019, China
    2.Chinese Institute of Electronics, Beijing 100036, China
    3.School of Art, Nantong University, Nantong 226019, China
  • Received:2025-12-17 Revised:2026-03-09 Online:2026-06-25 Published:2026-06-25
  • Contact: Tang Yaoqi

摘要:

针对乐谱视唱教学仿真中头部运动引发的注视点偏移问题及现有方法缺乏系统级仿真验证,提出一种融合图像语义理解、时序轨迹建模与视唱认知仿真的注视精度优化方法。以Vision Transformer为核心,经Mahalanobis距离、滑动窗口与兴趣区域预处理后,引入位置偏移感知、偏移残差回归与双通路融合,实现无标注条件下的偏移建模与校正。仿真结果表明:该方法误差较原始值误差降低43.9%;移除任一模块平均欧几里得距离明显上升,最大增幅为48.6%;跨数据集实验中,不同数据集校正率保持在40%左右;不同任务场景中平均降低偏移误差36.6%~40.9%。提升了眼动数据可靠性,为视唱认知评估与人机交互仿真系统提供技术支持。

关键词: 眼动跟踪, 乐谱视唱, 头部运动偏移, 注视精度优化, Vision Transformer模型

Abstract:

To address the fixation offset caused by head movement in music solfeggio teaching simulation, and the lack of system-level simulation validation in existing methods, this paper proposes a fixation accuracy optimization method that integrates image semantic understanding, temporal trajectory modeling, and solfeggio cognitive simulation. The method is built around a Vision Transformer. After preprocessing with Mahalanobis distance, a sliding window, and regions of interest, it introduces position offset perception, offset residual regression, and dual-pathway fusion to model and correct the offset without annotated data. Simulation results show that the method reduces the error by 43.9% relative to the error of the original (uncorrected) values; removing any single module markedly increases the average Euclidean distance, by up to 48.6%; in cross-dataset experiments, the correction rate stays at around 40% across datasets; and across task scenarios, the offset error is reduced by 36.6% to 40.9% on average. The method improves the reliability of eye-tracking data and provides technical support for solfeggio cognitive assessment and human-computer interaction simulation systems.
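
The abstract names the preprocessing steps and the evaluation metric but does not give their details. The following is a minimal sketch, not the authors' implementation, of how Mahalanobis-distance outlier rejection, sliding-window smoothing, and the average Euclidean distance metric might look for 2-D gaze points. The function names, the threshold of 3.0, the window length of 5, and the use of a moving average are all assumptions made for illustration.

```python
import numpy as np


def mahalanobis_filter(points: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Boolean mask of gaze samples whose Mahalanobis distance to the
    sample mean is at most `threshold` (an assumed cut-off)."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a degenerate covariance
    centered = points - mean
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(d2, 0.0)) <= threshold


def sliding_window_mean(points: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving average per coordinate; edges use shorter windows."""
    half = window // 2
    out = np.empty(points.shape, dtype=float)
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        out[i] = points[lo:hi].mean(axis=0)
    return out


def mean_euclidean_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Average Euclidean distance between predicted and reference gaze points."""
    return float(np.linalg.norm(pred - target, axis=1).mean())


def relative_reduction(raw_error: float, corrected_error: float) -> float:
    """Fraction by which correction lowers the error relative to the raw error."""
    return (raw_error - corrected_error) / raw_error
```

Under these assumptions, a recording would first be masked with `mahalanobis_filter`, then smoothed with `sliding_window_mean`. Percentages such as those reported in the abstract would correspond to `relative_reduction` applied to the average Euclidean distance before and after correction.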

Key words: eye tracking, music solfeggio, head movement offset, fixation accuracy optimization, Vision Transformer model

CLC number: