Journal of System Simulation, 2024, Vol. 36, Issue 5: 1061-1071. doi: 10.16182/j.issn1004731x.joss.23-0017
Received: 2023-01-04
Revised: 2023-03-24
Online: 2024-05-15
Published: 2024-05-21
Contact: Sang Haifeng
E-mail: lixiang3278@163.com; sanghaif@163.com
Li Xiang, Sang Haifeng. Dense Video Description Method Based on Multi-modal Fusion in Transformer Network[J]. Journal of System Simulation, 2024, 36(5): 1061-1071.