Evolutionary Simulation of Online Public Opinion Based on the BERT-LDA Model under COVID-19

doi:10.16182/j.issn1004731x.joss.20-0690

Journal of System Simulation, 2021, Vol. 33, Issue 1: 24-36.

Zhuang Muni^1,3, Li Yong^1,2, Tan Xu^1,3, Mao Taitian^1, Lan Kaicheng^3, Xing Lining^4

1. School of Public Management, Xiangtan University, Xiangtan 411105, China;
2. School of Economics and Management, Changsha University, Changsha 410022, China;
3. School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China;
4. College of Systems Engineering, National University of Defense Technology, Changsha 410022, China

Received: 2020-08-31 Revised: 2020-11-04 Published: 2021-01-18
First author: Zhuang Muni (born 1996), female, master's student; research interest: online public opinion analysis. E-mail: 997737694@qq.com
Funding: National Natural Science Foundation of China (72074033); Humanities and Social Sciences Fund of the Ministry of Education (17YJCZH157); Guangdong Province innovation team project on public security applications of video and image big data; Shenzhen Science and Technology Program, Key Basic Research Project (JCYJ20200109141218676)

Abstract

Building a large-scale simulation model of online public opinion evolution can guide differentiated emergency management and public opinion guidance for Wuhan, the area hit hardest by COVID-19, and for the rest of China. To simulate the evolution of public sentiment at the granularity of individual topics, the LDA (Latent Dirichlet Allocation) topic model is deeply integrated with BERT (Bidirectional Encoder Representations from Transformers) word vectors, optimizing the topic vectors to support topic clustering of texts. In addition, the BERT pre-training tasks are improved and a further deep pre-training task is stacked on top of them to raise the model's accuracy in sentiment classification. The results show that, during topic vector training, the improved BERT-LDA model raises the NPMI (Normalized Pointwise Mutual Information) value by 0.357 over the original LDA model. On the sentiment classification task for epidemic events, the AUC (Area Under the Curve) value exceeds 99.6%, which indicates that the improved BERT-LDA model can be applied effectively to large-scale simulation of online public opinion evolution.

Key words: coronavirus disease 2019 (COVID-19), BERT-LDA model, evolution simulation of public opinion, difference comparison
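The abstract reports two evaluation metrics, NPMI for topic quality and AUC for sentiment classification. The sketch below shows one common way to compute each. It is an illustration only, not the authors' implementation: it assumes NPMI is estimated from document-level co-occurrence (each document given as a set of words) and that AUC is the pairwise ranking probability with ties counted as one half. The function names `npmi`, `topic_npmi`, and `auc` are hypothetical.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, docs):
    """NPMI of two words, estimated from document-level co-occurrence.

    docs is a list of sets of words (an assumption of this sketch).
    """
    n = len(docs)
    p_a = sum(word_a in d for d in docs) / n
    p_b = sum(word_b in d for d in docs) / n
    p_ab = sum(word_a in d and word_b in d for d in docs) / n
    if p_ab == 0.0:
        return -1.0  # convention: the two words never co-occur
    if p_ab == 1.0:
        return 1.0   # limiting value: both words occur in every document
    # PMI divided by -log of the joint probability, giving a value in [-1, 1]
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_npmi(top_words, docs):
    """Average NPMI over all pairs of a topic's top words (needs >= 2 words)."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


def auc(labels, scores):
    """AUC as the probability that a random positive example outscores a
    random negative one; ties count as one half. Labels are 0 or 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A higher average NPMI means a topic's top words co-occur more often than chance would predict, and an AUC near 1 means positive examples are almost always ranked above negative ones. The pairwise AUC loop is quadratic in the number of examples, so a rank-based formulation would be preferable for a large corpus.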

CLC number: TP391.9

Zhuang Muni, Li Yong, Tan Xu, et al. Evolutionary Simulation of Online Public Opinion Based on the BERT-LDA Model under COVID-19[J]. Journal of System Simulation, 2021, 33(1): 24-36.
