Evolutionary Simulation of Online Public Opinion Based on the BERT-LDA Model under COVID-19

Journal of System Simulation, 2021, Vol. 33, Issue (1): 24-36. doi: 10.16182/j.issn1004731x.joss.20-0690

Zhuang Muni¹,³, Li Yong¹,², Tan Xu¹,³, Mao Taitian¹, Lan Kaicheng³, Xing Lining⁴

1. School of Public Management, Xiangtan University, Xiangtan 411105, China;
2. School of Economics and Management, Changsha University, Changsha 410022, China;
3. School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen 518172, China;
4. College of Systems Engineering, National University of Defense Technology, Changsha 410022, China

Received: 2020-08-31 Revised: 2020-11-04 Published: 2021-01-18
About the first author: Zhuang Muni (1996-), female, master's student; research interest: online public opinion analysis. E-mail: 997737694@qq.com
Funding: National Natural Science Foundation of China (72074033); Humanities and Social Sciences Fund of the Ministry of Education (17YJCZH157); Guangdong Province Innovation Team Project on Video and Image Big Data Applications for Public Safety; Key Basic Research Project of the Shenzhen Science and Technology Program (JCYJ20200109141218676)

Abstract: Building a large-scale simulation model of online public opinion evolution can guide differentiated emergency management and public opinion guidance for Wuhan, the area hit hardest by COVID-19, and for the rest of China. To simulate the evolution of public sentiment at topic-level granularity, the LDA (Latent Dirichlet Allocation) topic model is deeply integrated with BERT (Bidirectional Encoder Representations from Transformers) word vectors, optimizing the topic vectors to support topic clustering of texts. In addition, the BERT pre-training tasks are improved and a further deep pre-training task is stacked on top of them to raise the model's accuracy in sentiment classification. The results show that, in topic vector training, the improved BERT-LDA model raises the NPMI (Normalized Pointwise Mutual Information) value by 0.357 over the original LDA model. On the sentiment classification task for epidemic events, the AUC (Area Under the Curve) value exceeds 99.6%, demonstrating that the improved BERT-LDA model can be applied effectively to large-scale simulation of online public opinion evolution.

Key words: coronavirus disease 2019 (COVID-19), BERT-LDA model, evolution simulation of public opinion, difference comparison

CLC number: TP391.9
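The abstract reports topic quality as an NPMI gain over the original LDA model. As a reading aid, the following is a minimal sketch of how NPMI-based topic coherence is commonly computed, assuming probabilities are estimated from document-level co-occurrence; the estimation details used in the paper are not given on this page. The function names `npmi` and `topic_coherence` and the tokenized-document input format are hypothetical.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, docs):
    """NPMI of two words, estimated from document-level co-occurrence.

    Assumption: docs is a non-empty list of token lists and both words
    occur in at least one document.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    p_a = sum(word_a in d for d in doc_sets) / n
    p_b = sum(word_b in d for d in doc_sets) / n
    p_ab = sum(word_a in d and word_b in d for d in doc_sets) / n
    if p_ab == 0:
        return -1.0  # the words never co-occur: lower bound by convention
    if p_ab == 1:
        return 0.0  # degenerate 0/0 case, treated as 0 here (an assumption)
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)  # normalize PMI into [-1, 1]


def topic_coherence(top_words, docs):
    """Mean NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)
```

A score near 1 means the top words of a topic almost always appear together, a score near -1 means they almost never do, so a higher average across topics is read as more coherent topics.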

Zhuang Muni, Li Yong, Tan Xu, et al. Evolutionary Simulation of Online Public Opinion Based on the BERT-LDA Model under COVID-19[J]. Journal of System Simulation, 2021, 33(1): 24-36.

