Construction and Implementation of a Knowledge Graph of Historical Figures in a Big Data Environment

Journal of System Simulation, 2016, Vol. 28, Issue 10: 2560-2566.

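Zhou Yi^1,2, Zhou Mingquan^1,2, Wang Xuesong^1,2, Huang Youliang^1,2

1. College of Information Science and Technology, Beijing Normal University, Beijing 100875, China;
2. Engineering Research Center for Virtual Reality Applications, Ministry of Education, Beijing 100875, China

Received: 2016-04-27  Revised: 2016-07-14  Online: 2016-10-08  Published: 2020-08-13
About the authors: Zhou Yi (1993-), female, from Hubei, master's student, research interest: virtual reality and visualization; Zhou Mingquan (1954-), male, from Shaanxi, doctoral supervisor, research interest: virtual reality and visualization; Wang Xuesong (1975-), male, from Shaanxi, Ph.D., research interest: virtual reality and visualization.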

Design and Implementation of Historical Figures Knowledge Graph Visualization System

Abstract

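In the big data era, knowledge graph and data visualization techniques can present data in a structured, visual way, build keyword-centered knowledge systems, and show the relationships among data. On this basis, a visualization system for entity relationships among historical figures is designed and implemented. Built on the Node.js platform with a browser/server (B/S) architecture, the system divides the heterogeneous raw data into person data and event data, parses them with a tag-traversal method and a link-weight-based method respectively, and stores the results in a historical figures database. The system supports multiple interaction modes, has good extensibility and maintainability, and visualizes information on historical figures and events in a rich and intuitive form, helping people better understand, organize, and mine the relationships among historical figures and related events. It offers some help and reference value for researchers in related fields.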

Keywords: knowledge graph, entity relationship, data parsing, data visualization
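The abstract names a tag-traversal method for person data but does not describe it. The following is a minimal sketch in TypeScript (matching the system's Node.js platform), assuming that a person page exposes its attributes in an infobox made of `<dt>`/`<dd>` pairs; that page layout is an assumption, not something the paper states. `ElementNode`, `textOf`, and `extractPersonAttributes` are hypothetical names, and the hand-built tree stands in for the output of a real HTML parser.

```typescript
// Minimal element tree standing in for the output of an HTML parser
// (hypothetical shape; a real crawler would use a parser library).
interface ElementNode {
  tag: string;
  text?: string; // text directly inside this element
  children?: ElementNode[];
}

// Concatenate the text of a node and all of its descendants.
function textOf(node: ElementNode): string {
  const own = node.text ?? "";
  const inner = (node.children ?? []).map(textOf).join("");
  return (own + inner).trim();
}

// Depth-first traversal over the tag tree. Every <dl> is assumed to be an
// infobox whose consecutive <dt>/<dd> children form attribute/value pairs.
function extractPersonAttributes(root: ElementNode): Record<string, string> {
  const attrs: Record<string, string> = {};
  const stack: ElementNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const children = node.children ?? [];
    if (node.tag === "dl") {
      let pendingKey: string | null = null;
      for (const child of children) {
        if (child.tag === "dt") {
          pendingKey = textOf(child);
        } else if (child.tag === "dd" && pendingKey !== null) {
          attrs[pendingKey] = textOf(child);
          pendingKey = null;
        }
      }
    }
    // Push children in reverse so they are popped in document order.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return attrs;
}

// Illustrative input: a person page with one infobox.
const page: ElementNode = {
  tag: "div",
  children: [
    { tag: "h1", text: "Li Bai" },
    {
      tag: "dl",
      children: [
        { tag: "dt", text: "Dynasty" },
        { tag: "dd", text: "Tang" },
        { tag: "dt", text: "Occupation" },
        { tag: "dd", text: "Poet" },
      ],
    },
  ],
};
console.log(extractPersonAttributes(page));
```

The outer traversal uses an explicit stack rather than recursion, so deeply nested pages cannot exhaust the call stack; the resulting attribute record could then be stored as one person entity in the historical figures database.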

Abstract: With the advent of the big data era, knowledge graph and data visualization techniques present data in a structured, visual way, establish keyword-centered knowledge systems, and show the relationships among data quickly and clearly. In this paper, a visualization system for entity relationships among historical figures is designed and implemented using data visualization and knowledge graph techniques. During preprocessing, the heterogeneous raw data are divided into person data and event data. In the parsing stage, the person data are parsed with a tag-traversal method and the event data with a link-weight-based method. The system is built on the Node.js platform with a layered browser/server (B/S) architecture, and the parsed data are stored in a historical figures database. Users can obtain knowledge graphs of the historical figures and events relevant to their needs. The system offers a variety of interaction modes with good scalability and maintainability, and by presenting data in visual form it helps users quickly understand and explore the data and their relationships. It may also serve as a reference for researchers in related fields.
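The abstract does not define the link weight used for event data. One common interpretation, borrowed from web main-text extraction, scores each text block by how much of its text is hyperlinked, so that navigation bars and link lists are dropped and the narrative body of an event page is kept. The sketch below assumes that interpretation; the weight formula, the thresholds `maxDensity` and `minWeight`, and all names (`TextBlock`, `linkDensity`, `blockWeight`, `extractEventBody`) are illustrative assumptions rather than the authors' method.

```typescript
// Hypothetical representation of one text block (e.g. a <p> or <div>)
// produced by segmenting an event page.
interface TextBlock {
  text: string; // all visible text in the block
  linkText: string; // the portion of `text` that lies inside <a> tags
}

// Link density: fraction of the block's characters that are link text.
// Empty blocks get density 1 so they are always discarded.
function linkDensity(block: TextBlock): number {
  if (block.text.length === 0) return 1;
  return block.linkText.length / block.text.length;
}

// Assumed weighting: long blocks with few links score high, while
// navigation bars and "see also" lists (mostly links) score near 0.
function blockWeight(block: TextBlock): number {
  return block.text.length * (1 - linkDensity(block));
}

// Keep blocks whose link density is at most `maxDensity` and whose weight
// reaches `minWeight`; both thresholds are illustrative, not from the paper.
function extractEventBody(
  blocks: TextBlock[],
  maxDensity = 0.3,
  minWeight = 10,
): string {
  return blocks
    .filter((b) => linkDensity(b) <= maxDensity && blockWeight(b) >= minWeight)
    .map((b) => b.text)
    .join("\n");
}

// Illustrative input: navigation bar, narrative paragraph, "see also" line.
const blocks: TextBlock[] = [
  { text: "Home | Events | Figures", linkText: "HomeEventsFigures" },
  {
    text: "The An Lushan Rebellion began in 755 and ended in 763.",
    linkText: "An Lushan",
  },
  { text: "See also: Tang dynasty", linkText: "Tang dynasty" },
];
console.log(extractEventBody(blocks));
```

Raising `maxDensity` keeps link-rich summaries that mention many figures, which can help when relations are later drawn from the links, at the cost of admitting more navigation text.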

CLC number: TP391.9

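Zhou Yi, Zhou Mingquan, Wang Xuesong, et al. Construction and Implementation of a Knowledge Graph of Historical Figures in a Big Data Environment[J]. Journal of System Simulation, 2016, 28(10): 2560-2566.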

Zhou Yi, Zhou Mingquan, Wang Xuesong, et al. Design and Implementation of Historical Figures Knowledge Graph Visualization System[J]. Journal of System Simulation, 2016, 28(10): 2560-2566.
