Journal of System Simulation ›› 2018, Vol. 30 ›› Issue (9): 3293-3305. doi: 10.16182/j.issn1004731x.joss.201809009

• Simulation Modeling Theory and Methods •

Building Automatically Evolving Workload Models for Astronomical Big Data

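Wang Huajin1,2, Wan Meng3, Han Rui4, Ren Wei5, Zhang Haiming1, Li Jianhui1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190;
    2. University of Chinese Academy of Sciences, Beijing 100049;
    3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012;
    4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
    5. Renmin University of China, Beijing 100872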
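  • Received: 2016-12-08 Online: 2018-09-10 Published: 2019-01-08
  • About the authors: Wang Huajin (1987-), male, from Chiping, Shandong, PhD candidate; research interests: large-scale data management and workload simulation. Wan Meng (1983-), female, from Qingdao, Shandong, PhD, engineer; research interest: astronomical database systems.
  • Supported by:
    National Key Research and Development Program of China (2016YFB1000600, 2016YFB0501900)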

Towards the Automatic Evolution of Workload Models in Large-scale Astronomical Data Management

Wang Huajin1,2, Wan Meng3, Han Rui4, Ren Wei5, Zhang Haiming1, Li Jianhui1,*   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012, China;
    4. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
    5. Renmin University of China, Beijing 100872, China
  • Received:2016-12-08 Online:2018-09-10 Published:2019-01-08

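Abstract: For a benchmark to guide the selection and optimization of data management systems, its workload model must be able to run on the various systems in the target scenario (portability) and reflect the characteristics and data access preferences of typical tasks in that scenario (representativeness). New systems and new tasks keep emerging in astronomical big data management, so workload models built with existing methods easily lose portability and representativeness. An automatically evolving workload modeling method is proposed: abstract operations maintain portability to new types of systems, and workload log analysis maintains representativeness of new types of tasks. The feasibility of the method is demonstrated through a system optimization case.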

Keywords: workload model, benchmarking, astronomical data management, query optimization

Abstract: A benchmark can guide system selection and optimization only if its workload model is both portable, meaning it runs on the various systems of the target application scenario, and representative, meaning it reflects the characteristics and data access patterns of typical tasks. Because new systems and tasks keep emerging in large-scale astronomical data management, workload models constructed by existing methods tend to lose portability and representativeness. An automatically evolving workload modeling method is proposed: abstract operations keep the workload model portable, and automatic workload log analytics keep it representative. The feasibility of the method is demonstrated by a cluster optimization case.

Key words: workload model, benchmarking, astronomical data management, query optimization
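
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log format (one `<abstract_operation> <argument>` entry per line), the operation names `point_lookup` and `range_scan`, the backend names `sql_store` and `kv_store`, and the request strings are all hypothetical and chosen only for illustration. The sketch re-estimates the operation mix from a log (so the model can follow new tasks) and maps each abstract operation to a system-specific request (so the same model can run on different systems).

```python
from collections import Counter
from typing import Callable, Dict, Iterable

# Hypothetical log format: one "<abstract_operation> <argument>" entry per line.
def operation_mix(log_lines: Iterable[str]) -> Dict[str, float]:
    """Estimate the relative frequency of each abstract operation in a log."""
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    total = sum(counts.values())
    # An empty log yields an empty mix (no division happens in that case).
    return {op: n / total for op, n in counts.items()}

# Hypothetical per-system translators: each abstract operation is mapped to
# a concrete request string for one target system.
Translator = Callable[[str], str]
BACKENDS: Dict[str, Dict[str, Translator]] = {
    "sql_store": {
        "point_lookup": lambda arg: f"SELECT * FROM catalog WHERE id = {arg}",
        "range_scan": lambda arg: f"SELECT * FROM catalog WHERE mag < {arg}",
    },
    "kv_store": {
        "point_lookup": lambda arg: f"GET catalog:{arg}",
        "range_scan": lambda arg: f"SCAN catalog mag<{arg}",
    },
}

def translate(system: str, op: str, arg: str) -> str:
    """Render one abstract operation as a request for the given system."""
    return BACKENDS[system][op](arg)

# Illustrative log; rerunning operation_mix on a newer log updates the model.
log = ["point_lookup 42", "range_scan 18.5", "point_lookup 7", "point_lookup 99"]
mix = operation_mix(log)
print(mix)
print(translate("kv_store", "point_lookup", "42"))
```

Under these assumptions, supporting a new system only requires adding an entry to `BACKENDS`, while a shift in task popularity is picked up by recomputing `operation_mix` on fresh logs.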

CLC Number: