Journal of System Simulation ›› 2016, Vol. 28 ›› Issue (3): 549-558.

• Simulation Systems and Technology •

QoS-aware Scheduling for Data Intensive Workflow

Wan Cong, Wang Cuirong, Wang Cong

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110004, Liaoning, China
  • Received: 2014-07-15 Revised: 2014-10-23 Published: 2020-07-02
  • About the authors: Wan Cong (1983-), male, from Qinhuangdao, Hebei, lecturer and doctoral candidate, whose research interests are parallel computing and scheduling problems in cloud computing; Wang Cuirong (1963-), female, from Qian'an, Hebei, professor, Ph.D., whose research interests are big data processing and network virtualization.
  • Supported by:
    National Natural Science Foundation of China (61300195)

Abstract: As technology advances, users can obtain resources from different data centers. The resource management and scheduling of applications deployed in cloud environments, in particular data-intensive workflows that span data centers, has become a hot research topic. This paper proposes a QoS-aware task scheduling algorithm for data-intensive workflows in a multi-data-center environment. Data-intensive workflow computing in such an environment has two characteristics: (1) the data are large and distributed across different geographical locations, so data migration consumes a large amount of time and bandwidth; (2) the data centers are heterogeneous, offering resources that differ in quantity, type, and price. To address these characteristics, the algorithm maps each data migration between data centers to a data migration task, models the workflow as a directed acyclic graph (DAG), and simplifies the DAG. It then uses simulated annealing, with workflow execution time and monetary cost as the optimization objectives, to compute an optimized schedule. The algorithm was validated on both the CloudSim platform and the Hadoop platform, and the experiments show that it is effective.

Key words: workflow, cloud computing, big data, scheduling
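
The pipeline summarized in the abstract (migration tasks on cross-center DAG edges, then simulated annealing over execution time and cost) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The task workloads, data volumes, data center speeds and prices, the single `BANDWIDTH` value, the equal-weight objective, and the geometric cooling schedule are all hypothetical choices made for illustration. The sketch also assumes unlimited parallelism inside each data center and omits the DAG simplification step, which the abstract does not describe in detail.

```python
import math
import random

# Hypothetical toy inputs: none of these numbers come from the paper.
TASKS = {"t1": 40.0, "t2": 60.0, "t3": 30.0, "t4": 50.0}  # compute work per task
EDGES = {("t1", "t2"): 8.0, ("t1", "t3"): 2.0,            # data volume per DAG edge
         ("t2", "t4"): 6.0, ("t3", "t4"): 4.0}
CENTERS = {"dc_a": (10.0, 3.0), "dc_b": (5.0, 1.0)}       # (speed, price per time unit)
BANDWIDTH = 1.0                                           # assumed inter-center transfer rate
ORDER = ["t1", "t2", "t3", "t4"]                          # a topological order of the DAG


def evaluate(assign):
    """Return (makespan, cost) for a task -> data center assignment."""
    finish = {}
    cost = 0.0
    for task in ORDER:
        speed, price = CENTERS[assign[task]]
        ready = 0.0
        for (src, dst), volume in EDGES.items():
            if dst != task:
                continue
            # A cross-center edge acts as a migration task with its own duration.
            migrate = volume / BANDWIDTH if assign[src] != assign[task] else 0.0
            ready = max(ready, finish[src] + migrate)
        run_time = TASKS[task] / speed
        finish[task] = ready + run_time
        cost += run_time * price
    return max(finish.values()), cost


def objective(assign, weight=0.5):
    """Weighted sum of makespan and cost (the weighting is an assumption)."""
    makespan, cost = evaluate(assign)
    return weight * makespan + (1.0 - weight) * cost


def anneal(seed=0, temp=10.0, cooling=0.95, steps=500):
    """Simulated annealing over assignments; returns (best_assignment, best_score)."""
    rng = random.Random(seed)
    names = list(CENTERS)
    current = {t: rng.choice(names) for t in ORDER}
    current_score = objective(current)
    best, best_score = dict(current), current_score
    for _ in range(steps):
        # Neighbor move: reassign one randomly chosen task.
        candidate = dict(current)
        candidate[rng.choice(ORDER)] = rng.choice(names)
        score = objective(candidate)
        delta = score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, score
            if score < best_score:
                best, best_score = dict(candidate), score
        temp = max(temp * cooling, 1e-9)  # geometric cooling with a floor
    return best, best_score


if __name__ == "__main__":
    schedule, score = anneal()
    print(schedule, evaluate(schedule), score)
```

In this toy setting, placing every task on the faster but more expensive `dc_a` avoids all migration delay, while moving a task to the cheaper `dc_b` lowers cost at the price of a longer run time and an added migration task. The annealer searches over that trade-off.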
